Is That Really Sorted?

Roland Schubert

Published Dec 20, 2021

Sometimes it's the little things that drive you mad. I freely admit it: many times I have hunted for a complicated error when the real cause was an insignificant detail I had overlooked. Sorting data is one place where this tends to happen.

With a simple list of numbers, it seems quite obvious which order will result if you sort in ascending order.

However, the actual result is somewhat surprising. I would not have expected to find "9" at the end, after "1111" and "42", for example. 9 is certainly the smaller number, isn't it?

If there are already such strange results with numbers, what might it look like with letters?

It was almost to be expected that the result would look like this: first all uppercase letters, then the lowercase letters, then all letters with accents.

Why these strange results? In the numbers-only example, two factors are at work (and each of them offers a way to get the expected result).

The first reason is quite simple: just because all values look numeric does not mean the data type itself is numeric.

The data is loaded from a CSV file using the Input Data tool, so all fields have a string data type at first. If we change the data type to "double", the values are compared by numeric value and the sort result looks completely different.
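
The effect is easy to reproduce outside the tool. The following is a minimal Python sketch, not the tool's own logic. It assumes a small made-up list of values (only "9", "42" and "1111" come from the example above) and compares sorting them as strings with sorting them after conversion to a floating-point number.

```python
# Values as they would arrive from a CSV file: every field is a string.
# Only "9", "42" and "1111" are taken from the article; the rest are made up.
values = ["42", "9", "1111", "100", "5"]

# String comparison works character by character, so "1111" < "42" < "9".
as_strings = sorted(values)

# Converting to a numeric type for the comparison gives the expected order.
as_numbers = sorted(values, key=float)

print(as_strings)  # ['100', '1111', '42', '5', '9']
print(as_numbers)  # ['5', '9', '42', '100', '1111']
```

The values themselves are unchanged in both cases; only the comparison differs.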

The second reason is related to a setting in the Sort tool, where you can choose between two fundamentally different approaches:

Dictionary Sort Order
Unicode Sort Order

As so often, you get exactly what you asked for. The Unicode sort order follows the position of each character in the Unicode table: first the digits, then the uppercase letters, then the lowercase letters, and finally the letters with accents. By the way, ">", "<" and the colon are sorted between the digits and the uppercase letters, and the comma comes before the digits.

This is exactly the order we chose here: "Use Dictionary Order" is not selected, so the Unicode sort order is used. String values are compared character by character (first by the first character, then by the second, and so on), regardless of whether the characters are digits or letters. That is why "1111" comes before "42", which in turn comes before "9".
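
Python's default string comparison also works on Unicode code points, so it can serve as a rough stand-in for this behaviour. The sketch below uses a made-up sample of single characters and prints each one with its code point. It illustrates the principle only and makes no claim about how the Sort tool is implemented.

```python
# Made-up sample mixing digits, punctuation, uppercase, lowercase and accents.
values = ["b", "Ä", "9", "a", ":", "B", ",", "A", "ä", "1", "<"]

# Python compares strings by Unicode code point, character by character.
by_code_point = sorted(values)

# Show each character together with its position in the Unicode table.
for ch in by_code_point:
    print(f"{ch!r:5} U+{ord(ch):04X}")

# Longer strings are compared position by position in the same way.
multi = sorted(["9", "1111", "42"])
print(multi)  # ['1111', '42', '9']
```

The comma (U+002C) lands before the digits, the colon and "<" between digits and uppercase letters, and the accented letters at the very end.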

The same setting was in effect for the letter example, and the result was an order that follows the Unicode table.

If we change this (i.e. select the "Use Dictionary Order" option), the values are sorted letter by letter: for each letter first the lowercase form, then the uppercase form, then the accented forms in the same order.
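
To make the difference tangible, here is a rough Python approximation of such a dictionary order. The function `dictionary_key` is a hypothetical helper written for this illustration. It compares by base letter first, then by accent, then by case. Real collation rules are more involved, and this is not the tool's actual implementation.

```python
import unicodedata

def dictionary_key(text):
    """Hypothetical three-level sort key: base letters, then accents, then case."""
    bases, accents, uppers = [], [], []
    for ch in unicodedata.normalize("NFC", text):
        # NFD splits an accented letter into base letter + combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        bases.append(decomposed[0].lower())
        accents.append(len(decomposed) > 1)   # False (no accent) sorts first
        uppers.append(ch.isupper())           # False (lowercase) sorts first
    return ("".join(bases), tuple(accents), tuple(uppers))

# Made-up sample values.
values = ["B", "ä", "a", "Ä", "b", "A"]

unicode_order = sorted(values)
dictionary_order = sorted(values, key=dictionary_key)

print(unicode_order)     # ['A', 'B', 'a', 'b', 'Ä', 'ä']
print(dictionary_order)  # ['a', 'A', 'ä', 'Ä', 'b', 'B']
```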

If you select the dictionary order, you can additionally set the exact language, so that the rules of the respective country are taken into account. In some cases it goes even further: Germany has two different standards, which differ mainly in how they treat umlauts ("Ä" = "A" or "Ä" = "AE").
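
The two German conventions can be sketched with two small key functions. The names `key_plain` and `key_expanded` and the sample names are made up for this illustration. The mapping tables cover only the umlauts and "ß", so treat this as a simplified sketch of the idea rather than a complete implementation of either standard.

```python
import unicodedata

# Convention 1: an umlaut is treated like its base vowel ("Ä" = "A").
PLAIN = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
# Convention 2: an umlaut is expanded to vowel + "e" ("Ä" = "AE").
EXPANDED = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def key_plain(name):
    """Hypothetical sort key for the 'Ä = A' convention."""
    return unicodedata.normalize("NFC", name).lower().translate(PLAIN)

def key_expanded(name):
    """Hypothetical sort key for the 'Ä = AE' convention."""
    return unicodedata.normalize("NFC", name).lower().translate(EXPANDED)

# Made-up sample names.
names = ["Gold", "Götz", "Goethe", "Göbel"]

plain_order = sorted(names, key=key_plain)
expanded_order = sorted(names, key=key_expanded)

print(plain_order)     # ['Göbel', 'Goethe', 'Gold', 'Götz']
print(expanded_order)  # ['Göbel', 'Goethe', 'Götz', 'Gold']
```

The same four names come out in a different order depending on the convention: "Götz" moves ahead of "Gold" as soon as "ö" is read as "oe".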

So keep in mind that a "local" version may apply special rules that lead to an unexpected result. Most problems can be avoided if you pay attention to the data type and choose the sorting method (Unicode/Dictionary) deliberately.
