Is That Really Sorted?
Sometimes it's the little things that can drive you to madness. I frankly admit it - many times I was looking for complicated errors and it was an insignificant detail I had ignored. This occasionally tends to happen when sorting data.
Here it seems to be quite obvious, which order would result, if you sort in ascending order.
However, this result is somewhat surprising. I would not have expected to find the "9" at the end, behind "1111" or "42", for example. 9 is certainly the smaller number, isn't it?
If there are already such strange results with numbers, what might it look like with letters?
It was almost to be expected that the results would look like this ... first all uppercase letters, then the lowercase letters, then all with accent.
Recommended by LinkedIn
Why these strange results? Now, in the numbers-only example, there are two influencing factors (or ways to get the expected result).
One reason is quite simple - just because all values are numeric the datatype itself will not be numeric.
The data is loaded from a csv file using the Input Data Tool, and so all fields have a string data type at first. If we change the data type to "double", the sorting will also look completely different:
The second reason is related to a setting in the Sort tool, where you can choose between two fundamentally different approaches:
As so often, you get exactly what you wanted. The Unicode sort order sorts according to the order of the characters in the Unicode table. First the digits, then the upper case letters, then the lower case letters, finally letters with accent. By the way, ">", "<" and colon would be sorted between numbers and capital letters, the comma before the numbers.
This is exactly the order we have chosen here - "Use Dictionary Order" is not selected, so Unicode sort order is used. In general, all values are sorted character by character (it is sorted first by the first character, then by the second, and so on, regardless of whether it is numeric or string data).
The same setting was effective for the string data, and the result was an order according to the Unicode table.
If we change this (i.e. select the "Use Dictionary Order" option) it will be sorted by letter - first lowercase, then uppercase, then the same order for the letter with accent).
If you select the dictionary order, you can additionally set the exact language (i.e. take into account the rules in the respective country). In some cases, it even goes beyond that - in Germany, there are two different standards, which are especially about the classification of German letters with accent ("Ä" = "A" or "Ä" = "AE").
So you should keep in mind that if you choose a "local" version, there may be some special rules that might cause an unexpected result. Most problems can be avoided if you pay attention to the data type and set the sorting method (Unicode/Dictionary) correctly.