Confusion in the natural sorting order

Most graphical file managers don't sort strictly lexicographically; they use "natural" sorting instead. Runs of digits in a name are interpreted as numbers, so the name with the larger number sorts later, even if a character-by-character comparison would say the opposite. The idea behind natural sorting: what people usually want is "9 before 10" and "Chapter 2 before Chapter 10", without having to add leading zeros.

The file names in each of the following groups are listed in ascending natural order:

  • build-9e2.log
  • build-950.log

Surprising, but explainable: the first block of digits in the first name, \(9\), is smaller than the first block of digits in the second name, \(950\).

  • IMG_12113419_90.jpg
  • IMG_0554363070_90.jpg

The number \(12113419\) is less than \(554363070\) (the leading \(0\) is removed).

  • temp_0C.txt
  • temp_2C.txt
  • temp_-3C.txt
  • temp_10C.txt
  • temp_-12C.txt

The numbers compared are \(0\), \(2\), \(3\), \(10\), \(12\): the "-" is not considered part of the number.
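
The block-wise comparison behind the first two listings can be sketched in portable C. This is a minimal sketch under stated assumptions, not the algorithm any particular file manager uses: `natural_compare` is a hypothetical name, letters are compared case-insensitively via `tolower` as a simplification, and the handling of "-" seen in the third listing is deliberately not modeled.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: compares two names "naturally".
 * Returns a negative value, 0, or a positive value, like strcmp. */
int natural_compare(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            char *end_a, *end_b;
            /* Parse both digit runs as numbers; leading zeros drop out.
             * Assumption: runs fit into unsigned long (longer runs
             * saturate at ULONG_MAX and would compare as equal). */
            unsigned long num_a = strtoul(a, &end_a, 10);
            unsigned long num_b = strtoul(b, &end_b, 10);
            if (num_a != num_b)
                return num_a < num_b ? -1 : 1;
            /* Equal values: continue after both runs. */
            a = end_a;
            b = end_b;
        } else {
            /* Not two digit runs: compare single characters, ignoring case. */
            int ch_a = tolower((unsigned char)*a);
            int ch_b = tolower((unsigned char)*b);
            if (ch_a != ch_b)
                return ch_a < ch_b ? -1 : 1;
            a++;
            b++;
        }
    }
    /* One name is a prefix of the other: the shorter one sorts first. */
    if (*a)
        return 1;
    if (*b)
        return -1;
    return 0;
}
```

Under these assumptions, `natural_compare("build-9e2.log", "build-950.log")` is negative, matching the first listing, whereas a byte-wise `strcmp` of the same two names is positive because the character 5 sorts before e.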

Even "alphabetical" isn't globally unambiguous: capitalization, umlauts like ä (German), or multi-character letters like ch (Czech) lead to legitimate variants. "Purely alphabetical" is therefore context-dependent.

Windows Explorer implements natural sorting in the StrCmpLogicalW function. Its source code (part of shlwapi.dll) is proprietary and not public, but there are reimplementations, for example the one from ReactOS:

INT WINAPI StrCmpLogicalW(const WCHAR *str, const WCHAR *comp)
{
    TRACE("%s, %s\n", wine_dbgstr_w(str), wine_dbgstr_w(comp));
 
    if (!str || !comp)
        return 0;
 
    while (*str)
    {
        if (!*comp)
            return 1;
        else if (*str >= '0' && *str <= '9')
        {
            int str_value, comp_value;
 
            if (*comp < '0' || *comp > '9')
                return -1;
 
            /* Compare the numbers */
            StrToIntExW(str, 0, &str_value);
            StrToIntExW(comp, 0, &comp_value);
 
            if (str_value < comp_value)
                return -1;
            else if (str_value > comp_value)
                return 1;
 
            /* Skip */
            while (*str >= '0' && *str <= '9') str++;
            while (*comp >= '0' && *comp <= '9') comp++;
        }
        else if (*comp >= '0' && *comp <= '9')
            return 1;
        else
        {
            int diff = ChrCmpIW(*str, *comp);
            if (diff > 0)
                return 1;
            else if (diff < 0)
                return -1;
 
            str++;
            comp++;
        }
    }
 
    if (*comp)
      return -1;
 
    return 0;
}
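
The listing above converts each digit run to an `int` before comparing. An alternative that avoids the conversion, and with it any dependence on the integer range, is to compare the runs as text: strip leading zeros, let the longer run win, and fall back to a digit-by-digit comparison when the lengths match. The following is a sketch of that idea only; `compare_digit_runs` is a hypothetical helper and not part of ReactOS or the Windows API.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: compares the digit runs starting at *pa and *pb by
 * numeric value without converting them to integers, then advances both
 * pointers past their runs. Returns -1, 0, or 1. */
int compare_digit_runs(const char **pa, const char **pb)
{
    const char *a = *pa;
    const char *b = *pb;
    size_t len_a = 0, len_b = 0;
    int diff;

    /* Leading zeros do not change the value. */
    while (*a == '0') a++;
    while (*b == '0') b++;

    /* Count the remaining significant digits. */
    while (isdigit((unsigned char)a[len_a])) len_a++;
    while (isdigit((unsigned char)b[len_b])) len_b++;

    /* Report where each run ended so the caller can continue from there. */
    *pa = a + len_a;
    *pb = b + len_b;

    /* More significant digits means a larger number. */
    if (len_a != len_b)
        return len_a < len_b ? -1 : 1;

    /* Same length: the first differing digit decides. */
    diff = memcmp(a, b, len_a);
    return (diff > 0) - (diff < 0);
}
```

Applied to the second example, the run 12113419 has eight significant digits and 0554363070 has nine once the leading zero is dropped, so the first name sorts first without any number ever being parsed.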

Google Drive, OneDrive, KDE, and others display similar sorting behavior. CLI tools like ls and find, however, sort differently than GUI file managers. The semantics live in the file names, not in the API. If you want results without surprises, define conventions: consistent separators, padded numbers, and clear handling of units. Then "alphabetical" becomes predictable again.
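
The "padded numbers" convention can be illustrated with a short C sketch. The helper names, the `chapter_` prefix, and the two-digit width are assumptions chosen for illustration; the point is only that zero-padded names sort the same way under a plain byte-wise comparison as they would numerically.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<prefix><zero-padded value><suffix>" to out.
 * Returns 0 on success, -1 if the result did not fit into out_size bytes. */
int make_padded_name(char *out, size_t out_size, const char *prefix,
                     unsigned value, int width, const char *suffix)
{
    /* %0*u pads the number with zeros up to the width passed as an argument. */
    int written = snprintf(out, out_size, "%s%0*u%s",
                           prefix, width, value, suffix);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}

/* Hypothetical helper: returns 1 if first sorts before second under a
 * strictly byte-wise comparison, 0 otherwise. */
int sorts_before_bytewise(const char *first, const char *second)
{
    return strcmp(first, second) < 0;
}
```

With a width of 2, the values 2 and 10 become `chapter_02.txt` and `chapter_10.txt`, which are in the expected order under `sorts_before_bytewise`; the unpadded `chapter_2.txt` and `chapter_10.txt` are not. The width has to be chosen for the largest number that will ever occur, which is the main cost of this convention.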
