Natural order for strings with numbers in Java
File names default sort order
If you are developing an application which works with files, one day you’ll try to get list of the names of all files in a particular directory. It could be like this simplified snippet:
public List<String> getAllImageNames() {
List<String> names = new ArrayList<String>();
File imagesDir = new File(IMAGE_BASE_DIRECTORY));
if (!imagesDir.exists()) {
return names;
}
for (File image : imagesDir.listFiles(COMMON_IMG_FILTER)) {
names.add(image.getName());
}
return names;
}
private static final FileFilter COMMON_IMG_FILTER = new FileFilter() {
public boolean accept(File pathname) {
return pathname.isFile() && (!pathname.getName().startsWith("."));
}
};
Let’s assume that you have some image files with the same name suffix and an index number as a trailing part:
| System order | Natural order |
|---|---|
| IMG_1.jpg | IMG_1.jpg |
| IMG_2.jpg | IMG_2.jpg |
| IMG_21.jpg | IMG_3.jpg |
| IMG_22.jpg | IMG_21.jpg |
| IMG_3.jpg | IMG_22.jpg |
Everything seems to be all right, but returned list is sorted in a way Java usually sorts strings. While this results could completely satisfy your program API, it is not very useful to work with such lists for a human. Moreover, such lists are very common in our life, for example, many digital photo cameras store pictures with such names.
Let’s make the problem more general. We have a list of strings, which could contain digits, and we want to sort this array to get a natural ordered list. You’d say that it is piece of cake, because Java has good facility to perform array/list sorting : Collections.sort() and Comparator. Therefore, we need an appropriate Comparator implementation.
Known solutions for natural order sorting
Unfortunately, there are no any well-known library like Jakarta Commons to perform this sort. And if you try to google it, you will find only some posts like this one with home-grown solutions and grumbling about lack of existing proven solution. The most important thing is to know the right keywords to google against it. Most of articles on this topic are about Natural order.
I found some valuable resources:
- Natural String Order at Stephen Friedrich’s Blog - an article of Java professional with working Java source and even JUnit tests. The implemented algorithm is very comprehensive and can work with different set of sorting rules (Ascii, case insensitive and locale specific).
- HumaneStringComparator: Sorting Strings for People - another one Comparator implementation by Tim Fennell.
- Implementation by Pierre-Luc Paour.
- Natural Order String Comparison - C version of implementation and links to another languages.
- Natural Order Numerical Sorting - overview of the problem, links to another articles and solutions, but almost all of them are broken or not Java. It seemed that the author was keen about that idea, registered a special domain and gathered essential information, and then left this site floating without any maintain.
Stress test
One of the main software development rule states: don’t invent your own wheel unless the problem is not your core business. As string sorting was a utility purpose matter, before implementing my own solution I decided to test already found. The test was easy and straightforward - every Comparator had to sort the same number of randomly generated and shuffled strings. The strings were generated by fixed pattern: [a-z][0..1000].jpg. Input data was the large array (100000) of such strings:
..............
jqazrrveqy113.jpg
wzedsgzmvo912.jpg
aayexqpfdu311.jpg
zvpzxjwkml354.jpg
nelacribtl964.jpg
rehsgmzugb244.jpg
eoptzxybtz459.jpg
ukbeogpmhe157.jpg
zgvnrzohwc176.jpg
.............. .
I was interested in only one algorithm efficiency parameter - time elapsing during full sort. As usual, the elapsing time itself doesn’t tell much about the efficiency. To have a minimum achievable value to compare with I chose a result given by the comparator based on a standard Java String.compareTo() method which should be the fastest because it doesn’t deal with string parsing.
public int compare(String strA, String strB) {
// Assuming that strA always != null
return strA.compareTo(strB);
}
};
The test results are shown in a table:
| Author | Version | Sort Time | Pros | Cons | |
|---|---|---|---|---|---|
| Pierre-Luc Paour | NaturalOrderComparator | 453ms | Quite fast | hard to read C-style algorithm | |
| Stephen Friedrich | NaturalComparator | 4828ms | Locale specific | Too slow | |
| Stephen Friedrich | NaturalComparatorAscii | 360ms | The fastest one | Only ASCII | |
| Stephen Friedrich | NaturalComparatorIgnoreCaseAscii | 500ms | Fast enough, case insensitive | Only ASCII | |
| Tim Fennell | HumaneStringComparator | 4797ms | Very good example of OOP style, brief understandable algorithm, use Java facilities extensively | Too slow and inefficient | |
| Java native | String.compareTo() | 235ms | standard | standard |
Conclusion
As you can see from the result sheet, the most powerful and fast is Stephen Friedrich’s package. It is written with good Java style, well commented and supplied with JUnit tests. It provides different kind of sorting depends on what are going to sort - simple ASCII strings like file names or country specific strings containing special characters.
Of course, one day I’d like to see any well-proven solution in the Jakarta Commons project and simply add it as a jar.



June 1st, 2007 at 2:42 pm
Thank you for the good review of the problem. It would be interesting to look through these methods implementations.
June 4th, 2007 at 4:58 am
Interesting read! Thanks for writing such a great post.
June 4th, 2007 at 11:38 pm
Thank guys for the support! It’s much easier to write when you have some faithful readers
June 6th, 2007 at 1:47 am
You can actually write such a technical articles to much more wider audience, they will appreciate it!
Would you like to get a wider audience for your articles?