寻找字母顺序和自然顺序的组合(又名用户理智排序)

我认为这很容易找到premade,但似乎我在网上找到的任何解决方案只能解决部分问题。

我想对用户提供的文件名列表(文件大多以人和/或地址命名),有时用不同语言(大多数是德语,有点法语和意大利语混合在一起) ,很少有任何其他西方语言)。

我们的想法是以(德国)用户普遍认为合理的方式呈现此列表。 这意味着订单应该遵循Locale.GERMANjava.text.Collat​​or ,但同时期望是对字符串中的数字进行例外处理,因此“10”在“2”之后。

我找到了在网上进行自然排序的代码,但它依赖于逐字符比较(而Collat​​or不支持)。 我可以使用子字符串来破解某些内容,但在比较器内部,我认为在每次比较调用时创建多个子字符串并不是最明智的想法。

任何想法如何有效地实现这一点(在执行时间和实现时间),或者更好的是经过测试和即用的实现?

如果您使用@millimoose(http://www.davekoelle.com/alphanum.html)建议的比较器,请将其修改为通过Collat​​or

public class AlphanumComparator implements Comparator { private Collator collator; public AlphanumComparator(Collator collator) { this.collator = collator; } ..... public int compare(Object o1, Object o2) { ...... result = thisChunk.compareTo(thatChunk); //should become collator.compare(thisChuck, thatChuck); .... 

这段代码似乎有问题,例如“01”更大,然后是“2”。 但这取决于您的偏好,如果这很重要,请将其修改为在数字比较之前跳过前导零。

这是适用的代码(基于Alphanum算法 ),如在接受的答案中。 优化代码以减少垃圾创建并处理前导零(01 <001 <2)。 它也是一般化的,现在更灵活,因为它不再局限于java.lang.String,而是现在需要java.lang.CharSequence。 玩得开心:

 import java.text.Collator; import java.util.Comparator; /** * Comparator for ordering by Collator while treating digits numerically. * This provides a "natural" order that humans usually perceive as 'logical'. * * It should work reasonably well for western languages (provided you * use the proper collator when constructing). For free control over the * Collator, use the constructor that takes a Collator as parameter. * Configure the Collator using Collator.setDecomposition()/setStrength() * to suit your requirements. */ public class AlphanumComparator implements Comparator { /** * The collator used for comparison of the alpha part */ private final Collator collator; /** * Create comparator using platform default collator. * (equivalent to using Collator.getInstance()) */ public AlphanumComparator() { this(Collator.getInstance()); } /** * Create comparator using specified collator */ public AlphanumComparator(final Collator collator) { if (collator == null) throw new IllegalArgumentException("collator must not be null"); this.collator = collator; } /** * Ideally this would be generalized to Character.isDigit(), but I have * no knowledge about arabic language and other digits, so I treat * them as characters... */ private static boolean isDigit(final int character) { // code between ASCII '0' and '9'? return character >= 48 && character <= 57; } /** * Get subsequence of only characters or only digits, but not mixed */ private static CharSequence getChunk(final CharSequence charSeq, final int start) { int index = start; final int length = charSeq.length(); final boolean mode = isDigit(charSeq.charAt(index++)); while (index < length) { if (isDigit(charSeq.charAt(index)) != mode) break; ++index; } return charSeq.subSequence(start, index); } /** * Implements Comparator.compare */ public int compare(final CharSequence charSeq1, final CharSequence charSeq2) { final int length1 = charSeq1.length(); final int length2 = charSeq2.length(); int index1 = 0; int index2 = 0; int result = 0; while (result == 0 && index1 < length1 && index2 < length2) { final CharSequence chunk1 = getChunk(charSeq1, index1); index1 += chunk1.length(); final CharSequence chunk2 = getChunk(charSeq2, index2); index2 += chunk2.length(); if (isDigit(chunk1.charAt(0)) && isDigit(chunk2.charAt(0))) { final int clen1 = chunk1.length(); final int clen2 = chunk2.length(); // count and skip leading zeros int zeros1 = 0; while (zeros1 < clen1 && chunk1.charAt(zeros1) == '0') ++zeros1; // count and skip leading zeros int zeros2 = 0; while (zeros2 < clen2 && chunk2.charAt(zeros2) == '0') ++zeros2; // the longer run of non-zero digits is greater result = (clen1 - zeros1) - (clen2 - zeros2); // if the length is the same, the first differing digit decides // which one is deemed greater. int subi1 = zeros1; int subi2 = zeros2; while (result == 0 && subi1 < clen1 && subi2 < clen2) { result = chunk1.charAt(subi1++) - chunk2.charAt(subi2++); } // if still no difference, the longer zeros-prefix is greater if (result == 0) result = subi1 - subi2; } else { // in case we are working with Strings, toString() doesn't create // any objects (String.toString() returns the same string itself). result = collator.compare(chunk1.toString(), chunk2.toString()); } } // if there was no difference at all, let the longer one be the greater one if (result == 0) result = length1 - length2; // limit result to (-1, 0, or 1) return Integer.signum(result); } } 

编辑2014-12-01:Konstantin Petrukhnov在评论中指出的修正版本。