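Sorting words in order of frequency? (least to greatest)

Does anyone know how to sort a list of words in order of frequency (least to greatest) using the built-in Collections.sort and the Comparator interface?

I already have a method that gets the count of a given word in a text file. Now I just need a method that compares the counts of the words and puts them in a list ordered from least frequent to most frequent.

Any ideas and tips would be much appreciated. I'm having trouble getting started on this particular method.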

    public class Parser implements Comparator<String> {

        public Map<String, Integer> wordCount;

        void parse(String filename) throws IOException {
            File file = new File(filename);
            Scanner scanner = new Scanner(file);

            //mapping of string -> integer (word -> frequency)
            Map<String, Integer> wordCount = new HashMap<String, Integer>();

            //iterates through each word in the text file
            while(scanner.hasNext()) {
                String word = scanner.next();
                if (scanner.next()==null) {
                    wordCount.put(word, 1);
                }
                else {
                    wordCount.put(word, wordCount.get(word) + 1);;
                }
            }
            scanner.next().replaceAll("[^A-Za-z0-9]"," ");
            scanner.next().toLowerCase();
        }

        public int getCount(String word) {
            return wordCount.get(word);
        }

        public int compare(String w1, String w2) {
            return getCount(w1) - getCount(w2);
        }

        //this method should return a list of words in order of frequency from least to greatest
        public List<String> getWordsInOrderOfFrequency() {
            List<Integer> wordsByCount = new ArrayList<Integer>(wordCount.values());
            //this part is unfinished.. the part i'm having trouble sorting the word frequencies
            List<String> result = new ArrayList<String>();
        }
    }

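First, your use of scanner.next() looks incorrect. next() returns the next word and advances past it on every call, so the following code:

    if(scanner.next() == null){ ... }

and also

    scanner.next().replaceAll("[^A-Za-z0-9]"," ");
    scanner.next().toLowerCase();

will consume words and then simply throw them away. Note also that next() never returns null; it throws NoSuchElementException when no words remain. What you probably want to do is:

    String word = scanner.next().replaceAll("[^A-Za-z0-9]"," ").toLowerCase();

at the beginning of your while loop, so that the changes to your word are saved in the word variable rather than discarded.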
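To make the difference concrete, here is a minimal sketch comparing a loop that calls next() once per iteration with one that calls it twice. The class and method names are hypothetical, the Scanner reads from a String instead of a file, and punctuation is stripped to an empty string (rather than replaced by a space) so that "cat," and "cat" end up as the same word; that last choice is an assumption of this sketch, not something stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ScannerNextDemo {

    // Reads every token exactly once and normalizes it before storing it.
    static List<String> readNormalized(String text) {
        List<String> words = new ArrayList<>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            // One next() call per iteration; the result is kept in a variable.
            String word = scanner.next().replaceAll("[^A-Za-z0-9]", "").toLowerCase();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Calls next() twice per iteration, so every second token is consumed
    // and discarded. The hasNext() guard avoids NoSuchElementException.
    static List<String> readSkipping(String text) {
        List<String> words = new ArrayList<>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            String word = scanner.next();
            if (scanner.hasNext()) {
                scanner.next(); // consumed and thrown away
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(readNormalized("The cat, the DOG.")); // prints [the, cat, the, dog]
        System.out.println(readSkipping("The cat, the DOG."));   // prints [The, the]
    }
}
```

The second method loses "cat," and "DOG." entirely, which is the effect described above.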

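Second, the way you use the wordCount map is slightly broken. What you want to do is check whether the word is already in the map, to decide what count to store. To do that, instead of checking scanner.next() == null, you should look in the map, for example:

    if(!wordCount.containsKey(word)){
        //no count registered for the word yet
        wordCount.put(word, 1);
    }else{
        wordCount.put(word, wordCount.get(word) + 1);
    }

Or you can do this:

    Integer count = wordCount.get(word);
    if(count == null){
        //no count registered for the word yet
        wordCount.put(word, 1);
    }else{
        wordCount.put(word, count+1);
    }

I prefer this approach because it is cleaner and reads the map only once per word (a single get), whereas the first approach calls both containsKey and get for every word that has already been seen.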
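On Java 8 or later, the same check-then-update logic can also be written with Map.merge. This is an alternative to the two variants above rather than something they rely on; the class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class MergeCount {

    // Counts occurrences of each word. merge() stores 1 for a key that is
    // not yet present, otherwise it combines the old value with 1 via Integer::sum.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println(count(new String[] {"a", "b", "a"}));
    }
}
```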

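Now, to get a list of the words in descending order of frequency, you can first convert your map to a list, then apply Collections.sort() as suggested in this post. Below is a simplified version suited to your needs:

    static List<String> getWordInDescendingFreqOrder(Map<String, Integer> wordCount) {

        // Convert map to list of <String,Integer> entries
        List<Map.Entry<String, Integer>> list =
            new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

        // Sort list by integer values
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                // compare o2 to o1, instead of o1 to o2, to get descending freq. order
                return (o2.getValue()).compareTo(o1.getValue());
            }
        });

        // Populate the result into a list
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : list) {
            result.add(entry.getKey());
        }
        return result;
    }

Hope this helps.

Edit: changed the comparison function as suggested by @dragon66. Thanks.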
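Since the question asks for least to greatest, here is a minimal sketch of the ascending variant of the same map-to-list-then-sort idea. The class and method names are hypothetical, and the tie-break on the word itself is an addition of this sketch to make the order deterministic when two words have the same count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AscendingFrequency {

    static List<String> wordsByAscendingFrequency(Map<String, Integer> wordCount) {
        // Copy the entries into a list, because a Map cannot be sorted directly.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCount.entrySet());

        // Compare a to b (not b to a) for ascending order.
        // Integer.compare avoids the overflow risk of subtracting counts.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 3);
        counts.put("cat", 1);
        counts.put("sat", 2);
        System.out.println(wordsByAscendingFrequency(counts)); // prints [cat, sat, the]
    }
}
```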

You can compare with the following and take ideas from it:

    // Note: StdIn, StdOut, Merge and Counter are external helper classes,
    // not part of the JDK.
    public class FrequencyCount {

        public static void main(String[] args) {

            // read in the words as an array
            String s = StdIn.readAll();
            // s = s.toLowerCase();
            // s = s.replaceAll("[\",!.:;?()']", "");
            String[] words = s.split("\\s+");

            // sort the words
            Merge.sort(words);

            // tabulate frequencies of each word
            Counter[] zipf = new Counter[words.length];
            int M = 0;                                        // number of distinct words
            for (int i = 0; i < words.length; i++) {
                if (i == 0 || !words[i].equals(words[i-1]))   // short-circuiting OR
                    zipf[M++] = new Counter(words[i], words.length);
                zipf[M-1].increment();
            }

            // sort by frequency and print
            Merge.sort(zipf, 0, M);                           // sorting a subarray
            for (int j = M-1; j >= 0; j--) {
                StdOut.println(zipf[j]);
            }
        }
    }
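The sort-then-tabulate idea in that program can be sketched with JDK classes only. This is an illustration under stated assumptions, not a port: the class and method names are hypothetical, Arrays.sort and Collections.sort stand in for the Merge helper, Map.Entry pairs stand in for Counter, and the output runs from least to most frequent (as the question asks) instead of being printed in reverse.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SortThenCount {

    // Returns "count word" lines ordered from least to most frequent.
    static List<String> tabulate(String text) {
        List<Map.Entry<String, Integer>> runs = new ArrayList<>();
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            String[] words = trimmed.split("\\s+");
            Arrays.sort(words); // equal words are now adjacent

            // Each run of equal words becomes one (word, run length) pair.
            int runStart = 0;
            for (int i = 1; i <= words.length; i++) {
                if (i == words.length || !words[i].equals(words[runStart])) {
                    runs.add(new AbstractMap.SimpleEntry<String, Integer>(words[runStart], i - runStart));
                    runStart = i;
                }
            }
        }

        // Collections.sort is stable, so words with equal counts stay alphabetical.
        Collections.sort(runs, (a, b) -> Integer.compare(a.getValue(), b.getValue()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> run : runs) {
            result.add(run.getValue() + " " + run.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tabulate("b a b c b a")); // prints [1 c, 2 a, 3 b]
    }
}
```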

A solution close to your original post, with the corrections and the sorting suggested by Torious in the comments:

    import java.util.*;

    public class Parser implements Comparator <String> {

        public Map <String, Integer> wordCount;

        void parse () {
            Scanner scanner = new Scanner (System.in);

            // don't redeclare it here, otherwise the field wordCount is shadowed
            wordCount = new HashMap <String, Integer> ();

            // iterates through each word of the input
            while (scanner.hasNext ()) {
                String word = scanner.next ();
                // operate on the word, not on next and next of next word from Scanner
                word = word.replaceAll ("[^A-Za-z0-9]", " ");
                word = word.toLowerCase ();
                // look into your map:
                if (! wordCount.containsKey (word))
                    wordCount.put (word, 1);
                else
                    wordCount.put (word, wordCount.get (word) + 1);
            }
        }

        public int getCount (String word) {
            return wordCount.get (word);
        }

        // ascending by count: least frequent words first
        public int compare (String w1, String w2) {
            return getCount (w1) - getCount (w2);
        }

        public List <String> getWordsInOrderOfFrequency () {
            List <String> justWords = new ArrayList <String> (wordCount.keySet ());
            Collections.sort (justWords, this);
            return justWords;
        }

        public static void main (String args []) {
            Parser p = new Parser ();
            p.parse ();
            List <String> ls = p.getWordsInOrderOfFrequency ();
            for (String s: ls)
                System.out.println (s);
        }
    }

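rodion's solution is a kind of generics hell, but I couldn't make it any simpler, just different.

In the end, his solution is shorter and better.

At first glance a TreeMap might seem appropriate, but it sorts by key, which is no help for sorting by value, and we can't swap key and value, because we look entries up by key.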
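One way around that limitation, shown here only as a sketch and not as what this answer goes on to do, is to keep the word-to-count map for lookups and build a second TreeMap keyed by count once counting is finished. The class and method names are hypothetical; words that share a count come out in no particular order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CountBuckets {

    static List<String> wordsByAscendingFrequency(Map<String, Integer> wordCount) {
        // A TreeMap iterates its keys (here: the counts) in ascending order.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : wordCount.entrySet()) {
            byCount.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                   .add(entry.getKey());
        }

        // Flatten the buckets, lowest count first.
        List<String> result = new ArrayList<>();
        for (List<String> bucket : byCount.values()) {
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 3);
        counts.put("cat", 1);
        counts.put("sat", 2);
        System.out.println(wordsByAscendingFrequency(counts)); // prints [cat, sat, the]
    }
}
```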

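So the next idea is to build a HashMap and use Collections.sort, but that doesn't accept a Map, only a List. A Map does offer entrySet, which produces another Collection, but that is a Set, not a List. That was the point where I took another direction:

I implemented an Iterator: I iterate over the entrySet and only return entries whose value is 1. Entries with a higher value are buffered for later use. When the Iterator is exhausted, I look at the buffer, and if it is not empty, I use the buffer's iterator from then on, increment the minimum value I'm looking for, and create a new buffer.

The advantage of an Iterator/Iterable pair is that the values can be obtained with the simplified for loop.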

    import java.util.*;

    // a short little declaration :)
    public class WordFreq implements Iterator <Map.Entry <String, Integer>>, Iterable <Map.Entry <String, Integer>>
    {
        private Map <String, Integer> counter;
        private Iterator <Map.Entry <String, Integer>> it;
        private Set <Map.Entry <String, Integer>> buf;
        private int maxCount = 1;

        public Iterator <Map.Entry <String, Integer>> iterator () {
            return this;
        }

        // The iterator interface expects a "remove ()" - nobody knows why
        public void remove ()
        {
            if (hasNext ())
                next ();
        }

        public boolean hasNext ()
        {
            return it.hasNext () || ! buf.isEmpty ();
        }

        public Map.Entry <String, Integer> next ()
        {
            while (it.hasNext ()) {
                Map.Entry <String, Integer> mesi = it.next ();
                if (mesi.getValue () == maxCount)
                    return mesi;
                else
                    buf.add (mesi);
            }
            if (buf.isEmpty ())
                return null;
            ++maxCount;
            it = buf.iterator ();
            buf = new HashSet <Map.Entry <String, Integer>> ();
            return next ();
        }

        public WordFreq ()
        {
            it = fill ();
            buf = new HashSet <Map.Entry <String, Integer>> ();
            // The "this" here has to be an Iterable to make the foreach work
            for (Map.Entry <String, Integer> mesi : this)
            {
                System.out.println (mesi.getValue () + ":\t" + mesi.getKey ());
            }
        }

        public Iterator <Map.Entry <String, Integer>> fill ()
        {
            counter = new HashMap <String, Integer> ();
            Scanner sc = new Scanner (System.in);
            while (sc.hasNext ())
            {
                push (sc.next ());
            }
            Set <Map.Entry <String, Integer>> set = counter.entrySet ();
            return set.iterator ();
        }

        public void push (String word)
        {
            Integer i = counter.get (word);
            int n = 1 + ((i != null) ? i : 0);
            counter.put (word, n);
        }

        public static void main (String args[])
        {
            new WordFreq ();
        }
    }

Since my solution reads from stdin, you invoke it with:

    cat WordFreq.java | java WordFreq
