Sorting words in order of frequency? (least to greatest)

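Does anyone know how to sort a list of words by frequency (least to greatest) using the built-in Collections.sort and the Comparator<String> interface?

I already have a method that gets the count of a given word in the text file. Now I just need to write a method that compares the counts of the words and puts them in a list ordered from least frequent to most frequent.

Any ideas and tips would be much appreciated. I'm having trouble getting started on this particular method.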

    public class Parser implements Comparator<String> {

        public Map<String, Integer> wordCount;

        void parse(String filename) throws IOException {
            File file = new File(filename);
            Scanner scanner = new Scanner(file);

            //mapping of string -> integer (word -> frequency)
            Map<String, Integer> wordCount = new HashMap<String, Integer>();

            //iterates through each word in the text file
            while(scanner.hasNext()) {
                String word = scanner.next();
                if (scanner.next()==null) {
                    wordCount.put(word, 1);
                }
                else {
                    wordCount.put(word, wordCount.get(word) + 1);;
                }
            }
            scanner.next().replaceAll("[^A-Za-z0-9]"," ");
            scanner.next().toLowerCase();
        }

        public int getCount(String word) {
            return wordCount.get(word);
        }

        public int compare(String w1, String w2) {
            return getCount(w1) - getCount(w2);
        }

        //this method should return a list of words in order of frequency from least to greatest
        public List<String> getWordsInOrderOfFrequency() {
            List<Integer> wordsByCount = new ArrayList<Integer>(wordCount.values());
            //this part is unfinished.. the part i'm having trouble sorting the word frequencies
            List<String> result = new ArrayList<String>();
        }
    }

First, your use of scanner.next() looks incorrect. Each call to next() returns the next word and advances the scanner past it, so the following code:

    if(scanner.next() == null){ ... }

and

    scanner.next().replaceAll("[^A-Za-z0-9]"," ");
    scanner.next().toLowerCase();

will consume words and then simply throw them away. What you probably want is:

    String word = scanner.next().replaceAll("[^A-Za-z0-9]"," ").toLowerCase();

at the beginning of your while loop, so that the changes to the word are stored in the word variable instead of being discarded.
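As a rough illustration of that point, here is a minimal sketch of the reading loop on its own. The class name NormalizeSketch and the method readNormalizedWords are made up for this example, and the scanner reads from a String rather than a file so that the snippet is self-contained. Note that because the replacement string is a single space, a token such as "Hello," ends up as "hello " with a trailing space; passing "" as the replacement would strip the punctuation instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper for illustration only; not part of the original Parser.
final class NormalizeSketch {

    // Reads whitespace-separated tokens from text and normalizes each one.
    static List<String> readNormalizedWords(String text) {
        List<String> words = new ArrayList<>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            // next() is called exactly once per iteration. Strings are
            // immutable, so the result of replaceAll/toLowerCase must be kept.
            String word = scanner.next().replaceAll("[^A-Za-z0-9]", " ").toLowerCase();
            words.add(word);
        }
        scanner.close();
        return words;
    }

    public static void main(String[] args) {
        System.out.println(readNormalizedWords("Hello, World"));
    }
}
```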

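Second, the way the wordCount map is used is slightly broken. What you want is to check whether the word is already in the map in order to decide which count to store. To do that, don't test scanner.next() == null; look in the map instead, for example: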

    if(!wordCount.containsKey(word)){
        //no count registered for the word yet
        wordCount.put(word, 1);
    }else{
        wordCount.put(word, wordCount.get(word) + 1);
    }

Or you could do this:

    Integer count = wordCount.get(word);
    if(count == null){
        //no count registered for the word yet
        wordCount.put(word, 1);
    }else{
        wordCount.put(word, count+1);
    }

I prefer this approach because it is a bit cleaner and performs only one map lookup per word, whereas the first approach sometimes performs two.
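On Java 8 or later, the same "insert 1 or increment" step can also be written with Map.merge. This is only a sketch of an alternative, not what the answer above uses; the class name MergeCountSketch and the method countWords are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
final class MergeCountSketch {

    // Counts how often each word occurs in the given array.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            // Stores 1 if the word is absent, otherwise the old count plus 1.
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new String[] {"a", "b", "a"}));
    }
}
```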

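Now, to get a list of words in descending order of frequency, you can first convert the map to a list and then apply Collections.sort(), as suggested in this post. Here is a simplified version that fits your needs: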

    static List<String> getWordInDescendingFreqOrder(Map<String, Integer> wordCount) {

        // Convert map to list of <String,Integer> entries
        List<Map.Entry<String, Integer>> list =
            new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

        // Sort list by integer values
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                // compare o2 to o1, instead of o1 to o2, to get descending freq. order
                return (o2.getValue()).compareTo(o1.getValue());
            }
        });

        // Populate the result into a list
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : list) {
            result.add(entry.getKey());
        }
        return result;
    }
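The question asks for least to greatest, while the method above sorts in descending order. A minimal sketch of the ascending variant, assuming Java 8 or later, could use Map.Entry.comparingByValue(). The class name AscendingFreqSketch and the method wordsByAscendingFrequency are made up for this example. Words with equal counts come out in whatever order the map's entry set yields them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class AscendingFreqSketch {

    // Returns the words ordered from least frequent to most frequent.
    static List<String> wordsByAscendingFrequency(Map<String, Integer> wordCount) {
        // Copy the entries into a list, because a Set cannot be sorted in place.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCount.entrySet());

        // Natural Integer order on the values gives ascending counts.
        entries.sort(Map.Entry.comparingByValue());

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 3);
        counts.put("cat", 1);
        counts.put("sat", 2);
        System.out.println(wordsByAscendingFrequency(counts));
    }
}
```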

Hope this helps.

Edit: changed the comparison function as suggested by @dragon66. Thanks.

You can compare with the following and pull ideas from it:

    public class FrequencyCount {

        public static void main(String[] args) {

            // read in the words as an array
            String s = StdIn.readAll();
            // s = s.toLowerCase();
            // s = s.replaceAll("[\",!.:;?()']", "");
            String[] words = s.split("\\s+");

            // sort the words
            Merge.sort(words);

            // tabulate frequencies of each word
            Counter[] zipf = new Counter[words.length];
            int M = 0;                                        // number of distinct words
            for (int i = 0; i < words.length; i++) {
                if (i == 0 || !words[i].equals(words[i-1]))   // short-circuiting OR
                    zipf[M++] = new Counter(words[i], words.length);
                zipf[M-1].increment();
            }

            // sort by frequency and print
            Merge.sort(zipf, 0, M);                           // sorting a subarray
            for (int j = M-1; j >= 0; j--) {
                StdOut.println(zipf[j]);
            }
        }
    }
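The listing above relies on StdIn, StdOut, Merge and Counter, none of which ship with the JDK. As a rough sketch of the same sort-then-count idea using only java.util, something like the following might do. The class name SortedRunCountSketch and the method tabulate are invented for this example, and unlike the listing above it returns the pairs from least to most frequent rather than printing them in descending order.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class SortedRunCountSketch {

    // Returns (word, count) pairs ordered from least to most frequent.
    static List<Map.Entry<String, Integer>> tabulate(String text) {
        List<Map.Entry<String, Integer>> runs = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return runs; // no words at all
        }

        String[] words = trimmed.split("\\s+");
        Arrays.sort(words); // after sorting, equal words sit next to each other

        // Walk over each run of identical words and record its length.
        int i = 0;
        while (i < words.length) {
            int j = i;
            while (j < words.length && words[j].equals(words[i])) {
                j++;
            }
            runs.add(new AbstractMap.SimpleEntry<String, Integer>(words[i], j - i));
            i = j;
        }

        // Sort the runs by their counts, smallest first.
        runs.sort(Map.Entry.comparingByValue());
        return runs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> run : tabulate("b a b c b a")) {
            System.out.println(run.getValue() + ":\t" + run.getKey());
        }
    }
}
```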

A solution close to your original post, with the corrections Torious suggested in the comments, plus the sorting:

    import java.util.*;

    public class Parser implements Comparator <String> {

        public Map <String, Integer> wordCount;

        void parse ()
        {
            Scanner scanner = new Scanner (System.in);

            // don't redeclare it here - your attribute wordCount will else be shadowed
            wordCount = new HashMap <String, Integer> ();

            //iterates through each word in the text file
            while (scanner.hasNext ())
            {
                String word = scanner.next ();
                // operate on the word, not on next and next of next word from Scanner
                word = word.replaceAll ("[^A-Za-z0-9]", " ");
                word = word.toLowerCase ();
                // look into your map:
                if (! wordCount.containsKey (word))
                    wordCount.put (word, 1);
                else
                    wordCount.put (word, wordCount.get (word) + 1);
            }
        }

        public int getCount (String word) {
            return wordCount.get (word);
        }

        public int compare (String w1, String w2) {
            return getCount (w1) - getCount (w2);
        }

        public List <String> getWordsInOrderOfFrequency () {
            List <String> justWords = new ArrayList <String> (wordCount.keySet());
            Collections.sort (justWords, this);
            return justWords;
        }

        public static void main (String args []) {
            Parser p = new Parser ();
            p.parse ();
            List <String> ls = p.getWordsInOrderOfFrequency ();
            for (String s: ls)
                System.out.println (s);
        }
    }

rodion's solution is a bit of a generics hell, but I don't have a simpler one, just a different one.

In the end, his solution is shorter and better.

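At first glance a TreeMap might seem appropriate, but it sorts by key, which doesn't help with sorting by value, and we can't swap keys and values because we look the entries up by key.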
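A small sketch makes the TreeMap point concrete: copying the counts into a TreeMap orders the words alphabetically, regardless of their counts. The class name TreeMapOrderSketch and the method keysInTreeMapOrder are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class TreeMapOrderSketch {

    // Returns the words in the order a TreeMap iterates over them.
    static List<String> keysInTreeMapOrder(Map<String, Integer> wordCount) {
        // A TreeMap keeps its entries sorted by key (here: alphabetically).
        TreeMap<String, Integer> sortedByKey = new TreeMap<String, Integer>(wordCount);
        return new ArrayList<String>(sortedByKey.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("zebra", 1);
        counts.put("apple", 5);
        counts.put("mango", 3);
        // The least frequent word, "zebra", comes last, not first.
        System.out.println(keysInTreeMapOrder(counts));
    }
}
```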

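So the next idea is to build a HashMap and use Collections.sort, but that doesn't accept a Map, only a List to sort. A Map does have an entrySet, which yields another Collection, but that is a Set, not a List. That was the point where I took a different direction: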

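I implemented an Iterator: I iterate over the entrySet and return only the entries whose value is 1. If the value is higher (2, say), I buffer the entry for later use. When the Iterator is exhausted, I look at the buffer, and if it isn't empty, I use the buffer's iterator from then on, increment the minimum value I'm looking for, and create a new buffer.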

The advantage of the Iterator/Iterable pair is that the values can be fetched with the simplified for loop.

    import java.util.*;

    // a short little declaration :)
    public class WordFreq implements Iterator <Map.Entry <String, Integer>>, Iterable <Map.Entry <String, Integer>>
    {
        private Map <String, Integer> counter;
        private Iterator <Map.Entry <String, Integer>> it;
        private Set <Map.Entry <String, Integer>> buf;
        private int maxCount = 1;

        public Iterator <Map.Entry <String, Integer>> iterator () {
            return this;
        }

        // The iterator interface expects a "remove ()" - nobody knows why
        public void remove ()
        {
            if (hasNext ())
                next ();
        }

        public boolean hasNext ()
        {
            return it.hasNext () || ! buf.isEmpty ();
        }

        public Map.Entry <String, Integer> next ()
        {
            while (it.hasNext ())
            {
                Map.Entry <String, Integer> mesi = it.next ();
                if (mesi.getValue () == maxCount)
                    return mesi;
                else
                    buf.add (mesi);
            }
            if (buf.isEmpty ())
                return null;
            ++maxCount;
            it = buf.iterator ();
            buf = new HashSet <Map.Entry <String, Integer>> ();
            return next ();
        }

        public WordFreq ()
        {
            it = fill ();
            buf = new HashSet <Map.Entry <String, Integer>> ();
            // The "this" here has to be an Iterable to make the foreach work
            for (Map.Entry <String, Integer> mesi : this)
            {
                System.out.println (mesi.getValue () + ":\t" + mesi.getKey ());
            }
        }

        public Iterator <Map.Entry <String, Integer>> fill ()
        {
            counter = new HashMap <String, Integer> ();
            Scanner sc = new Scanner (System.in);
            while (sc.hasNext ())
            {
                push (sc.next ());
            }
            Set <Map.Entry <String, Integer>> set = counter.entrySet ();
            return set.iterator ();
        }

        public void push (String word)
        {
            Integer i = counter.get (word);
            int n = 1 + ((i != null) ? i : 0);
            counter.put (word, n);
        }

        public static void main (String args[])
        {
            new WordFreq ();
        }
    }

Since my solution reads from stdin, you invoke it like this:

    cat WordFreq.java | java WordFreq