Java: most frequent n words in an ArrayList

I need to find the most frequent words in an ArrayList (n words, so if n = 5, the 5 most frequent words).

```java
private ArrayList<String> wordList = new ArrayList<>();

public ArrayList<String> mostOften(int k) {
    ArrayList<String> lista = new ArrayList<>();
    Set<String> unique = new HashSet<>(wordList);
    for (String key : unique)
        System.out.println(key + ": " + Collections.frequency(wordList, key));
    return lista;
}
```

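The method needs to return a list of the most frequent words, sorted by frequency. If two words have the same frequency, they must be sorted alphabetically. I've posted what I tried, but it only finds the frequencies and I don't know how to do the rest. Any help?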
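To make the required ordering concrete, here is a minimal sketch that expresses it as a single comparator over word/count entries (highest count first, ties alphabetical), assuming the counts have already been collected into a `Map<String, Integer>`. The class name `OrderingRule` and the helper `topK` are hypothetical, not part of the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderingRule {
    // Highest count first; equal counts fall back to alphabetical order of the word.
    static final Comparator<Map.Entry<String, Integer>> BY_FREQ_THEN_ALPHA =
            Map.Entry.<String, Integer>comparingByValue().reversed()
                    .thenComparing(Map.Entry.comparingByKey());

    // Hypothetical helper: returns at most k words from a word -> count map, in the order above.
    static List<String> topK(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(BY_FREQ_THEN_ALPHA);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(k, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("pear", 2);
        counts.put("apple", 2);
        counts.put("fig", 3);
        counts.put("kiwi", 1);
        System.out.println(topK(counts, 3)); // [fig, apple, pear]
    }
}
```

`Math.min(k, entries.size())` keeps the sketch safe when there are fewer distinct words than k.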

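You can write a Comparator class that is initialized with the list, then call Collections.sort() with the list of unique words and that comparator. The code could look like this: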

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FrequencyComparator implements Comparator<String> {
    private final List<String> list;

    public FrequencyComparator(List<String> list) {
        this.list = list;
    }

    @Override
    public int compare(String o1, String o2) {
        int f1 = Collections.frequency(list, o1);
        int f2 = Collections.frequency(list, o2);
        if (f1 > f2) {
            return -1; // higher frequency sorts first
        } else if (f1 < f2) {
            return 1;
        } else {
            return o1.compareTo(o2); // same frequency: alphabetical order
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("Hello"); list.add("You"); list.add("Hello"); list.add("You");
        list.add("Apple"); list.add("Apple"); list.add("Hello");
        Set<String> unique = new HashSet<>(list);
        List<String> uniqueList = new ArrayList<>(unique);
        Collections.sort(uniqueList, new FrequencyComparator(list));
        System.out.println(uniqueList);
        // The most frequent words are at the front, so take the first 2
        System.out.println(uniqueList.subList(0, 2));
    }
}
```
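`Collections.frequency` scans the whole list, so the comparator above does a full scan twice per comparison. A possible refinement, sketched here under that assumption, counts every word once in the constructor and looks the counts up during sorting. The class name `CachedFrequencyComparator` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical variant of FrequencyComparator: counts each word once up front
// instead of calling Collections.frequency on every comparison.
public class CachedFrequencyComparator implements Comparator<String> {
    private final Map<String, Integer> counts = new HashMap<>();

    public CachedFrequencyComparator(List<String> list) {
        for (String word : list) {
            counts.merge(word, 1, Integer::sum); // increment the count for this word
        }
    }

    @Override
    public int compare(String o1, String o2) {
        // Arguments swapped so that the higher count sorts first
        int byFrequency = Integer.compare(counts.getOrDefault(o2, 0), counts.getOrDefault(o1, 0));
        return byFrequency != 0 ? byFrequency : o1.compareTo(o2); // ties: alphabetical
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("Hello", "You", "Hello", "You", "Apple", "Apple", "Hello");
        List<String> uniqueList = new ArrayList<>(new HashSet<>(list));
        uniqueList.sort(new CachedFrequencyComparator(list));
        System.out.println(uniqueList);               // [Hello, Apple, You]
        System.out.println(uniqueList.subList(0, 2)); // [Hello, Apple]
    }
}
```

The ordering is the same as `FrequencyComparator`; only the cost of each comparison changes.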
Another option is to count each word once in a HashMap of Word objects and then sort those objects:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordFrequency {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("Hello"); list.add("Hello"); list.add("aaaa"); list.add("aaaa");
        list.add("World"); list.add("abc"); list.add("abc"); list.add("cba");
        list.add("abc"); list.add("World"); list.add("abc");
        System.out.println(mostOften(list));
    }

    public static List<Word> mostOften(List<String> words) {
        // Count each word in a single pass
        Map<String, Word> wordMap = new HashMap<>();
        for (String word : words) {
            Word currentWord = wordMap.get(word);
            if (currentWord == null)
                wordMap.put(word, new Word(word, 1));
            else
                currentWord.frequency++;
        }
        List<Word> wordList = new ArrayList<>(wordMap.values());
        wordList.sort(new Comparator<Word>() {
            @Override
            public int compare(Word o1, Word o2) {
                if (o1.frequency == o2.frequency)
                    return o1.word.compareToIgnoreCase(o2.word);
                /* sort words with high frequency first */
                return Integer.compare(o2.frequency, o1.frequency);
            }
        });
        return wordList;
    }
}

// Not public: a source file can contain only one public top-level class
class Word {
    String word;
    int frequency;

    public Word(String word, int total) {
        this.word = word;
        this.frequency = total;
    }

    public String toString() {
        return "[" + word + ", " + frequency + "]";
    }
}
```
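The `mostOften` above returns every word, while the question asks for only the first k. A minimal sketch of a k-limited variant, assuming Java 8 comparator combinators in place of the anonymous class, could look like this; the class `TopWords` and the two-argument `mostOften(words, k)` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopWords {
    // Same shape as the Word class in the answer above.
    static class Word {
        final String word;
        int frequency;

        Word(String word, int frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return "[" + word + ", " + frequency + "]";
        }
    }

    // Hypothetical k-limited variant: count, sort, then keep at most k entries.
    static List<Word> mostOften(List<String> words, int k) {
        Map<String, Word> wordMap = new HashMap<>();
        for (String w : words) {
            wordMap.computeIfAbsent(w, key -> new Word(key, 0)).frequency++;
        }
        List<Word> wordList = new ArrayList<>(wordMap.values());
        // Frequency descending, then case-insensitive alphabetical order as in the original
        wordList.sort(Comparator.comparingInt((Word w) -> w.frequency).reversed()
                .thenComparing(w -> w.word, String.CASE_INSENSITIVE_ORDER));
        return new ArrayList<>(wordList.subList(0, Math.min(k, wordList.size())));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("Hello", "Hello", "aaaa", "aaaa", "World",
                "abc", "abc", "cba", "abc", "World", "abc");
        System.out.println(mostOften(list, 3)); // [[abc, 4], [aaaa, 2], [Hello, 2]]
    }
}
```

Copying the `subList` into a new `ArrayList` detaches the result from the full sorted list, so callers can modify it freely.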
A similar approach pairs each unique word with its count from Collections.frequency and sorts the pairs:

```java
// Assumes java.util.* is imported and sortFreq is declared inside a class.
class Pair {
    String text;
    int freq;

    public Pair(String text, int freq) {
        this.text = text;
        this.freq = freq;
    }
}

public List<Pair> sortFreq(List<String> wordList) {
    Set<String> unique = new HashSet<>(wordList);
    List<Pair> list = new ArrayList<>(unique.size());
    for (String key : unique) {
        int freq = Collections.frequency(wordList, key);
        list.add(new Pair(key, freq));
    }
    Collections.sort(list, new Comparator<Pair>() {
        @Override
        public int compare(Pair o1, Pair o2) {
            if (o1.freq == o2.freq) {
                return o1.text.compareTo(o2.text); // same frequency: alphabetical
            }
            return Integer.compare(o2.freq, o1.freq); // higher frequency first
        }
    });
    return list;
}
```
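All of the answers so far sort every distinct word even though only k are needed. A different technique, swapped in here and not used by any answer above, is a bounded min-heap: keep at most k entries in a `java.util.PriorityQueue` whose head is the worst one kept, which costs O(d log k) for d distinct words instead of O(d log d). This is a sketch; the class `HeapTopK` and method `topK` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class HeapTopK {
    // "Better first" order: higher count first, ties alphabetical (same rule as the Pair comparator).
    static final Comparator<Map.Entry<String, Integer>> BETTER_FIRST =
            Map.Entry.<String, Integer>comparingByValue().reversed()
                    .thenComparing(Map.Entry.comparingByKey());

    static List<String> topK(List<String> words, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        // Reversed order puts the worst entry currently kept at the head of the queue.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(BETTER_FIRST.reversed());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the worst so at most k entries remain
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // polled from worst to best
        }
        Collections.reverse(result); // best first
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "a", "c", "a", "b", "d");
        System.out.println(topK(words, 3)); // [a, b, c]
    }
}
```

For small inputs the plain sort is simpler and just as fast; the heap only pays off when k is much smaller than the number of distinct words.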

Here is a Java 8 solution using streams that counts the word frequencies, sorts them, and limits the result to k words:

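```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map.Entry;
import java.util.function.Function;
import java.util.stream.Collectors;

// Inside a method:
List<String> wordList = new ArrayList<>();
int k = 5;
List<String> mostFrequentWords = wordList.stream().collect(Collectors.collectingAndThen(
        // word -> number of occurrences
        Collectors.groupingBy(Function.identity(), Collectors.counting()),
        map -> map.entrySet().stream()
                // highest count first, ties in alphabetical order
                .sorted(Comparator.<Entry<String, Long>>comparingLong(Entry::getValue).reversed()
                        .thenComparing(Entry::getKey))
                .map(Entry::getKey)
                .limit(k)
                .collect(Collectors.toList())));
```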
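As a usage example, the same pipeline can be wrapped in a static method and run on sample data. The class `StreamTopK` and method name `mostFrequent` are hypothetical; the stream body is unchanged from the answer above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map.Entry;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamTopK {
    // Hypothetical wrapper around the stream pipeline from the answer above.
    static List<String> mostFrequent(List<String> wordList, int k) {
        return wordList.stream().collect(Collectors.collectingAndThen(
                Collectors.groupingBy(Function.identity(), Collectors.counting()),
                map -> map.entrySet().stream()
                        .sorted(Comparator.<Entry<String, Long>>comparingLong(Entry::getValue).reversed()
                                .thenComparing(Entry::getKey))
                        .map(Entry::getKey)
                        .limit(k)
                        .collect(Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be", "that", "is");
        // "be" and "to" both occur twice (alphabetical tie-break), then "is" leads the singletons
        System.out.println(mostFrequent(words, 3)); // [be, to, is]
    }
}
```

The explicit type witness `Comparator.<Entry<String, Long>>comparingLong` is needed because `.reversed()` is chained directly onto it, which leaves the compiler no target type from which to infer the entry type.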