Java dictionary searcher

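I'm trying to implement a program that takes user input, splits that string into tokens, and then searches a dictionary for the words in that string. My goal for the parsed string is to have every token be an English word.

For example: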

    Input:
        aman

    Split Method:
        a man
        a m an
        a m a n
        am an
        am a n
        ama n

    Desired Output:
        a man

I currently have this code, which does everything up until the desired output part:

    import java.util.Scanner;
    import java.io.*;

    public class Words {

        public static String[] dic = new String[80368];

        public static void split(String head, String in) {

            // head + " " + in is a segmentation
            String segment = head + " " + in;

            // count number of dictionary words
            int count = 0;
            Scanner phraseScan = new Scanner(segment);
            while (phraseScan.hasNext()) {
                String word = phraseScan.next();
                for (int i=0; i<dic.length; i++) {
                    if (word.equalsIgnoreCase(dic[i])) count++;
                }
            }

            System.out.println(segment + "\t" + count + " English words");

            // recursive calls
            for (int i=1; i<in.length(); i++) {
                split(head+" "+in.substring(0,i), in.substring(i,in.length()));
            }
        }

        public static void main (String[] args) throws IOException {
            Scanner scan = new Scanner(System.in);
            System.out.print("Enter a string: ");
            String input = scan.next();
            System.out.println();

            Scanner filescan = new Scanner(new File("src:\\dictionary.txt"));
            int wc = 0;
            while (filescan.hasNext()) {
                dic[wc] = filescan.nextLine();
                wc++;
            }

            System.out.println(wc + " words stored");

            split("", input);
        }
    }

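I know there are better ways to store the dictionary (such as a binary search tree or a hash table), but I don't know how to implement those.

I am stuck on how to implement a method that would check the split string to see whether every segment is a word in the dictionary.

Any help would be great, thank you.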

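Splitting the input string in every possible way will not finish in a reasonable amount of time if you want to support 20 or more characters: a string of n characters has 2^(n-1) segmentations. Here's a more efficient approach, with comments inline: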

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Scanner;
    import java.util.Set;
    import java.util.Stack;

    // wrapper class added so the two methods compile; the name is reused from the question
    public class Words {

        public static void main(String[] args) throws IOException {
            // load the dictionary into a set for fast lookups
            Set<String> dictionary = new HashSet<String>();
            Scanner filescan = new Scanner(new File("dictionary.txt"));
            while (filescan.hasNext()) {
                dictionary.add(filescan.nextLine().toLowerCase());
            }

            // scan for input
            Scanner scan = new Scanner(System.in);
            System.out.print("Enter a string: ");
            String input = scan.next().toLowerCase();
            System.out.println();

            // place to store list of results, each result is a list of strings
            List<List<String>> results = new ArrayList<List<String>>();

            long time = System.currentTimeMillis();

            // start the search, pass empty stack to represent words found so far
            search(input, dictionary, new Stack<String>(), results);

            time = System.currentTimeMillis() - time;

            // list the results found
            for (List<String> result : results) {
                for (String word : result) {
                    System.out.print(word + " ");
                }
                System.out.println("(" + result.size() + " words)");
            }
            System.out.println();
            System.out.println("Took " + time + "ms");
        }

        public static void search(String input, Set<String> dictionary,
                Stack<String> words, List<List<String>> results) {

            for (int i = 0; i < input.length(); i++) {
                // take the first i + 1 characters of the input and see if it is a word
                String substring = input.substring(0, i + 1);

                if (dictionary.contains(substring)) {
                    // the beginning of the input matches a word, store on stack
                    words.push(substring);

                    if (i == input.length() - 1) {
                        // there's no input left, copy the words stack to results
                        results.add(new ArrayList<String>(words));
                    } else {
                        // there's more input left, search the remaining part
                        search(input.substring(i + 1), dictionary, words, results);
                    }

                    // pop the matched word back off so we can move onto the next i
                    words.pop();
                }
            }
        }
    }

Example output:

    Enter a string: aman

    a man (2 words)
    am an (2 words)

    Took 0ms

Here's a much longer input:

    Enter a string: thequickbrownfoxjumpedoverthelazydog

    the quick brown fox jump ed over the lazy dog (10 words)
    the quick brown fox jump ed overt he lazy dog (10 words)
    the quick brown fox jumped over the lazy dog (9 words)
    the quick brown fox jumped overt he lazy dog (9 words)

    Took 1ms

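If my answer seems silly, it's because you're really close and I'm not sure where you're stuck.

The simplest way, given your code above, would be to add a counter for the number of words in the segment and compare it to the number of matched words: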

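    int count = 0;
    int total = 0;
    Scanner phraseScan = new Scanner(segment);
    while (phraseScan.hasNext()) {
        total++;
        String word = phraseScan.next();
        for (int i=0; i<dic.length; i++) {
            if (word.equalsIgnoreCase(dic[i])) {
                count++;
                break; // count each token at most once
            }
        }
    }
    // every token matched a dictionary entry
    if (total == count) System.out.println(segment);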

Implementing the dictionary as a hash table might be better (it is certainly faster), and it's really easy:

    HashSet<String> dict = new HashSet<String>();
    dict.add("foo"); // add your data

    int count = 0;
    int total = 0;
    Scanner phraseScan = new Scanner(segment);
    while (phraseScan.hasNext()) {
        total++;
        String word = phraseScan.next();
        // lookup is case-sensitive, so store and query words in the same case
        if (dict.contains(word)) count++;
    }
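Putting the counter and the hash set together might look roughly like the sketch below. The class name `SegmentCheck` and the method name `allWords` are made up for illustration, and it assumes the dictionary is stored in lower case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical helper, not part of the original code.
class SegmentCheck {

    // Returns true if every whitespace-separated token of segment is in dict.
    // Assumes dict holds lower-case words.
    static boolean allWords(String segment, Set<String> dict) {
        int count = 0;
        int total = 0;
        Scanner phraseScan = new Scanner(segment);
        while (phraseScan.hasNext()) {
            total++;
            if (dict.contains(phraseScan.next().toLowerCase())) count++;
        }
        // an empty segment is not treated as a valid split
        return total > 0 && total == count;
    }

    public static void main(String[] args) {
        // tiny in-memory dictionary standing in for dictionary.txt
        Set<String> dict = new HashSet<String>(Arrays.asList("a", "am", "an", "man"));
        for (String segment : new String[] {"a man", "a m an", "am an", "ama n"}) {
            if (allWords(segment, dict)) System.out.println(segment);
        }
    }
}
```

In `split`, the existing `println` could then be guarded with a call to this check so that only segmentations made up entirely of dictionary words are printed.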

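There are other, better ways to do this too. One is a trie (http://en.wikipedia.org/wiki/Trie), which is a bit slower for lookups but stores the data more efficiently. If you have a large dictionary, you might not be able to fit it in memory, so you could use a database or key-value store such as BDB (http://en.wikipedia.org/wiki/Berkeley_DB).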
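For reference, a minimal trie could be sketched as follows. This is only an illustration: the class `Trie` and its methods `insert`, `contains` and `hasPrefix` are names chosen here, and each node simply keeps its children in a `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal trie sketch; all names are illustrative.
class Trie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        boolean isWord; // true if the path from the root to this node spells a stored word
    }

    private final Node root = new Node();

    // Adds a word, creating one node per character as needed.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns true only if exactly this word was inserted.
    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // Returns true if at least one stored word starts with prefix.
    boolean hasPrefix(String prefix) {
        return find(prefix) != null;
    }

    // Walks the trie along s; returns null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null;
        }
        return node;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        for (String w : new String[] {"a", "am", "an", "man"}) trie.insert(w);
        System.out.println(trie.contains("man"));  // true
        System.out.println(trie.contains("ma"));   // false: only a prefix of "man"
        System.out.println(trie.hasPrefix("ma"));  // true
    }
}
```

Words that share a prefix share nodes, which is where the storage saving comes from. A prefix test like `hasPrefix` would also let a segmentation search stop extending a candidate as soon as no dictionary word starts with it.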

    package LinkedList;

    import java.util.LinkedHashSet;

    public class dictionaryCheck {

        private static LinkedHashSet<String> set;

        // Returns true if str, from index start onward, can be split
        // entirely into words from the set. The position is passed as a
        // parameter so that a failed branch backtracks cleanly.
        public boolean checkDictionary(String str, int start) {
            if (start >= str.length()) {
                return true;
            }
            for (String word : set) {
                if (str.startsWith(word, start)
                        && checkDictionary(str, start + word.length())) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            set = new LinkedHashSet<String>();
            set.add("Jose");
            set.add("Nithin");
            set.add("Joy");
            set.add("Justine");
            set.add("Jomin");
            set.add("Thomas");

            String str = "JoyJustine";

            dictionaryCheck obj = new dictionaryCheck();
            boolean c = obj.checkDictionary(str, 0);
            if (c) {
                System.out.println("String can be found out from those words in the Dictionary");
            } else {
                System.out.println("Not Possible");
            }
        }
    }