Find substrings of a string that contain all the words in an array

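I have a string and an array of words, and I have to write code that finds all substrings of the string containing every word in the array, in any order. The string contains no special characters or digits, and the words are separated by single spaces.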

For example:

Given the string:

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc 

Words in the array:

 aaaa bbbb cccc 

Sample output:

 aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa cccc bbbb aaaa cccc bbbb bbbb bbbb bbbb cccc bbbb bbbb bbbb bbbb aaaa aaaa cccc bbbb 

I have implemented this with for loops, but it is very inefficient.

How can I do this more efficiently?

My code:

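    // NOTE: part of the loop header was lost in the original post; the header and
    // the condition below are a reconstruction. reset_all(), has_word(),
    // all_complete(), num_words and the maps t and h are defined elsewhere
    // and are not shown here.
    for (int i = 0; i < str_arr.length; i++) {
        if ((str_arr.length - i) >= words.length) {
            String res = check(i);
            if (!res.equals("")) {
                System.out.println(res);
                System.out.println("");
            }
            reset_all();
        } else {
            break;
        }
    }

    // Builds the shortest substring starting at word i that contains all words,
    // or returns "" if there is none.
    public static String check(int i) {
        String res = "";
        num_words = 0;
        for (int j = i; j < str_arr.length; j++) {
            if (has_word(str_arr[j])) {
                t.put(str_arr[j].toLowerCase(), 1);
                h.put(str_arr[j].toLowerCase(), 1);
                res = res + str_arr[j]; //+ " ";
                if (all_complete()) {
                    return res;
                }
                res = res + " ";
            } else {
                res = res + str_arr[j] + " ";
            }
        }
        res = "";
        return res;
    }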

My first approach would be something like the following pseudocode:

    for word : string {
        if word in array {
            for each stored potential substring {
                if word wasn't already found {
                    remove word from notAlreadyFoundList
                    if notAlreadyFoundList is empty {
                        use starting pos and ending pos to save our substring
                    }
                }
            }
            store position and array-word as potential substring
        }
    }

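This should perform reasonably well, because you traverse the string only once (although each matching word is still checked against every stored candidate).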

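[Edit]

Here is an implementation of my pseudocode; try it out and see whether it performs better or worse. It works on the assumption that a matching substring is complete as soon as its last missing word is found. If you really want all matches, change the lines marked //ALLMATCHES: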

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Set;

    class SubStringFinder {

        String textString = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        Set<String> words = new HashSet<>(Arrays.asList("aaaa", "bbbb", "cccc"));

        public static void main(String[] args) {
            new SubStringFinder();
        }

        public SubStringFinder() {
            List<PotentialMatch> matches = new ArrayList<>();
            // Note: words that are not in the set are skipped and are not appended
            // to open candidates; the sample text contains no such words.
            for (String textPart : textString.split(" ")) {
                if (words.contains(textPart)) {
                    // Feed the word to every candidate that is still open.
                    for (Iterator<PotentialMatch> matchIterator = matches.iterator(); matchIterator.hasNext();) {
                        PotentialMatch match = matchIterator.next();
                        String result = match.tryMatch(textPart);
                        if (result != null) {
                            System.out.println("Match found: \"" + result + "\"");
                            matchIterator.remove(); // ALLMATCHES - remove this line
                        }
                    }
                    // Start a new candidate at this word.
                    Set<String> unfound = new HashSet<>(words);
                    unfound.remove(textPart);
                    matches.add(new PotentialMatch(unfound, textPart));
                } // ALLMATCHES add these lines
                // else {
                //     matches.add(new PotentialMatch(new HashSet<>(words), textPart));
                // }
            }
        }

        // A candidate substring plus the set of words it still needs.
        class PotentialMatch {
            Set<String> unfoundWords;
            StringBuilder stringPart;

            public PotentialMatch(Set<String> unfoundWords, String part) {
                this.unfoundWords = unfoundWords;
                this.stringPart = new StringBuilder(part);
            }

            // Appends the word; returns the substring once nothing is missing, else null.
            public String tryMatch(String part) {
                this.stringPart.append(' ').append(part);
                unfoundWords.remove(part);
                if (unfoundWords.isEmpty()) {
                    return this.stringPart.toString();
                }
                return null;
            }
        }
    }

Here is another approach:

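    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The class name is added here only so that the snippet compiles.
    class RegexSubstringFinder {

        public static void main(String[] args) {
            // init
            List<String> result = new ArrayList<>();
            String string = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
            String[] words = { "aaaa", "bbbb", "cccc" };

            // build one regexp per ordering of the words, ignoring the quoting:
            // eg "(aaaa ?)+(bbbb ?)+(cccc )*cccc", "(aaaa ?)+(cccc ?)+(bbbb )*bbbb"
            List<String> regexps = findCombs(Arrays.asList(words));

            // compile each regexp and collect its matches
            for (String regexp : regexps) {
                Pattern p = Pattern.compile(regexp);
                Matcher m = p.matcher(string);
                while (m.find()) {
                    result.add(m.group());
                }
            }
            System.out.println(result);
        }

        // Recursively builds a regexp for every permutation of the words.
        private static List<String> findCombs(List<String> words) {
            if (words.size() == 1) {
                // last word of the ordering: any number of repeats, no trailing space
                words.set(0, "(" + Pattern.quote(words.get(0)) + " )*" + Pattern.quote(words.get(0)));
                return words;
            }
            List<String> list = new ArrayList<>();
            for (String word : words) {
                List<String> tail = new LinkedList<>(words);
                tail.remove(word);
                for (String s : findCombs(tail)) {
                    list.add("(" + Pattern.quote(word) + " ?)+" + s);
                }
            }
            return list;
        }
    }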

This outputs:

 [aaaa bbbb cccc, aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb, cccc bbbb bbbb bbbb bbbb aaaa] 

I know the result is not complete: you only get each available combination in its fully expanded form, but you do get all of those.
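If every matching substring is needed rather than one per ordering, a sliding window over the words is one way to get them. The sketch below is not taken from either answer above; the class name `AllWindows` and the method `findAll` are made up for illustration, and it assumes the text is split on whitespace and that matches must start and end on word boundaries. For each start position it finds the shortest window that covers all required words, then emits that window and every longer one from the same start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the answers above.
class AllWindows {

    /** Returns every word-aligned substring of text that contains all required words. */
    static List<String> findAll(String text, Set<String> required) {
        String[] tokens = text.trim().split("\\s+");
        int n = tokens.length;
        List<String> out = new ArrayList<>();
        if (required.isEmpty()) {
            return out;
        }
        // Occurrence counts of the required words inside tokens[start, end).
        Map<String, Integer> counts = new HashMap<>();
        int end = 0;
        for (int start = 0; start < n; start++) {
            // Grow the window until it covers every required word.
            while (end < n && counts.size() < required.size()) {
                if (required.contains(tokens[end])) {
                    counts.merge(tokens[end], 1, Integer::sum);
                }
                end++;
            }
            if (counts.size() < required.size()) {
                break; // the rest of the text cannot cover all words
            }
            // tokens[start, end) is the shortest match from this start;
            // extending it to the right keeps it a match.
            for (int stop = end; stop <= n; stop++) {
                out.add(String.join(" ", Arrays.copyOfRange(tokens, start, stop)));
            }
            // Slide the window: drop tokens[start] before moving on.
            if (required.contains(tokens[start])) {
                counts.computeIfPresent(tokens[start], (k, v) -> v == 1 ? null : v - 1);
            }
        }
        return out;
    }
}
```

Calling `AllWindows.findAll(text, new HashSet<>(Arrays.asList("aaaa", "bbbb", "cccc")))` on the sample string should return 36 substrings, beginning with `aaaa aaaa aaaa aaaa cccc bbbb`. Locating the shortest window for each start takes a single pass over the words, since `end` never moves backwards; listing all the longer matches is what makes the total work quadratic in the worst case, which is unavoidable when the output itself is that large.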