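Regular expression in Java for finding duplicate consecutive words

I saw this given as an answer for finding duplicate words in a string. But when I use it, it treats This and is as the same word and removes is.

The regex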

 "\\b(\\w+)\\b\\s+\\1"

Any idea why this happens?

Here is the code I am using to remove the duplicates

    public static String RemoveDuplicateWords(String input) {
        String originalText = input;
        String output = "";
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
        //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        if (!m.find())
            output = "No duplicates found, no changes made to data";
        else {
            while (m.find()) {
                if (output == "")
                    output = input.replaceFirst(m.group(), m.group(1));
                else
                    output = output.replaceAll(m.group(), m.group(1));
            }
            input = output;
            m = p.matcher(input);
            while (m.find()) {
                output = "";
                if (output == "")
                    output = input.replaceAll(m.group(), m.group(1));
                else
                    output = output.replaceAll(m.group(), m.group(1));
            }
        }
        return output;
    }

Try this:

    String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
    Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    String input = "your string";
    Matcher m = r.matcher(input);
    while (m.find()) {
        // m.group() is the whole run of repeats, m.group(1) is the single word
        input = input.replaceAll(m.group(), m.group(1));
    }
    System.out.println(input);
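As an alternative to calling String.replaceAll inside the loop, the same pattern can probably be applied in a single call with Matcher.replaceAll and a `$1` reference to the captured word. This is only a sketch: the class name `CollapseRepeats`, the method `collapse` and the sample sentence are made up for illustration.

```java
import java.util.regex.Pattern;

class CollapseRepeats {
    // Same pattern as in the answer above; (?i) also makes the back reference case-insensitive.
    private static final Pattern REPEATED =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Replaces every run of repeated words with its first occurrence (group 1).
    public static String collapse(String input) {
        return REPEATED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Illustrative input, not taken from the question.
        System.out.println(collapse("This this is is a test test test"));
        // prints "This is a test"
    }
}
```

Because the first occurrence is the one kept, the result preserves the capitalisation of the first word in each run.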

Java regular expressions are explained very well in the API documentation of the Pattern class. Here is the regex with some spaces added to mark its different parts:

    "(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"

    (?i)      turn on case-insensitive matching
    \b        match a word boundary
    [a-z]+    match a word with one or more characters;
              the parentheses capture the word as a group
    \b        match a word boundary
    (?:       indicates a non-capturing group (which starts here)
    \s+       match one or more white space characters
    \1        is a back reference to the first (captured) group;
              so the word is repeated here
    \b        match a word boundary
    )+        indicates the end of the non-capturing group and
              allows it to occur one or more times

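You should use \b(\w+)\b\s+\b\1\b instead. The \b on each side of \1 forces the back reference to match a complete word rather than the beginning of a longer one.

Hope this is what you wanted.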
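To illustrate what the trailing \b changes, here is a small sketch that runs both patterns against the same text. The input "This is isolated" is a made-up example and not necessarily the asker's data; the class and method names are also only illustrative.

```java
import java.util.regex.Pattern;

class TrailingBoundaryDemo {
    // Returns true if the regex matches anywhere in the text (case-insensitively).
    public static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "This is isolated"; // hypothetical input with no real duplicate

        // Without the trailing \b, \1 matches the "is" at the start of "isolated".
        System.out.println(containsMatch("\\b(\\w+)\\b\\s+\\1", text));       // true

        // With \b around \1, the back reference has to be a whole word.
        System.out.println(containsMatch("\\b(\\w+)\\b\\s+\\b\\1\\b", text)); // false
    }
}
```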

Update 1

Well, the output you end up with is the final string after the duplicates have been removed:

    import java.util.regex.*;

    public class MyDup {
        public static void main(String args[]) {
            String input = "This This is text text another another";
            String originalText = input;
            String output = "";
            Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(input);
            System.out.println(m);
            if (!m.find())
                output = "No duplicates found, no changes made to data";
            else {
                while (m.find()) {
                    // == compares references; it works here only because output still
                    // holds the "" literal. output.isEmpty() would be the safer test.
                    if (output == "") {
                        output = input.replaceFirst(m.group(), m.group(1));
                    } else {
                        output = output.replaceAll(m.group(), m.group(1));
                    }
                }
                input = output;
                m = p.matcher(input);
                while (m.find()) {
                    output = "";
                    if (output == "") {
                        output = input.replaceAll(m.group(), m.group(1));
                    } else {
                        output = output.replaceAll(m.group(), m.group(1));
                    }
                }
            }
            System.out.println("After removing duplicate the final string is " + output);
        }
    }

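Run this code and look at the output you get. That should answer your question.

Note

In output you are replacing each duplicate with a single word, aren't you?

When I put System.out.println(m.group() + " : " + m.group(1)); inside the first if condition, I get the output text text : text, i.e. the duplicate is being replaced by a single word.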

    else {
        while (m.find()) {
            if (output == "") {
                System.out.println(m.group() + " : " + m.group(1));
                output = input.replaceFirst(m.group(), m.group(1));
            } else {

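Hope you can see what is happening now. 🙂

Good luck! Cheers!

The pattern below matches duplicate words even when a word is repeated any number of times.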

 Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);

For example, "This is my my friend friend friend friend" will output "This is my friend".

Also, with this pattern a single pass of the "while (m.find())" loop is enough.

 \b(\w+)(\b\W+\1\b)*

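Explanation:

    \b     : A word boundary
    (\w+)  : One or more word characters (letter, digit, underscore), captured as group 1

Once a word has been captured, the rest of the pattern picks up its repetitions.

    (      : Grouping starts
    \b     : A word boundary
    \W+    : One or more non-word characters
    \1     : The same word again (back reference to group 1)
    \b     : A word boundary, so there is no match if the repeated word is joined to another word
    )*     : Grouping ends; the group may occur zero or more times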
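The single-pass idea can be sketched roughly as follows, using Matcher.appendReplacement so that each match is replaced where it was found instead of calling String.replaceAll with the matched text as a regex. The class name `SinglePassDedup` and the method `dedup` are hypothetical, and the pattern is the Pattern.compile variant shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassDedup {
    // A word followed by zero or more repetitions of itself, separated by non-word characters.
    private static final Pattern WORD_RUN = Pattern.compile(
            "\\b(\\w+)(\\b\\W+\\b\\1\\b)*",
            Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);

    // One loop over the matches: every run is replaced by its first word (group 1).
    public static String dedup(String input) {
        Matcher m = WORD_RUN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' or '\' in the captured word.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1)));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedup("This is my my friend friend friend friend"));
        // prints "This is my friend"
    }
}
```

Since the group is followed by `*`, the pattern also matches words that are not repeated at all; those are simply replaced by themselves, so the text between runs is left unchanged.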

I believe this is the regex you should use to detect 2 consecutive identical words separated by any number of non-word characters:

 Pattern p = Pattern.compile("\\b(\\w+)\\b\\W+\\b\\1\\b", Pattern.CASE_INSENSITIVE);
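A minimal sketch of how this pattern might be used for detection only, collecting each word that is immediately repeated. The class name `FindDuplicates`, the method `duplicatedWords` and the sample sentence are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDuplicates {
    private static final Pattern DUPLICATE =
            Pattern.compile("\\b(\\w+)\\b\\W+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // Returns the first word of every adjacent duplicated pair, in order of appearance.
    public static List<String> duplicatedWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DUPLICATE.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // \W+ lets punctuation as well as spaces separate the two words.
        System.out.println(duplicatedWords("Hello, hello world -- world again"));
        // prints [Hello, world]
    }
}
```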
