Using a regular expression to remove consecutive duplicate words from a text and display the new text

Hi,

I have the following code:

    import java.io.*;
    import java.util.ArrayList;
    import java.util.Scanner;
    import java.util.regex.*;

    public class RegexSimple4 {

        public static void main(String[] args) {
            try {
                Scanner myfis = new Scanner(new File("D:\\myfis32.txt"));
                ArrayList<String> foundaz = new ArrayList<String>();
                ArrayList<String> noduplicates = new ArrayList<String>();

                while (myfis.hasNext()) {
                    String line = myfis.nextLine();
                    String delim = " ";
                    String[] words = line.split(delim);
                    for (String s : words) {
                        if (!s.isEmpty() && s != null) {
                            Pattern pi = Pattern.compile("[aA-zZ]*");
                            Matcher ma = pi.matcher(s);
                            if (ma.find()) {
                                foundaz.add(s);
                            }
                        }
                    }
                }

                if (foundaz.isEmpty()) {
                    System.out.println("No words have been found");
                }

                if (!foundaz.isEmpty()) {
                    int n = foundaz.size();
                    String plus = foundaz.get(0);
                    noduplicates.add(plus);
                    for (int i = 1; i < n; i++) {
                        if (!noduplicates.get(i - 1).equalsIgnoreCase(foundaz.get(i))) {
                            noduplicates.add(foundaz.get(i));
                        }
                    }
                    //System.out.print("Cuvantul/cuvintele \n"+i);
                }

                if (!foundaz.isEmpty()) {
                    System.out.print("Original text \n");
                    for (String s : foundaz) {
                        System.out.println(s);
                    }
                }

                if (!noduplicates.isEmpty()) {
                    System.out.print("Remove duplicates\n");
                    for (String s : noduplicates) {
                        System.out.println(s);
                    }
                }
            } catch (Exception ex) {
                System.out.println(ex);
            }
        }
    }

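The goal is to remove consecutive duplicates from a phrase. The code only works on a column of single strings (one word per line), not on full-length phrases.

For example, my input would be:

Blah blah dog cat mouse. Cat mouse dog dog.

and the output:

Blah dog cat mouse. Cat mouse dog.

Sincerely,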

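First of all, the regex [aA-zZ]* doesn't do what you think it does. It means "match zero or more characters, each of which is an a, a Z, or any character in the range from ASCII A to ASCII z (a range that also includes [, ], \ and others)". Because of the *, it also matches the empty string.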
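To make this concrete, here is a minimal sketch comparing the class from the question with a stricter one. The class name CharClassDemo, the helper names and the alternative pattern [a-zA-Z]+ are illustrative assumptions, not part of the original code.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // The character class from the question: 'a', the range 'A'..'z', and 'Z'.
    static final Pattern QUESTION_CLASS = Pattern.compile("[aA-zZ]*");
    // Assumed intent: one or more ASCII letters (illustrative alternative).
    static final Pattern ASCII_LETTERS = Pattern.compile("[a-zA-Z]+");

    // True if the pattern matches somewhere in the input (what Matcher.find() reports).
    static boolean findsIn(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    // True if the pattern matches the entire input.
    static boolean matchesAll(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        // "[\\]^_`" holds the six punctuation characters between 'Z' and 'a' in ASCII.
        String[] samples = {"dog", "1234", "[\\]^_`"};
        for (String s : samples) {
            System.out.println(s
                    + " | [aA-zZ]* find(): " + findsIn(QUESTION_CLASS, s)
                    + ", matches(): " + matchesAll(QUESTION_CLASS, s)
                    + " | [a-zA-Z]+ find(): " + findsIn(ASCII_LETTERS, s));
        }
    }
}
```

With the question's pattern, find() succeeds even on "1234", because an empty match at position 0 counts as a match. That is why every token ends up in the foundaz list.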

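Assuming that you are only looking for repeated words that consist solely of ASCII letters, compared case-insensitively, and that you want to keep the first occurrence (which means that you don't want to match "it's it's" or "olé olé!"), you can do this in a single regex operation:

 String result = subject.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1");

which will change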

 Hello hello Hello there there past pastures

into

 Hello there past pastures
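As a quick check, the replacement can be wrapped in a small runnable class. This is only a sketch: the class name CollapseRepeatsDemo and the helper collapseRepeats are made up for illustration, and the second sample sentence is modeled on the input from the question.

```java
public class CollapseRepeatsDemo {
    // Repeated ASCII words separated by whitespace, compared case-insensitively.
    static final String REPEATED_WORD = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";

    // Replaces every run of repeated words with the first word of the run.
    static String collapseRepeats(String text) {
        return text.replaceAll(REPEATED_WORD, "$1");
    }

    public static void main(String[] args) {
        // Prints: Hello there past pastures
        System.out.println(collapseRepeats("Hello hello Hello there there past pastures"));
        // Prints: Blah dog cat mouse. Cat mouse dog.
        System.out.println(collapseRepeats("Blah blah dog cat mouse. Cat mouse dog dog."));
        // Left unchanged, because the apostrophe is not an ASCII letter: it's it's
        System.out.println(collapseRepeats("it's it's"));
    }
}
```

Because the whole line is processed at once, punctuation and spacing outside the repeated runs are preserved, which the word-by-word approach in the question cannot do.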

Explanation:

    (?i)        # Mode: case-insensitive
    \b          # Match the start of a word
    ([a-z]+)    # Match one ASCII "word", capture it in group 1
    \b          # Match the end of a word
    (?:         # Start of non-capturing group:
      \s+       #   Match at least one whitespace character
      \1        #   Match the same word as captured before (case-insensitively)
      \b        #   and make sure it ends there.
    )+          # Repeat that as often as possible

See it live on regex101.com.
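If you would like to keep that commented layout in the source itself, Java can compile a pattern in comments mode (Pattern.COMMENTS), where whitespace is ignored and # starts a comment that runs to the end of the line. The following is a sketch only; the class name VerboseRegexDemo and the method collapse are illustrative.

```java
import java.util.regex.Pattern;

public class VerboseRegexDemo {
    // Same pattern as above, written out with inline comments.
    // Each fragment ends in \n so that the # comment stops at the end of its line.
    static final Pattern REPEATED_WORD = Pattern.compile(
            "\\b        # start of a word\n"
          + "([a-z]+)   # one ASCII word, captured in group 1\n"
          + "\\b        # end of that word\n"
          + "(?:        # non-capturing group\n"
          + "  \\s+     # at least one whitespace character\n"
          + "  \\1      # the same word again\n"
          + "  \\b      # which must end here\n"
          + ")+         # repeated as often as possible\n",
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // Replaces every run of repeated words with the first word of the run.
    static String collapse(String text) {
        return REPEATED_WORD.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Prints: Hello there past pastures
        System.out.println(collapse("Hello hello Hello there there past pastures"));
    }
}
```

Compiling the Pattern once and reusing it also avoids recompiling the regex on every call, which String.replaceAll does internally.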

Below is your code. I split the text into lines and used Tim's regex.

    import java.util.Scanner;
    import java.io.*;
    import java.util.regex.*;
    import java.util.ArrayList;

    /**
     *
     * @author Marius
     */
    public class RegexSimple41 {

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            ArrayList<String> manyLines = new ArrayList<String>();
            ArrayList<String> noRepeat = new ArrayList<String>();
            try {
                Scanner myfis = new Scanner(new File("D:\\myfis41.txt"));

                // Collect every non-empty line of the file.
                while (myfis.hasNext()) {
                    String line = myfis.nextLine();
                    String delim = System.getProperty("line.separator");
                    String[] lines = line.split(delim);
                    for (String s : lines) {
                        if (s != null && !s.isEmpty()) {
                            manyLines.add(s);
                        }
                    }
                }

                if (!manyLines.isEmpty()) {
                    System.out.print("Original text\n");
                    for (String s : manyLines) {
                        System.out.println(s);
                    }
                }

                // Collapse repeated words line by line.
                if (!manyLines.isEmpty()) {
                    for (String s : manyLines) {
                        String result = s.replaceAll("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+", "$1");
                        noRepeat.add(result);
                    }
                }

                if (!noRepeat.isEmpty()) {
                    System.out.print("Remove duplicates\n");
                    for (String s : noRepeat) {
                        System.out.println(s);
                    }
                }
            } catch (Exception ex) {
                System.out.println(ex);
            }
        }
    }

Good luck,

The code below works fine:

    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DuplicateRemoveEx {

        public static void main(String[] args) {
            String regex = "(?i)\\b(\\w+)(\\b\\W+\\1\\b)+";
            Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

            Scanner in = new Scanner(System.in);
            // The first input line holds the number of sentences that follow.
            int numSentences = Integer.parseInt(in.nextLine());
            while (numSentences-- > 0) {
                String input = in.nextLine();
                Matcher m = p.matcher(input);
                // Replace each run of repeated words with its first occurrence.
                while (m.find()) {
                    input = input.replaceAll(regex, "$1");
                }
                System.out.println(input);
            }
            in.close();
        }
    }
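Note that this pattern is not equivalent to the one in the first answer: it accepts any \w word (so digits and underscores count) and any run of non-word characters as the separator. Here is a small sketch of the difference; the class name SeparatorComparisonDemo and the sample strings are made up for illustration.

```java
public class SeparatorComparisonDemo {
    // First answer: ASCII letters only, repeats separated by whitespace.
    static final String WHITESPACE_ONLY = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
    // Answer above: any \w word, repeats separated by any run of non-word characters.
    static final String ANY_SEPARATOR = "(?i)\\b(\\w+)(\\b\\W+\\1\\b)+";

    // Applies the given regex and keeps the first word of each repeated run.
    static String collapse(String regex, String text) {
        return text.replaceAll(regex, "$1");
    }

    public static void main(String[] args) {
        String[] samples = {"Goodbye bye bye world", "dog, dog", "42 42"};
        for (String s : samples) {
            System.out.println(s
                    + " | whitespace only: " + collapse(WHITESPACE_ONLY, s)
                    + " | any separator: " + collapse(ANY_SEPARATOR, s));
        }
    }
}
```

Both patterns turn "Goodbye bye bye world" into "Goodbye bye world", but only the \W+ variant collapses "dog, dog" and "42 42". Which behavior is right depends on whether punctuation between repeats should be dropped.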
