Removing duplicate lines from text using Java

I was wondering if anyone has logic in Java that removes duplicate lines while preserving the order of the lines.

I would prefer a solution that does not use regular expressions.

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.util.HashSet;
    import java.util.Set;

    public class UniqueLineReader extends BufferedReader {
        Set<String> lines = new HashSet<String>();

        public UniqueLineReader(Reader arg0) {
            super(arg0);
        }

        @Override
        public String readLine() throws IOException {
            String uniqueLine;
            // add() returns true only the first time a line is seen
            if (lines.add(uniqueLine = super.readLine()))
                return uniqueLine;
            return ""; // duplicate: hand back an empty string instead
        }

        // for testing..
        public static void main(String args[]) {
            try {
                // Open the input file
                FileInputStream fstream = new FileInputStream("test.txt");
                UniqueLineReader br = new UniqueLineReader(new InputStreamReader(fstream));
                String strLine;
                // Read file line by line
                while ((strLine = br.readLine()) != null) {
                    // Print the content on the console; compare by content, not by reference
                    // (this also skips genuinely blank lines)
                    if (!strLine.isEmpty())
                        System.out.println(strLine);
                }
                // Close the input stream
                br.close();
            } catch (Exception e) { // Catch exception if any
                System.err.println("Error: " + e.getMessage());
            }
        }
    }

Modified version:

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.util.HashSet;
    import java.util.Set;

    public class UniqueLineReader extends BufferedReader {
        Set<String> lines = new HashSet<String>();

        public UniqueLineReader(Reader arg0) {
            super(arg0);
        }

        @Override
        public String readLine() throws IOException {
            String uniqueLine;
            // Read until encountering a unique line; stop at end of input (null)
            // so that repeated calls after EOF cannot loop forever
            while ((uniqueLine = super.readLine()) != null && !lines.add(uniqueLine)) {
                // duplicate line, skip it
            }
            return uniqueLine;
        }

        public static void main(String args[]) {
            try {
                // Open the input file
                FileInputStream fstream = new FileInputStream("/home/emil/Desktop/ff.txt");
                UniqueLineReader br = new UniqueLineReader(new InputStreamReader(fstream));
                String strLine;
                // Read file line by line
                while ((strLine = br.readLine()) != null) {
                    // Print the content on the console
                    System.out.println(strLine);
                }
                // Close the input stream
                br.close();
            } catch (Exception e) { // Catch exception if any
                System.err.println("Error: " + e.getMessage());
            }
        }
    }

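If you feed the lines into a LinkedHashSet, it ignores the repeated ones, because it is a set, but preserves the order, because it is linked. If you only want to know whether you have seen a given line before, feed the lines into a plain Set as you go, and ignore those that the Set already contains.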
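The second idea, filtering against a plain Set while reading, might look like the sketch below. The class and method names (`SeenLineFilter`, `copyUnique`) are hypothetical and chosen only for illustration. It assumes the input arrives as a `Reader` and the result goes to a `Writer`, so only the set of seen lines is held in memory, not the output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of the original answer.
final class SeenLineFilter {

    // Copies each line from in to out the first time it is seen, in input order.
    static void copyUnique(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        Set<String> seen = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // Set.add returns false when the line was already present.
            if (seen.add(line)) {
                out.write(line);
                out.write('\n');
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        copyUnique(new StringReader("a\nb\na\nc\nb\n"), out);
        System.out.print(out); // prints a, b and c, one per line
    }
}
```

Because lines are written as soon as they are accepted, the order of first occurrences is kept even though `HashSet` itself has no defined iteration order.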

Read the text file using a BufferedReader and store the lines in a LinkedHashSet. Then print them out.

Here is an example:

    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class DuplicateRemover {

        public String stripDuplicates(String aHunk) {
            StringBuilder result = new StringBuilder();
            // LinkedHashSet drops duplicates while keeping first-seen order
            Set<String> uniqueLines = new LinkedHashSet<String>();

            String[] chunks = aHunk.split("\n");
            uniqueLines.addAll(Arrays.asList(chunks));

            for (String chunk : uniqueLines) {
                result.append(chunk).append("\n");
            }

            return result.toString();
        }
    }

Here are some unit tests to verify it (ignore my evil copy-paste ;)):

    import org.junit.Test;
    import static org.junit.Assert.*;

    public class DuplicateRemoverTest {

        @Test
        public void removesDuplicateLines() {
            String input = "a\nb\nc\nb\nd\n";
            String expected = "a\nb\nc\nd\n";

            DuplicateRemover remover = new DuplicateRemover();

            String actual = remover.stripDuplicates(input);
            assertEquals(expected, actual);
        }

        @Test
        public void removesDuplicateLinesUnalphabetized() {
            String input = "z\nb\nc\nb\nz\n";
            String expected = "z\nb\nc\n";

            DuplicateRemover remover = new DuplicateRemover();

            String actual = remover.stripDuplicates(input);
            assertEquals(expected, actual);
        }
    }
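The description above talks about reading through a BufferedReader, while the example works on a String that is already in memory. A minimal sketch of the reader-based variant could look like this; `OrderedUniqueLines` and `read` are hypothetical names, and the `StringReader` in `main` only stands in for a real file reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of the original answer.
final class OrderedUniqueLines {

    // Reads all lines and returns the distinct ones in first-seen order.
    static Set<String> read(Reader in) throws IOException {
        Set<String> unique = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                unique.add(line); // adding a duplicate leaves the set unchanged
            }
        }
        return unique;
    }

    public static void main(String[] args) throws IOException {
        // For a real file, a FileReader could be passed instead of the StringReader.
        for (String line : read(new StringReader("z\nb\nc\nb\nz\n"))) {
            System.out.println(line); // prints z, b and c, one per line
        }
    }
}
```

Unlike a filter that writes lines as it goes, this keeps every distinct line in memory until the whole input has been read.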

Here is another solution. Let's just use UNIX!

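    # Write to a separate file (MyFile.dedup.java is a placeholder name): redirecting
    # onto the input would truncate it before it is read.
    # Note that uniq only removes adjacent duplicate lines.
    cat MyFile.java | uniq > MyFile.dedup.java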

Edit: Oh wait, I re-read the topic. Does this count as a legitimate solution, since I managed to be language agnostic?

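With the Java Stream API it is easy to remove duplicate lines from text or from a file. Streams support aggregate operations such as sorted and distinct, and they work with Java's existing data structures and their methods. The following example uses the Stream API to remove duplicate lines from a file, and optionally to sort the result.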

    package removeword;

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    import static java.nio.file.StandardOpenOption.*;
    import static java.util.stream.Collectors.joining;

    public class Java8UniqueWords {

        public static void main(String[] args) throws IOException {
            Path sourcePath = Paths.get("C:/Users/source.txt");
            Path changedPath = Paths.get("C:/Users/removedDouplicate_file.txt");
            try (final Stream<String> lines = Files.lines(sourcePath)
                    // .map(line -> line.toLowerCase()) /* optional: use existing String methods */
                    .distinct()
                    // .sorted() /* optional: sort the distinct lines; this gives up the original order */
            ) {
                final String uniqueWords = lines.collect(joining("\n"));
                System.out.println("Final Output:" + uniqueWords);
                // CREATE lets the write succeed when the output file does not exist yet
                Files.write(changedPath, uniqueWords.getBytes(), CREATE, WRITE, TRUNCATE_EXISTING);
            }
        }
    }

For better performance, it is sensible to use the Java 8 API features. Streams and method references, with a LinkedHashSet as the collection, can be used as follows:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.LinkedHashSet;
    import java.util.stream.Collectors;

    public class UniqueOperation {

        public static void main(String[] args) throws IOException {
            // try-with-resources closes both the reader and the writer
            try (BufferedReader reader = Files.newBufferedReader(
                         Paths.get("C:/Users/as00465129/Desktop/FrontEndUdemyLinks.txt"));
                 PrintWriter pw = new PrintWriter("abc.txt")) {
                // Collecting into a LinkedHashSet drops duplicates and keeps first-seen order
                LinkedHashSet<String> unique =
                        reader.lines().collect(Collectors.toCollection(LinkedHashSet::new));
                for (String p : unique)
                    pw.println(p);
            }
            System.out.println("File operation performed successfully");
        }
    }

Here I use a HashSet to store the lines that have already been seen:

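    Scanner scan = new Scanner(System.in); // input; System.in is assumed here, any source works
    Set<String> lines = new HashSet<String>();
    StringBuilder strb = new StringBuilder();
    while (scan.hasNextLine()) {
        String line = scan.nextLine();
        // add() returns false for a line that was already seen
        if (lines.add(line))
            strb.append(line).append('\n'); // keep a line break after each kept line
    }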
