Groovy: Reading a range of lines from a file

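I have a text file containing a huge amount of data: 2,000,000 lines. Walking through the file with the following snippet is easy, but it isn't what I need ;-)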

    def f = new File("input.txt")
    f.eachLine() {
        // Some code here
    }

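I only need to read a specific range of lines from the file. Is there a way to specify the start and end lines, something like this (pseudocode)? I'd like to avoid loading every line into memory with readLines() before selecting the range.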

    // Read all lines from 4 to 48
    def f = new File("input.txt")
    def start = 4
    def end = 48
    f.eachLine(start, end) {
        // Some code here
    }

If Groovy can't do this, any Java solution is welcome too :-)

Cheers, Robert

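I don't believe there's any "magic" way to jump to an arbitrary "line" in a file. Lines are defined only by newline characters, so there's no way to know where they are without actually reading the file. I think you have two options: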

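1. Follow Mark Peter's answer and use a BufferedReader to read the file one line at a time until you reach the line you want. This will obviously be slow.
2. Work out how many bytes (rather than lines) you need to skip, and use something like RandomAccessFile to seek directly to that point in the file. Whether you can know the right byte count efficiently depends on your application. For example, if you read the file sequentially, one chunk at a time, you only need to remember where you left off. If every line is exactly L bytes long, reaching line N (counting from 0) is just a matter of seeking to position N * L. If this is something you do often, some preprocessing may help: for example, read the whole file once and record the starting position of each line in an in-memory HashMap. The next time you need line N, look up its position in the HashMap and seek straight to it.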
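A minimal sketch of the second option in Java, under some stated assumptions: the file uses \n or \r\n line endings, and its content is ASCII or Latin-1 (RandomAccessFile.readLine turns each byte into one char, so multi-byte UTF-8 text would come out garbled). The class LineIndex and its methods are hypothetical names. It makes one buffered pass to record where every line starts, keeping the offsets in an ArrayList indexed by line number instead of a HashMap, since line numbers are dense; after that, read() seeks straight to the first requested line.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

final class LineIndex {
    private final String path;
    // offsets.get(n - 1) is the byte position where 1-based line n starts
    private final List<Long> offsets = new ArrayList<>();

    LineIndex(String path) throws IOException {
        this.path = path;
        // One buffered pass over the bytes, noting where each line begins.
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') {
                    atLineStart = true; // the next byte, if any, starts a new line
                }
            }
        }
    }

    int lineCount() {
        return offsets.size();
    }

    // Returns lines first..last (1-based, inclusive), or fewer if the file ends sooner.
    List<String> read(int first, int last) throws IOException {
        List<String> result = new ArrayList<>();
        if (first < 1 || first > offsets.size()) {
            return result;
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offsets.get(first - 1)); // jump straight to line `first`
            for (int n = first; n <= last && n <= offsets.size(); n++) {
                result.add(raf.readLine()); // strips \n or \r\n terminators
            }
        }
        return result;
    }
}
```

Building the index still reads the whole file once, but afterwards any range costs a single seek plus the lines actually returned. With 2,000,000 lines the list holds 2,000,000 boxed Long values, so a plain long[] would be leaner if memory is tight.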

Java solution:

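    // start and end are 1-based, inclusive line numbers
    try (BufferedReader r = new BufferedReader(new FileReader(f))) {
        String line;
        // check ln first so we never read the line after `end`
        for (int ln = 1; ln <= end && (line = r.readLine()) != null; ln++) {
            if (ln >= start) {
                // Some code here
            }
        }
    }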

Gross, eh?

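Unfortunately, unless your lines have a fixed length, you won't be able to skip efficiently to line start: each line can be arbitrarily long, so all the data before it has to be read. That doesn't rule out a better solution, though.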

Java 8

I thought it was worth an update showing how to do this efficiently with Streams:

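    int start = 5;
    int end = 12;
    Path file = Paths.get("/tmp/bigfile.txt");
    // skip(start) drops the first `start` lines; limit(end - start) keeps the rest,
    // i.e. 1-based lines start+1 .. end
    try (Stream<String> lines = Files.lines(file)) {
        lines.skip(start).limit(end - start).forEach(System.out::println);
    }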

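Because Streams are evaluated lazily, this only reads lines up to and including end (plus whatever internal buffering the reader chooses to do).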
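A sketch of the same Streams idea wrapped in a small helper that takes 1-based, inclusive line numbers; the class name LineRange is made up for illustration. Accepting a BufferedReader lets it work with Files.newBufferedReader as well as any other source, and BufferedReader.lines() is lazy too, so limit() should stop reading once the last requested line has been produced. It assumes first >= 1 and last >= first, and the Path overload assumes UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

final class LineRange {
    // Lines first..last (1-based, inclusive) from an already-open reader.
    static List<String> lines(BufferedReader reader, int first, int last) {
        return reader.lines()
                .skip(first - 1)          // drop lines 1 .. first-1
                .limit(last - first + 1)  // keep at most the requested count
                .collect(Collectors.toList());
    }

    // Convenience overload: opens the file as UTF-8 and closes it afterwards.
    static List<String> lines(Path file, int first, int last) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return lines(reader, first, last);
        }
    }
}
```

For the question's range this would be LineRange.lines(Paths.get("input.txt"), 4, 48).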

Here's a Groovy solution. Unfortunately, it still reads every line of the file, including the ones after end:

    def start = 4
    def end = 48
    // eachLine passes (line, lineNumber) to a 2-arg closure, counting from 1 by default
    new File("input.txt").eachLine { line, lineNo ->
        if (lineNo >= start && lineNo <= end) {
            // Process the line
        }
    }

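Groovy also has eachLine overloads that take a firstLine argument. Here are two signatures from the documentation (note that firstLine only sets the number given to the first line, 1 by default; it does not skip any lines):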

    Object eachLine(int firstLine, Closure closure)
    Object eachLine(String charset, int firstLine, Closure closure)

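This should do it. Be aware, though, that return inside the closure only ends the current iteration: eachLine still reads the rest of the file, it just does nothing with the lines after end.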

    def readRange = { file ->
        def start = 10
        def end = 20
        // eachLine supplies lineNo itself, starting at 1
        new File(file).eachLine { line, lineNo ->
            if (lineNo > end) {
                return // skips this line only; eachLine keeps going
            }
            if (lineNo >= start) {
                println line
            }
        }
    }

In Groovy, you can use a Category:

    class FileHelper {
        static eachLineInRange(File file, IntRange lineRange, Closure closure) {
            file.withReader { reader ->
                // withReader supplies a BufferedReader; wrap it to get line numbers
                def r = new LineNumberReader(reader)
                def line
                while ((line = r.readLine()) != null) {
                    def lineNo = r.lineNumber
                    if (lineNo < lineRange.from) continue
                    if (lineNo > lineRange.to) break
                    closure.call(line, lineNo)
                }
            }
        }
    }

    def f = '/path/to/file' as File
    use(FileHelper) {
        f.eachLineInRange(4..48) { line, lineNo ->
            println "$lineNo) $line"
        }
    }

Or an ExpandoMetaClass:

    File.metaClass.eachLineInRange = { IntRange lineRange, Closure closure ->
        delegate.withReader { reader ->
            // withReader supplies a BufferedReader; wrap it to get line numbers
            def r = new LineNumberReader(reader)
            def line
            while ((line = r.readLine()) != null) {
                def lineNo = r.lineNumber
                if (lineNo < lineRange.from) continue
                if (lineNo > lineRange.to) break
                closure.call(line, lineNo)
            }
        }
    }

    def f = '/path/to/file' as File
    f.eachLineInRange(4..48) { line, lineNo ->
        println "$lineNo) $line"
    }

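With this solution you still read the file sequentially, line by line, but you never hold all of the lines in memory, and reading stops as soon as the end of the range is passed.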

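You have to iterate from the beginning to reach the start position, but you can use a LineNumberReader (instead of a BufferedReader), since it keeps track of the line number for you.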

    final int start = 4;
    final int end = 48;
    try (LineNumberReader in = new LineNumberReader(new FileReader(filename))) {
        String line;
        // getLineNumber() counts the lines read so far, so this stops right after line `end`
        while (in.getLineNumber() < end && (line = in.readLine()) != null) {
            if (in.getLineNumber() >= start) {
                // process line
            }
        }
    }
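The same LineNumberReader loop can be packaged as a helper that takes any Reader and returns the matching lines, which makes it easy to try out on an in-memory StringReader. NumberedRange is a hypothetical name, and start/end are assumed to be 1-based and inclusive. Because getLineNumber() reports how many lines have been consumed so far, testing it before readLine() lets the loop stop right after line end without reading line end + 1. Closing the LineNumberReader also closes the Reader passed in.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class NumberedRange {
    // Collects lines start..end (1-based, inclusive) and never reads past line `end`.
    static List<String> read(Reader source, int start, int end) throws IOException {
        List<String> result = new ArrayList<>();
        try (LineNumberReader in = new LineNumberReader(source)) {
            String line;
            while (in.getLineNumber() < end && (line = in.readLine()) != null) {
                // after readLine(), getLineNumber() is the number of the line just read
                if (in.getLineNumber() >= start) {
                    result.add(line);
                }
            }
        }
        return result;
    }
}
```

For a real file you would pass something like new FileReader(filename), as in the loop above.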

Thanks for all the tips. Based on what you wrote, I pieced together my own code, which seems to work. Not elegant, but it does the job :-)

    def f = new RandomAccessFile("D:/input.txt", "r")
    def start = 3
    def end = 6
    def current = start - 1
    // every line is exactly 11 bytes long, including its line terminator
    def BYTE_OFFSET = 11
    def resultList = []
    try {
        if ((end * BYTE_OFFSET) <= f.length()) {
            while ((current * BYTE_OFFSET) < (end * BYTE_OFFSET)) {
                f.seek(current * BYTE_OFFSET)
                resultList << f.readLine()
                current++
            }
        }
    } finally {
        f.close()
    }
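Roughly the same fixed-length approach in Java, as a sketch assuming every line occupies exactly recordLength bytes including its terminator (for instance 10 ASCII characters plus \n gives 11) and that the text is ASCII. FixedWidthLines is a hypothetical name. Since each record ends with its own terminator, one seek to the first requested line is enough; readLine() then moves from record to record, and the number of complete records is derived from the file length so a range past the end simply returns fewer lines.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

final class FixedWidthLines {
    // recordLength is the byte length of every line INCLUDING its terminator.
    // first and last are 1-based, inclusive line numbers.
    static List<String> read(String path, int recordLength, int first, int last) throws IOException {
        List<String> result = new ArrayList<>();
        if (first < 1) {
            return result;
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long available = raf.length() / recordLength;  // number of complete records
            raf.seek((long) (first - 1) * recordLength);    // jump straight to line `first`
            for (int n = first; n <= last && n <= available; n++) {
                result.add(raf.readLine());
            }
        }
        return result;
    }
}
```

With the numbers above, FixedWidthLines.read("D:/input.txt", 11, 3, 6) should return lines 3 to 6.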

Here's another Java solution, using LineIterator and FileUtils from Commons IO:

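    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collection;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.io.LineIterator;

    // Returns `lines` lines starting at the 1-based line number startOffset.
    public static Collection<String> readFile(final File f, final int startOffset, final int lines) throws IOException {
        final Collection<String> coll = new ArrayList<>(lines);
        final LineIterator it = FileUtils.lineIterator(f);
        try {
            int lineNo = 0;
            while (lineNo < startOffset + lines - 1 && it.hasNext()) {
                final String line = it.nextLine();
                lineNo++;
                if (lineNo >= startOffset) {
                    coll.add(line);
                }
            }
        } finally {
            it.close();
        }
        return coll;
    }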
