Fastest way to read and write large files line by line in Java

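I have been looking for the fastest way to read and write large files (0.5 to 1 GB) in Java with limited memory (about 64 MB). Each line in the file represents one record, so I need to get them line by line. The file is a plain text file.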

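I tried BufferedReader and BufferedWriter, but they don't seem to be the best option. Reading and writing a 0.5 GB file takes about 35 seconds, and that is just reading and writing with no processing. I think the bottleneck is the writing, because reading alone takes about 10 seconds.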

I also tried reading into byte arrays, but then searching for the line breaks in each array I read takes even more time.

Any suggestions? Thanks.

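I suspect your real problem is that you have limited hardware, and that whatever you do in software won't make much difference. If you have plenty of memory and CPU, more advanced tricks can help, but if you are just waiting on your hard drive because the file is not cached, they won't make much difference.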

BTW: 500 MB in 10 seconds, i.e. 50 MB/s, is a typical read speed for an HDD.

Try running the following to see at what file size your system is no longer able to cache the file efficiently.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Arrays;

    // Wrapper class added so the snippet compiles; the name is arbitrary.
    public class FileSizeTest {
        public static void main(String... args) throws IOException {
            for (int mb : new int[]{50, 100, 250, 500, 1000, 2000})
                testFileSize(mb);
        }

        private static void testFileSize(int mb) throws IOException {
            File file = File.createTempFile("test", ".txt");
            file.deleteOnExit();
            char[] chars = new char[1024];
            Arrays.fill(chars, 'A');
            String longLine = new String(chars);

            // Write mb * 1024 lines of 1024 characters each and time it.
            long start1 = System.nanoTime();
            PrintWriter pw = new PrintWriter(new FileWriter(file));
            for (int i = 0; i < mb * 1024; i++)
                pw.println(longLine);
            pw.close();
            long time1 = System.nanoTime() - start1;
            // bytes * 1000 / nanoseconds gives (decimal) megabytes per second
            System.out.printf("Took %.3f seconds to write to a %d MB, file rate: %.1f MB/s%n",
                    time1 / 1e9, file.length() >> 20, file.length() * 1000.0 / time1);

            // Read the file back line by line, discarding every line.
            long start2 = System.nanoTime();
            BufferedReader br = new BufferedReader(new FileReader(file));
            for (String line; (line = br.readLine()) != null; ) {
            }
            br.close();
            long time2 = System.nanoTime() - start2;
            System.out.printf("Took %.3f seconds to read to a %d MB file, rate: %.1f MB/s%n",
                    time2 / 1e9, file.length() >> 20, file.length() * 1000.0 / time2);
            file.delete();
        }
    }

On a Linux machine with lots of memory:

    Took 0.395 seconds to write to a 50 MB, file rate: 133.0 MB/s
    Took 0.375 seconds to read to a 50 MB file, rate: 140.0 MB/s
    Took 0.669 seconds to write to a 100 MB, file rate: 156.9 MB/s
    Took 0.569 seconds to read to a 100 MB file, rate: 184.6 MB/s
    Took 1.585 seconds to write to a 250 MB, file rate: 165.5 MB/s
    Took 1.274 seconds to read to a 250 MB file, rate: 206.0 MB/s
    Took 2.513 seconds to write to a 500 MB, file rate: 208.8 MB/s
    Took 2.332 seconds to read to a 500 MB file, rate: 225.1 MB/s
    Took 5.094 seconds to write to a 1000 MB, file rate: 206.0 MB/s
    Took 5.041 seconds to read to a 1000 MB file, rate: 208.2 MB/s
    Took 11.509 seconds to write to a 2001 MB, file rate: 182.4 MB/s
    Took 9.681 seconds to read to a 2001 MB file, rate: 216.8 MB/s

On a Windows machine with lots of memory:

    Took 0.376 seconds to write to a 50 MB, file rate: 139.7 MB/s
    Took 0.401 seconds to read to a 50 MB file, rate: 131.1 MB/s
    Took 0.517 seconds to write to a 100 MB, file rate: 203.1 MB/s
    Took 0.520 seconds to read to a 100 MB file, rate: 201.9 MB/s
    Took 1.344 seconds to write to a 250 MB, file rate: 195.4 MB/s
    Took 1.387 seconds to read to a 250 MB file, rate: 189.4 MB/s
    Took 2.368 seconds to write to a 500 MB, file rate: 221.8 MB/s
    Took 2.454 seconds to read to a 500 MB file, rate: 214.1 MB/s
    Took 4.985 seconds to write to a 1001 MB, file rate: 210.7 MB/s
    Took 5.132 seconds to read to a 1001 MB file, rate: 204.7 MB/s
    Took 10.276 seconds to write to a 2003 MB, file rate: 204.5 MB/s
    Took 9.964 seconds to read to a 2003 MB file, rate: 210.9 MB/s

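The first thing I would try is increasing the buffer size of the BufferedReader and BufferedWriter. The default buffer sizes are not documented, but at least in the Oracle VM they are 8192 characters, which is unlikely to bring much of a performance advantage.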
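As a rough illustration of the larger-buffer idea, here is a minimal sketch of a line-by-line copy that passes an explicit buffer size to both constructors. The class name, the method name and the 1 Mi-character buffer size are my own assumptions for the example, not measured recommendations; the right size has to be found by testing on your own hardware.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical helper: copies a text file line by line with enlarged buffers.
class LargeBufferLineCopy {
    // Assumed value: 1 Mi chars (about 2 MB per buffer), well inside a 64 MB heap.
    static final int BUFFER_CHARS = 1 << 20;

    // Returns the number of lines copied.
    static long copyLines(File source, File target, Charset charset) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(new FileInputStream(source), charset), BUFFER_CHARS);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(target), charset), BUFFER_CHARS)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Per-record processing would go here.
                writer.write(line);
                // newLine() writes the platform line separator, so the
                // original line endings are normalized in the output.
                writer.newLine();
                count++;
            }
        }
        return count;
    }
}
```

Only one line is held in memory at a time, so heap use is bounded by the two buffers plus the longest line.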

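If you only need to copy the file (and don't need actual access to the data), I would drop the Reader/Writer approach and work directly with InputStream and OutputStream, using a byte array as the buffer: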

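    FileInputStream fis = new FileInputStream("d:/test.txt");
    FileOutputStream fos = new FileOutputStream("d:/test2.txt");
    // bufferSize is not defined in this snippet; pick a value that suits your setup
    byte[] b = new byte[bufferSize];
    int r;
    // read() returns -1 at end of file
    while ((r = fis.read(b)) >= 0) {
        fos.write(b, 0, r);
    }
    fis.close();
    fos.close();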

Or use NIO directly:

    FileChannel in = new RandomAccessFile("d:/test.txt", "r").getChannel();
    FileChannel out = new RandomAccessFile("d:/test2.txt", "rw").getChannel();
    out.transferFrom(in, 0, Long.MAX_VALUE);
    in.close();
    out.close();
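One detail worth keeping in mind with the channel approach: transferFrom() and transferTo() return the number of bytes actually moved, which can be less than the count requested. A slightly more defensive sketch loops until the whole source has been transferred and closes both channels with try-with-resources. The class and method names here are my own, chosen for the example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: copies a file with FileChannel.transferTo in a loop.
class ChannelCopy {
    // Returns the number of bytes copied.
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // A single call may transfer fewer bytes than requested.
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break; // nothing more could be transferred, e.g. the source shrank
                }
                position += moved;
            }
            return position;
        }
    }
}
```

This only copies bytes; it does not give you access to individual lines, so it fits the "copy only" case rather than per-record processing.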

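When benchmarking the different copy methods, I saw larger differences (in duration) between individual runs of the benchmark than between the different implementations. I/O caching (both at the OS level and in the hard disk cache) plays a big role here, and it is very hard to say what is faster. On my hardware, copying a 1 GB text file line by line with BufferedReader and BufferedWriter took anywhere from under 5 seconds in some runs to just under 30 seconds in others.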
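Because run-to-run variance can swamp the differences between implementations, it helps to time each candidate several times and compare the spread rather than a single number. The following is only a sketch of such a harness; the IoTask interface and the method name are assumptions made for this example, and it does nothing to control OS or disk caching.

```java
import java.util.Arrays;

// Hypothetical mini-harness: times the same task repeatedly.
class RepeatedTiming {
    // A task that is allowed to throw, e.g. a file copy.
    interface IoTask {
        void run() throws Exception;
    }

    // Runs the task `runs` times and returns the durations in
    // nanoseconds, sorted ascending (first = best, last = worst).
    static long[] timeRuns(IoTask task, int runs) throws Exception {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }
}
```

Comparing the first, middle and last elements of the returned array for two implementations shows quickly whether their ranges overlap, in which case the difference between them is probably noise.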

In Java 7 you can use the Files.readAllLines() and Files.write() methods. Here is an example:

    // Reads the whole file into memory as a list of lines.
    List<String> readTextFile(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // Writes all lines to the file, one per line.
    void writeTextFile(List<String> strLines, String fileName) throws IOException {
        Path path = Paths.get(fileName);
        Files.write(path, strLines, StandardCharsets.UTF_8);
    }
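A caveat with the example above: Files.readAllLines() holds every line in memory at once, so a 0.5 to 1 GB file will not fit in the roughly 64 MB mentioned in the question. A streaming variant built on the same java.nio.file.Files class might look like the sketch below; the class name, method name and the perLine parameter are hypothetical, and UTF-8 is assumed as in the example above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Hypothetical helper: processes a text file one line at a time.
class StreamingLineFilter {
    // Applies perLine to each input line and writes the result.
    // Returns the number of lines processed.
    static long transform(Path source, Path target, UnaryOperator<String> perLine)
            throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(perLine.apply(line));
                writer.newLine(); // platform line separator
                count++;
            }
        }
        return count;
    }
}
```

For a plain copy, pass `UnaryOperator.identity()` as the last argument.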

I would recommend looking at the classes in the java.nio package. Non-blocking IO might be faster for sockets:

http://docs.oracle.com/javase/6/docs/api/java/nio/package-summary.html

This article has benchmarks suggesting that this is true:

http://vanillajava.blogspot.com/2010/07/java-nio-is-faster-than-java-io-for.html

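I wrote an extensive article about the many ways of reading files in Java, testing them against each other with sample files from 1 KB to 1 GB, and I found that the following 3 methods were the fastest for reading 1 GB files: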

1) java.nio.file.Files.readAllBytes(): took less than 1 second to read the 1 GB test file.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;

    public class ReadFile_Files_ReadAllBytes {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            // Loads the entire file into a single byte array.
            byte[] fileBytes = Files.readAllBytes(file.toPath());
            char singleChar;
            for (byte b : fileBytes) {
                // Note: a plain byte-to-char cast is only correct for ASCII content.
                singleChar = (char) b;
                System.out.print(singleChar);
            }
        }
    }

2) java.nio.file.Files.lines(): took about 3.5 seconds to read the 1 GB test file.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.util.stream.Stream;

    public class ReadFile_Files_Lines {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            // The stream reads lines lazily; try-with-resources closes the file.
            try (Stream<String> linesStream = Files.lines(file.toPath())) {
                linesStream.forEach(line -> {
                    System.out.println(line);
                });
            }
        }
    }

3) java.io.BufferedReader: took about 4.5 seconds to read the 1 GB test file.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadFile_BufferedReader_ReadLine {
        public static void main(String[] args) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            FileReader fileReader = new FileReader(fileName);

            try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
                String line;
                while ((line = bufferedReader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }