Java large-file disk I/O performance

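I have two files (2 GB each) on my hard disk and want to compare them with each other:

  • Copying the original files with Windows Explorer takes approx. 2-4 minutes (that is, reading and writing, on the same physical and logical disk).
  • Reading both files with java.io.FileInputStream and comparing the byte arrays byte by byte takes 20+ minutes.
  • The java.io.BufferedInputStream buffer is 64 kB; the files are read in chunks and then compared.
  • The comparison is done in a tight loop like this: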

    // len is the number of valid bytes present in both buffers
    int len = Math.min(numRead[0], numRead[1]);
    for (int k = 0; k < len; k++) {
        if (buffer[1][k] != buffer[0][k]) {
            return buffer[0][k] - buffer[1][k];
        }
    }

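What can I do to speed this up? Is NIO supposed to be faster than plain streams? Is Java unable to use DMA/SATA technologies and does it make some slow OS API calls instead?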

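EDIT:
Thanks for the answers. I did some experiments based on them. As Andreas showed:

There is not much difference between the stream and nio approaches.
More important is the correct buffer size.

My experiments confirmed this. Because the files are read in big chunks, even an additional buffer (BufferedInputStream) gains nothing. Optimizing the comparison is possible, and I got the best results with 32-fold unrolling, but the time spent comparing is small relative to the disk reads, so the speedup is small. It looks like there is nothing I can do ;-(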

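I tried three different methods of comparing two identical 3.8 GB files, with buffer sizes between 8 kB and 1 MB. The first method used just two buffered input streams.

The second method uses a thread pool that reads in two different threads and compares in a third one. This gets slightly higher throughput at the cost of high CPU utilization. Managing the thread pool adds a lot of overhead for such short-running tasks.

The third method uses nio, as posted by laginimaineb.

As you can see, the general approach does not make much difference. More important is the correct buffer size.

What is strange is that I read 1 byte less when using threads. I could not spot the error.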

    comparing just with two streams
    I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
    I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
    I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
    I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
    I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
    I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
    I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
    I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB

    comparing with threads
    I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
    I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
    I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
    I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
    I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
    I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
    I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
    I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB

    comparing with nio
    I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
    I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
    I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
    I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
    I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
    I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
    I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
    I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB

The code used:

    import junit.framework.Assert;
    import org.junit.Before;
    import org.junit.Test;

    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.text.DecimalFormat;
    import java.text.NumberFormat;
    import java.util.Arrays;
    import java.util.concurrent.*;

    public class FileCompare {

        private static final int MIN_BUFFER_SIZE = 1024 * 8;
        private static final int MAX_BUFFER_SIZE = 1024 * 1024;

        private String fileName1;
        private String fileName2;
        private long start;
        private long totalbytes;

        @Before
        public void createInputStream() {
            fileName1 = "bigFile.1";
            fileName2 = "bigFile.2";
        }

        // Method 1: two buffered streams, buffer size doubled from 8 kB to 1 MB.
        @Test
        public void compareTwoFiles() throws IOException {
            System.out.println("comparing just with two streams");
            int currentBufferSize = MIN_BUFFER_SIZE;
            while (currentBufferSize <= MAX_BUFFER_SIZE) {
                compareWithBufferSize(currentBufferSize);
                currentBufferSize *= 2;
            }
        }

        // Method 2: reads and comparisons submitted to a thread pool.
        @Test
        public void compareTwoFilesFutures()
                throws IOException, ExecutionException, InterruptedException {
            System.out.println("comparing with threads");
            int myBufferSize = MIN_BUFFER_SIZE;
            while (myBufferSize <= MAX_BUFFER_SIZE) {
                start = System.currentTimeMillis();
                totalbytes = 0;
                compareWithBufferSizeFutures(myBufferSize);
                myBufferSize *= 2;
            }
        }

        // Method 3: FileChannel with direct ByteBuffers.
        @Test
        public void compareTwoFilesNio() throws IOException {
            System.out.println("comparing with nio");
            int myBufferSize = MIN_BUFFER_SIZE;
            while (myBufferSize <= MAX_BUFFER_SIZE) {
                start = System.currentTimeMillis();
                totalbytes = 0;
                boolean wasEqual = isEqualsNio(myBufferSize);
                if (wasEqual) {
                    printAfterEquals(myBufferSize);
                } else {
                    Assert.fail("files were not equal");
                }
                myBufferSize *= 2;
            }
        }

        private void compareWithBufferSize(int myBufferSize) throws IOException {
            final BufferedInputStream inputStream1 =
                    new BufferedInputStream(new FileInputStream(new File(fileName1)), myBufferSize);
            byte[] buff1 = new byte[myBufferSize];
            final BufferedInputStream inputStream2 =
                    new BufferedInputStream(new FileInputStream(new File(fileName2)), myBufferSize);
            byte[] buff2 = new byte[myBufferSize];
            int read1;

            start = System.currentTimeMillis();
            totalbytes = 0;
            while ((read1 = inputStream1.read(buff1)) != -1) {
                totalbytes += read1;
                int read2 = inputStream2.read(buff2);
                if (read1 != read2) {
                    break;
                }
                if (!Arrays.equals(buff1, buff2)) {
                    break;
                }
            }
            if (read1 == -1) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }
            inputStream1.close();
            inputStream2.close();
        }

        private void compareWithBufferSizeFutures(int myBufferSize)
                throws ExecutionException, InterruptedException, IOException {
            final BufferedInputStream inputStream1 =
                    new BufferedInputStream(new FileInputStream(new File(fileName1)), myBufferSize);
            final BufferedInputStream inputStream2 =
                    new BufferedInputStream(new FileInputStream(new File(fileName2)), myBufferSize);

            final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);

            if (wasEqual) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }
            inputStream1.close();
            inputStream2.close();
        }

        // Double buffering: while one pair of buffers ("even") is being filled,
        // the other pair ("odd") is compared, and vice versa.
        private boolean isEqualsParallel(int myBufferSize,
                                         final BufferedInputStream inputStream1,
                                         final BufferedInputStream inputStream2)
                throws InterruptedException, ExecutionException {
            final byte[] buff1Even = new byte[myBufferSize];
            final byte[] buff1Odd = new byte[myBufferSize];
            final byte[] buff2Even = new byte[myBufferSize];
            final byte[] buff2Odd = new byte[myBufferSize];

            final Callable<Integer> read1Even = new Callable<Integer>() {
                public Integer call() throws Exception {
                    return inputStream1.read(buff1Even);
                }
            };
            final Callable<Integer> read2Even = new Callable<Integer>() {
                public Integer call() throws Exception {
                    return inputStream2.read(buff2Even);
                }
            };
            final Callable<Integer> read1Odd = new Callable<Integer>() {
                public Integer call() throws Exception {
                    return inputStream1.read(buff1Odd);
                }
            };
            final Callable<Integer> read2Odd = new Callable<Integer>() {
                public Integer call() throws Exception {
                    return inputStream2.read(buff2Odd);
                }
            };
            final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
                public Boolean call() throws Exception {
                    return Arrays.equals(buff1Odd, buff2Odd);
                }
            };
            final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
                public Boolean call() throws Exception {
                    return Arrays.equals(buff1Even, buff2Even);
                }
            };

            ExecutorService executor = Executors.newCachedThreadPool();
            boolean isEven = true;
            Future<Integer> read1 = null;
            Future<Integer> read2 = null;
            Future<Boolean> isEqual = null;
            int lastSize = 0;
            while (true) {
                if (isEqual != null) {
                    if (!isEqual.get()) {
                        return false;
                    } else if (lastSize == -1) {
                        return true;
                    }
                }
                if (read1 != null) {
                    lastSize = read1.get();
                    // Note: at end of file read() returns -1, and that -1 is added
                    // here as well, so this total ends up one byte short.
                    totalbytes += lastSize;
                    final int size2 = read2.get();
                    if (lastSize != size2) {
                        return false;
                    }
                }
                isEven = !isEven;
                if (isEven) {
                    if (read1 != null) {
                        isEqual = executor.submit(oddEqualsArray);
                    }
                    read1 = executor.submit(read1Even);
                    read2 = executor.submit(read2Even);
                } else {
                    if (read1 != null) {
                        isEqual = executor.submit(evenEqualsArray);
                    }
                    read1 = executor.submit(read1Odd);
                    read2 = executor.submit(read2Odd);
                }
            }
        }

        private boolean isEqualsNio(int myBufferSize) throws IOException {
            FileChannel first = null, seconde = null;
            try {
                first = new FileInputStream(fileName1).getChannel();
                seconde = new FileInputStream(fileName2).getChannel();
                if (first.size() != seconde.size()) {
                    return false;
                }
                ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
                ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
                int firstRead, secondRead;
                while (first.position() < first.size()) {
                    firstRead = first.read(firstBuffer);
                    totalbytes += firstRead;
                    secondRead = seconde.read(secondBuffer);
                    if (firstRead != secondRead) {
                        return false;
                    }
                    if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
                        return false;
                    }
                }
                return true;
            } finally {
                if (first != null) {
                    first.close();
                }
                if (seconde != null) {
                    seconde.close();
                }
            }
        }

        private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
            if (first.limit() != second.limit() || length > first.limit()) {
                return false;
            }
            // rewind() resets the position to 0, which also prepares the buffers for the next read
            first.rewind();
            second.rewind();
            for (int i = 0; i < length; i++) {
                if (first.get() != second.get()) {
                    return false;
                }
            }
            return true;
        }

        private void printAfterEquals(int myBufferSize) {
            NumberFormat nf = new DecimalFormat("#.00");
            final long dur = System.currentTimeMillis() - start;
            double seconds = dur / 1000d;
            double megabytes = totalbytes / 1024 / 1024;
            double rate = (megabytes) / seconds;
            System.out.println("I was equal, even after " + totalbytes
                    + " bytes and reading for " + dur
                    + " ms (" + nf.format(rate) + "MB/sec * 2)"
                    + " with a buffer size of " + myBufferSize / 1024 + " kB");
        }
    }

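With files this large, you can get better performance with java.nio.

Additionally, reading single bytes with Java streams can be very slow. Reading into a byte array instead (2-6K elements in my own experience, ymmv, as it seems to be platform/application specific) will dramatically improve your read performance with streams.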
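To illustrate the point about byte arrays, here is a minimal sketch of block-wise reading. The class name ChunkedRead and the method countBytes are made up for this example, and the buffer size is left as a parameter because the best value has to be measured on the target system.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ChunkedRead {

    /** Counts the bytes of a file by reading whole blocks into one reusable array. */
    static long countBytes(String path, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize]; // allocated once, reused for every read
        long total = 0;
        try (InputStream in = new FileInputStream(path)) {
            int n;
            // read(byte[]) returns the number of bytes placed in the buffer, or -1 at end of file
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

Each call to read(byte[]) can return up to bufferSize bytes, whereas the no-argument read() returns a single byte per call.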

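Reading and writing files with Java can be just as fast. You can use FileChannels. As for comparing the files, comparing byte by byte will obviously take a lot of time. Here is an example using FileChannels and ByteBuffers (it could be optimized further):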

    public static boolean compare(String firstPath, String secondPath, final int BUFFER_SIZE) throws IOException {
        FileChannel firstIn = null, secondIn = null;
        try {
            firstIn = new FileInputStream(firstPath).getChannel();
            secondIn = new FileInputStream(secondPath).getChannel();
            if (firstIn.size() != secondIn.size())
                return false;
            ByteBuffer firstBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
            ByteBuffer secondBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
            int firstRead, secondRead;
            while (firstIn.position() < firstIn.size()) {
                firstRead = firstIn.read(firstBuffer);
                secondRead = secondIn.read(secondBuffer);
                if (firstRead != secondRead)
                    return false;
                if (!buffersEqual(firstBuffer, secondBuffer, firstRead))
                    return false;
            }
            return true;
        } finally {
            if (firstIn != null) firstIn.close();
            if (secondIn != null) secondIn.close();
        }
    }

    private static boolean buffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit())
            return false;
        if (length > first.limit())
            return false;
        first.rewind();
        second.rewind();
        // The listing breaks off after "for (int i=0; i"; the loop below follows
        // the otherwise identical nioBuffersEqual in the question's test code.
        for (int i = 0; i < length; i++)
            if (first.get() != second.get())
                return false;
        return true;
    }

After modifying the NIO compare function, I get the following results.

    I was equal, even after 4294967296 bytes and reading for 304594 ms (13.45MB/sec * 2) with a buffer size of 1024 kB
    I was equal, even after 4294967296 bytes and reading for 225078 ms (18.20MB/sec * 2) with a buffer size of 4096 kB
    I was equal, even after 4294967296 bytes and reading for 221351 ms (18.50MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 37 MB/s.

Running the same thing on a faster drive:

    I was equal, even after 4294967296 bytes and reading for 178087 ms (23.00MB/sec * 2) with a buffer size of 1024 kB
    I was equal, even after 4294967296 bytes and reading for 119084 ms (34.40MB/sec * 2) with a buffer size of 4096 kB
    I was equal, even after 4294967296 bytes and reading for 109549 ms (37.39MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 74.8 MB/s.

    private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit() || length > first.limit()) {
            return false;
        }
        first.rewind();
        second.rewind();
        int i;
        // compare 8 bytes at a time for as long as a full long is available
        for (i = 0; i < length - 7; i += 8) {
            if (first.getLong() != second.getLong()) {
                return false;
            }
        }
        // compare the remaining (at most 7) bytes one by one
        for (; i < length; i++) {
            if (first.get() != second.get()) {
                return false;
            }
        }
        return true;
    }

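The following is a good article on the relative merits of the different ways to read a file in Java. It may be of some use:

How to read files quickly

You could take a look at Sun's article on I/O tuning (although it is already a bit dated); maybe you can find similarities between the examples there and your code. Also take a look at the java.nio package, which contains faster I/O elements than java.io. Dr. Dobb's Journal has a pretty nice article on high-performance IO using java.nio.

If so, there are further examples and tuning tips there that should help you speed up your code.

Moreover, the Arrays class has built-in methods for comparing byte arrays; maybe these can also be used to make things faster and to clean up your loop a bit.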
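As a rough illustration of the Arrays suggestion, the sketch below uses the range-based overloads of Arrays.equals and Arrays.mismatch, which are available from Java 9 onward. The class and method names (BlockCompare, sameBlock, firstDifference) are hypothetical. Passing the number of valid bytes explicitly matters because a read may fill only part of the buffer, whereas Arrays.equals(buff1, buff2) always compares the whole arrays, stale tail included.

```java
import java.util.Arrays;

final class BlockCompare {

    /** True if the first len bytes of both blocks are identical. */
    static boolean sameBlock(byte[] a, byte[] b, int len) {
        return Arrays.equals(a, 0, len, b, 0, len);
    }

    /** Index of the first differing byte within the first len bytes, or -1 if there is none. */
    static int firstDifference(byte[] a, byte[] b, int len) {
        return Arrays.mismatch(a, 0, len, b, 0, len);
    }
}
```

For example, with a = {1, 2, 3, 4, 9} and b = {1, 2, 3, 4, 7}, sameBlock(a, b, 4) is true and firstDifference(a, b, 5) is 4.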

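For a fairer comparison, try copying two files at once. A hard drive can read one file much more efficiently than two at the same time (because the head has to move back and forth between them). One way to reduce this effect is to use larger buffers, e.g. 16 MB, with ByteBuffer.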
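A minimal sketch of that idea, under the assumption that filling one large buffer completely before touching the other file reduces head movement: each file is read in one long sequential run per round. BigBufferCompare, fill and sameContent are names invented for this example, and a value such as 16 * 1024 * 1024 for bufferSize is only a starting point for measurement.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class BigBufferCompare {

    /** Reads until the buffer is full or the channel reaches end of file, then flips it. */
    private static void fill(FileChannel ch, ByteBuffer buf) throws IOException {
        buf.clear();
        while (buf.hasRemaining() && ch.read(buf) != -1) {
            // a single read() may deliver fewer bytes than requested, so keep going
        }
        buf.flip();
    }

    static boolean sameContent(Path a, Path b, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        try (FileChannel ca = FileChannel.open(a, StandardOpenOption.READ);
             FileChannel cb = FileChannel.open(b, StandardOpenOption.READ)) {
            if (ca.size() != cb.size()) {
                return false;
            }
            ByteBuffer ba = ByteBuffer.allocateDirect(bufferSize);
            ByteBuffer bb = ByteBuffer.allocateDirect(bufferSize);
            while (true) {
                fill(ca, ba); // one long sequential read from the first file...
                fill(cb, bb); // ...then one from the second
                if (!ba.equals(bb)) { // compares the remaining bytes of both buffers
                    return false;
                }
                if (!ba.hasRemaining()) { // both buffers empty: end of both files
                    return true;
                }
            }
        }
    }
}
```

ByteBuffer.equals compares only the bytes between position and limit, so after flip() it covers exactly what was read in that round.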

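With ByteBuffer you can compare 8 bytes at a time by comparing long values with getLong().

If your Java code is efficient, most of the work is done in the disk/OS for the reads and writes, so it should not be much slower than any other language (because the disk/OS is the bottleneck).

Don't assume Java is slow until you have determined that it is not a bug in your code.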

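I found that a lot of the articles linked in this post are badly dated (there is also some very insightful material). Some of the linked articles are from 2001, and the information is questionable at best. Martin Thompson of Mechanical Sympathy wrote quite a bit about this in 2011. Please refer to what he wrote for the background and theory.

I have found that whether or not you use NIO has very little to do with performance. It is much more about the size of your output buffers (or, when reading, of the byte array you read into). NIO is not a magic make-it-go-fast web-scale sauce.

I was able to take Martin's examples, use the 1.0-era OutputStream, and make it scream. NIO is fast too, but the biggest indicator is simply the size of the output buffer, not whether you use NIO, unless of course you are using memory-mapped NIO, in which case it does matter. :)

If you want up-to-date, authoritative information on this, see Martin's blog: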

http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html

If you want to see how NIO does not make that much difference (as I was able to write examples using regular IO that were faster), see this:

http://www.dzone.com/links/fast_java_io_nio_is_always_faster_than_fileoutput.html

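I have tested my assumptions on a new Windows laptop with a fast hard disk, a MacBook Pro with an SSD, an EC2 xlarge, and an EC2 4x large with maxed-out IOPS/high-speed I/O (and soon on a large-disk NAS fibre disk array), so it works (there are some issues on smaller EC2 instances, but if you care about performance... are you going to use a small EC2 instance?). If you use real hardware, in my tests so far, traditional IO always wins. If you use high-I/O EC2 instances, it is also a clear winner. If you use under-powered EC2 instances, NIO can win.

There is no substitute for benchmarking.

Anyway, I am no expert; I just did some empirical testing using the framework that Sir Martin Thompson wrote up in his blog post.

I took this a step further and used Files.newInputStream (from JDK 7) with TransferQueue to create a recipe for making Java I/O scream (even on small EC2 instances). The recipe can be found at the bottom of this documentation for Boon ( https://github.com/RichardHightower/boon/wiki/Auto-Growable-Byte-Buffer-like-a-ByteBuilder ). This allows me to use a traditional OutputStream and still get something that works well on smaller EC2 instances. (I am the main author of Boon. But I am accepting new authors. The pay sucks. $0 per hour. But the good news is, I can double your pay whenever you like.)

My 2 cents.

See this to see why TransferQueue is important. http://php.sabscape.com/blog/?p=557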

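Key learnings:

  1. If you care about performance, never use BufferedOutputStream.
  2. NIO does not always equal performance.
  3. Buffer size matters most.
  4. Recycling buffers for high-speed writes is critical.
  5. GC can/will/does wreck your performance for high-speed writes.
  6. You have to have some mechanism to reuse spent buffers.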
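The buffer-reuse lessons above can be sketched with two queues that pass a fixed set of buffers back and forth between a reader thread and a consumer. This is not the Boon recipe: it uses a plain ArrayBlockingQueue rather than a TransferQueue, it shows the read side rather than writes, and the names RecyclingReader, Chunk and sumBytes are invented for the example. It assumes bufferSize and poolSize are both at least 1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

final class RecyclingReader {

    /** A buffer plus the number of valid bytes in it; length -1 marks end of input. */
    private static final class Chunk {
        final byte[] data;
        int length;
        Chunk(int size) { data = new byte[size]; }
    }

    /** Sums all (unsigned) bytes of a file while allocating only poolSize buffers in total. */
    static long sumBytes(Path file, int bufferSize, int poolSize)
            throws IOException, InterruptedException {
        BlockingQueue<Chunk> free = new ArrayBlockingQueue<>(poolSize);
        BlockingQueue<Chunk> filled = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            free.add(new Chunk(bufferSize));
        }
        AtomicReference<IOException> failure = new AtomicReference<>();

        try (InputStream in = Files.newInputStream(file)) {
            Thread reader = new Thread(() -> {
                try {
                    Chunk c;
                    do {
                        c = free.take();                 // wait for a spent buffer
                        try {
                            c.length = in.read(c.data);  // -1 at end of file
                        } catch (IOException e) {
                            failure.set(e);
                            c.length = -1;               // stop the consumer as well
                        }
                        filled.put(c);
                    } while (c.length != -1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            reader.setDaemon(true); // do not keep the JVM alive if the caller bails out
            reader.start();

            long sum = 0;
            for (Chunk c = filled.take(); c.length != -1; c = filled.take()) {
                for (int i = 0; i < c.length; i++) {
                    sum += c.data[i] & 0xFF;
                }
                free.put(c);        // hand the spent buffer back for reuse
            }
            reader.join();
            if (failure.get() != null) {
                throw failure.get();
            }
            return sum;
        }
    }
}
```

Because the same few arrays circulate for the whole run, the garbage collector has nothing new to clean up per block, which is the point of lessons 4 to 6.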

DMA/SATA are hardware/low-level technologies and are not visible to any programming language.

For memory-mapped input/output you should use java.nio, I believe.
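For reference, a memory-mapped comparison could look roughly like the sketch below. It maps both files window by window, because a single mapping cannot exceed Integer.MAX_VALUE bytes. MappedCompare, sameContent and windowSize are names chosen for this example, and whether mapping beats plain reads for this workload would have to be benchmarked.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedCompare {

    static boolean sameContent(Path a, Path b, long windowSize) throws IOException {
        if (windowSize <= 0 || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("windowSize must be in 1..Integer.MAX_VALUE");
        }
        try (FileChannel ca = FileChannel.open(a, StandardOpenOption.READ);
             FileChannel cb = FileChannel.open(b, StandardOpenOption.READ)) {
            long size = ca.size();
            if (size != cb.size()) {
                return false;
            }
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                // map the same window of both files read-only
                MappedByteBuffer ma = ca.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer mb = cb.map(FileChannel.MapMode.READ_ONLY, pos, len);
                if (!ma.equals(mb)) { // compares all remaining bytes of the two windows
                    return false;
                }
            }
            return true;
        }
    }
}
```

The mappings are released only when the buffers are garbage collected, which on some platforms can delay deleting or truncating the files.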

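Are you sure you are not reading those files one byte at a time? That would be wasteful; I would recommend doing it block by block, and each block should be something like 64 megabytes to minimize seeking.

Try setting the buffer on the input stream to several megabytes.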