Which hash algorithm is cheapest to compute?

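I know very little about hash algorithms.

I need to compute the hash of incoming files in Java before forwarding them to a remote system (somewhat like S3), which requires a file hash in MD2/MD5/SHA-X. The hash is not computed for security reasons, only as a consistency checksum.

I can compute this hash on the fly while forwarding the file, using DigestInputStream from the Java standard library, but I would like to know which algorithm is best to use to avoid performance problems with DigestInputStream.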
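A minimal sketch of the on-the-fly approach, assuming the upload to the remote system can be treated as a plain OutputStream (the class name ForwardingHasher and the method forwardAndHash are hypothetical). It reads through DigestInputStream in 64 KB chunks, so the digest is updated once per chunk rather than once per byte, and the algorithm name is passed straight to MessageDigest.getInstance so MD5 and SHA-1 can be swapped freely:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ForwardingHasher {

    // Copies everything from 'in' to 'out' and returns the hex digest of the copied bytes.
    // 'out' stands in for the upload stream to the remote system (hypothetical).
    // Closing the DigestInputStream also closes 'in'.
    public static String forwardAndHash(InputStream in, OutputStream out, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[64 * 1024];
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            int n;
            // read(byte[]) feeds each chunk to the digest in one update call
            while ((n = dis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // Lowercase hex encoding of a digest.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // prints 5d41402abc4b2a76b9719d911017c592
        System.out.println(forwardAndHash(new ByteArrayInputStream(data), sink, "MD5"));
    }
}
```

Passing "SHA-1" instead of "MD5" is the only change needed to compare the two algorithms on real traffic.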

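A former colleague of mine tested this and told us that computing the hash on the fly can be quite expensive compared to running a Unix command-line tool on a file.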

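Edit regarding premature optimization: I work at a company whose goal is to help other companies dematerialize (digitize) their documents. This means we have a batch job that handles document transfers from other companies. We are targeting millions of documents per day in the future, and the execution time of this batch is genuinely critical to our business.

Saving 10 milliseconds of hashing per document across 1 million documents per day cuts the execution time by almost 3 hours per day, which is huge.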
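The arithmetic behind that figure:

$$10\,\text{ms/document} \times 10^{6}\,\text{documents/day} = 10^{4}\,\text{s/day} \approx 2.8\,\text{h/day}$$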

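If you only want to detect accidental corruption and the like during transfer, a simple (non-cryptographic) checksum should be sufficient. But note that (for example) a 16-bit checksum will fail to detect about one random corruption in 2^16. And it offers no protection against someone deliberately modifying the data.

The Wikipedia page on checksums lists various options, including commonly used (and cheap) ones such as Adler-32 and CRC.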
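For illustration, a minimal sketch (the class and method names are hypothetical) of computing CRC-32 or Adler-32 with the JDK's java.util.zip classes while a stream is being read. This only helps if the receiving side accepts such a checksum; the remote system described in the question asks for MD2/MD5/SHA-X:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.Checksum;

public class CheapChecksums {

    // Drains 'in' through a CheckedInputStream and returns the 32-bit checksum value.
    public static long checksum(InputStream in, Checksum algo) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        try (CheckedInputStream cis = new CheckedInputStream(in, algo)) {
            while (cis.read(buffer) != -1) {
                // each chunk is fed into 'algo' as a side effect of reading
            }
        }
        return algo.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32   = %08x%n", checksum(new ByteArrayInputStream(data), new CRC32()));
        System.out.printf("Adler32 = %08x%n", checksum(new ByteArrayInputStream(data), new Adler32()));
    }
}
```

Checksum.getValue() returns the 32-bit result in the low bits of a long. For the standard check input "123456789", CRC-32 gives cbf43926 and Adler-32 gives 091e01de.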

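However, I agree with @ppeterka: this smells of "premature optimization".

I know many people don't trust micro-benchmarks, but let me post the results I got.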

Input:

bigFile.txt = approx. 143 MB

hashAlgorithm = MD2, MD5, SHA-1

Test code:

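    import java.io.BufferedInputStream;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;

    // Runs forever; each pass hashes the whole file and prints the elapsed time in ms.
    while (true) {
        long l = System.currentTimeMillis();
        MessageDigest md = MessageDigest.getInstance(hashAlgorithm);
        try (InputStream is = new BufferedInputStream(Files.newInputStream(Paths.get("bigFile.txt")))) {
            DigestInputStream dis = new DigestInputStream(is, md);
            int b;
            // Single-byte reads: the digest is updated one byte at a time
            while ((b = dis.read()) != -1) {
            }
        }
        byte[] digest = md.digest();
        System.out.println(System.currentTimeMillis() - l);
    }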

Results:

    MD5   (ms): 22030 10356 9434 9310 11332 9976 9575 16076
    SHA-1 (ms): 18379 10139 10049 10071 10894 10635 11346 10342 10117 9930
    MD2   (ms): 45290 34232 34601 34319

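MD2 appears to be considerably slower than MD5 or SHA-1: after the first run, MD2 takes about 34 seconds per pass, versus roughly 10 seconds in most runs for MD5 and SHA-1.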

Like NKukhar, I tried a micro-benchmark, but with different code and better results:

    import com.google.common.collect.ImmutableSet;                       // Guava
    import org.apache.commons.codec.digest.MessageDigestAlgorithms;      // Apache Commons Codec
    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;
    import java.util.Set;

    public static void main(String[] args) throws Exception {
        String bigFile = "100mbfile";
        // We put the file bytes in memory; we don't want to measure the time it takes to read from the disk
        byte[] bigArray = Files.readAllBytes(Paths.get(bigFile));
        byte[] buffer = new byte[50_000]; // the byte buffer we will use to consume the stream
        // we prepare the algos to test
        Set<String> algos = ImmutableSet.of(
                "no_hash", // no hashing
                MessageDigestAlgorithms.MD5,
                MessageDigestAlgorithms.SHA_1,
                MessageDigestAlgorithms.SHA_256,
                MessageDigestAlgorithms.SHA_384,
                MessageDigestAlgorithms.SHA_512
        );
        int executionNumber = 20;
        for (String algo : algos) {
            long totalExecutionDuration = 0;
            for (int i = 0; i < executionNumber; i++) {
                long beforeTime = System.currentTimeMillis();
                InputStream is = new ByteArrayInputStream(bigArray);
                if (!"no_hash".equals(algo)) {
                    is = new DigestInputStream(is, MessageDigest.getInstance(algo));
                }
                // consume in 50 KB chunks, so the digest is updated per chunk, not per byte
                while ((is.read(buffer)) != -1) {
                }
                long executionDuration = System.currentTimeMillis() - beforeTime;
                totalExecutionDuration += executionDuration;
            }
            System.out.println(algo + " -> average of " + totalExecutionDuration / executionNumber + " millies per execution");
        }
    }

This produces the following output for a 100 MB file on a good i7 developer machine:

    no_hash -> average of 6 millies per execution
    MD5 -> average of 201 millies per execution
    SHA-1 -> average of 335 millies per execution
    SHA-256 -> average of 576 millies per execution
    SHA-384 -> average of 481 millies per execution
    SHA-512 -> average of 464 millies per execution
