BUSTED 如何使用sun.misc.Unsafe加快字节查找速度？

我正在尝试使用Unsafe迭代内存而不是遍历byte []中的值。使用unsafe分配内存块。内存足以容纳65536个字节值。

我在尝试这个：

char aChar = some character if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)){ // do something }

代替：

 char aChar = some character if ((byte) 0 == ( lookup[aChar] & mask )){ // do something }

我认为 Unsafe可以比使用常规数组访问更快地访问内存，并对每个索引执行索引检查…

只是一厢情愿地认为jvm会有一个特殊的操作（不安全），它会以某种方式使常规数组访问和迭代更快。在我看来，jvm在正常的byte []迭代中运行良好，并且可以使用普通的，纯粹的，纯粹的java代码快速完成它们。

@millimoose击中众所周知的“钉在头上”

“不安全可能对许多事情有用，但这种微观优化程度不是其中之一。” – 毫米“

在非常严格的有限情况下使用Unsafe会更快：

（仅64位jvm）对于单个65535字节[]查找更快，每次测试只完成一次 。在这种情况下，64_bit jvm上的UnsafeLookup_8B快24％。如果测试重复进行，每次测试都进行两次，那么正常方法现在比不安全快30％。在冷jvm上的纯解释模式中，Unsafe到目前为止更快 – 但仅是第一次，仅适用于较小的数组大小。在32位标准Oracle JVM 7.x上，正常情况比使用unsafe快三倍。

使用Unsafe（在我的测试中）速度较慢：

在Oracle java 64位和32位虚拟机上都较慢
无论操作系统和机器架构如何都会变慢（32位和64位）
即使调用server jvm选项，也会更慢
在32位jvm（64位甚至更慢？）的代码中，不安全从9％或更多（1_GB数组和UnsafeLookup_8B（最快的一个）慢）
在64位jvm中，不安全性从234％或更高（1_MB数组和UnsafeLookup_1B（最快的一个）在64位jvm的代码中慢。

这有什么理由吗？**

当我运行下面发布的代码yellowB（检查1GB字节[]）时，法线仍然是最快的：

 C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar" initialize data... initialize data done! use normalLookup()... Not found '0' time : 1967737 us. use unsafeLookup_1B()... Not found '0' time : 2923367 us. use unsafeLookup_8B()... Not found '0' time : 2495663 us. Flat profile of 26.35 secs (2018 total ticks): main Interpreted + native Method 0.0% 1 + 0 test.StackOverflow.main 0.0% 1 + 0 Total interpreted Compiled + native Method 67.8% 1369 + 0 test.StackOverflow.main 11.7% 236 + 0 test.StackOverflow.unsafeLookup_8B 11.2% 227 + 0 test.StackOverflow.unsafeLookup_1B 9.1% 184 + 0 test.StackOverflow.normalLookup 99.9% 2016 + 0 Total compiled Stub + native Method 0.0% 0 + 1 sun.misc.Unsafe.getLong 0.0% 0 + 1 Total stub Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM Thread-local ticks: 100.0% 1 Blocked (of total) Global summary of 26.39 seconds: 100.0% 2023 Received ticks C:\Users\wilf>java -version java version "1.7.0_07" Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode, sharing)

CPU为：Intel Core 2 Duo E4600 @ 2.4GHZ 4.00GB（3.25GB可用）操作系统：Windows 7（32）

使用Windows 7_64,32位java在4核AMD64上运行测试：

 initialize data... initialize data done! use normalLookup()... Not found '0' time : 1631142 us. use unsafeLookup_1B()... Not found '0' time : 2365214 us. use unsafeLookup_8B()... Not found '0' time : 1783320 us.

使用Windows 7_64,64位java在4 Core AMD64上运行测试：

 use normalLookup()... Not found '0' time : 655146 us. use unsafeLookup_1B()... Not found '0' time : 904783 us. use unsafeLookup_8B()... Not found '0' time : 764427 us. Flat profile of 6.34 secs (13 total ticks): main Interpreted + native Method 23.1% 3 + 0 java.io.PrintStream.println 23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B 15.4% 2 + 0 test.StackOverflow.main 7.7% 1 + 0 java.io.DataInputStream. 69.2% 9 + 0 Total interpreted Compiled + native Method 7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B 7.7% 0 + 1 test.StackOverflow.main 7.7% 0 + 1 test.StackOverflow.normalLookup 7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B 30.8% 0 + 4 Total compiled Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM Thread-local ticks: 100.0% 1 Blocked (of total) Global summary of 6.35 seconds: 100.0% 14 Received ticks 42.9% 6 Compilation

我认为你发布的两个函数基本相同，因为它们只读取1个字节，然后将其转换为int并进行进一步的比较。

每次读取4字节int或8字节更有效。我写了两个函数来做同样的事情：比较两个字节[]的内容，看看它们是否相同：

function1：

 public static boolean hadoopEquals(byte[] b1, byte[] b2) { if(b1 == b2) { return true; } if(b1.length != b2.length) { return false; } // Bring WritableComparator code local for(int i = 0;i < b1.length; ++i) { int a = (b1[i] & 0xff); int b = (b2[i] & 0xff); if (a != b) { return false; } } return true; }

function2：

 public static boolean goodEquals(byte[] b1,byte[] b2) { if(b1 == b2) { return true; } if(b1.length != b2.length) { return false; } int baseOffset = UnSafe.arrayBaseOffset(byte[].class); int numLongs = (int)Math.ceil(b1.length / 8.0); for(int i = 0;i < numLongs; ++i) { long currentOffset = baseOffset + (i * 8); long l1 = UnSafe.getLong(b1, currentOffset); long l2 = UnSafe.getLong(b2, currentOffset); if(0L != (l1 ^ l2)) { return false; } } return true; }

我在笔记本电脑上运行了这两个function（corei7 2630QM，8GB DDR3,64bit win 7,64bit Hotspot JVM），比较两个400MB byte []，结果如下：

function1：~670ms

function2：~80ms

2更快。

所以我的建议是每次读取8字节并使用XOR运算符（^）：

 long l1 = UnSafe.getLong(byteArray, offset); //8 byte if(0L == l1 ^ 0xFF) //if the lowest byte == 0? /* do something */ if(0L == l1 ^ 0xFF00) //if the 2nd lowest byte == 0? /* do something */ /* go on... */

================================================== ==========================

嗨Wilf，我使用你的代码制作一个测试类，如下所示，这个类比较了查找字节数组中第一个0的3个函数之间的速度：

 package test; import java.lang.reflect.Field; import sun.misc.Unsafe; /** * Test the speed in looking up the 1st 0 in a byte array * Set -Xms the same as -Xms to avoid Heap reallocation * * @author yellowb * */ public class StackOverflow { public static Unsafe UnSafe; public static Unsafe getUnsafe() throws SecurityException, NoSuchFieldException, IllegalArgumentException, IllegalAccessException { Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe"); theUnsafe.setAccessible(true); Unsafe unsafe = (Unsafe) theUnsafe.get(null); return unsafe; } /** * use 'byte[index]' form to read 1 byte every time * @param buf */ public static void normalLookup(byte[] buf) { for (int i = 0; i < buf.length; ++i) { if ((byte) 0 == buf[i]) { System.out.println("The 1st '0' is at position : " + i); return; } } System.out.println("Not found '0'"); } /** * use Unsafe.getByte to read 1 byte every time directly from the memory * @param buf */ public static void unsafeLookup_1B(byte[] buf) { int baseOffset = UnSafe.arrayBaseOffset(byte[].class); for (int i = 0; i < buf.length; ++i) { byte b = UnSafe.getByte(buf, (long) (baseOffset + i)); if (0 == ((int) b & 0xFF)) { System.out.println("The 1st '0' is at position : " + i); return; } } System.out.println("Not found '0'"); } /** * use Unsafe.getLong to read 8 byte every time directly from the memory * @param buf */ public static void unsafeLookup_8B(byte[] buf) { int baseOffset = UnSafe.arrayBaseOffset(byte[].class); //The first (numLongs * 8) bytes will be read by Unsafe.getLong in below loop int numLongs = buf.length / 8; long currentOffset = 0L; for (int i = 0; i < numLongs; ++i) { currentOffset = baseOffset + (i * 8); //the step is 8 bytes long l = UnSafe.getLong(buf, currentOffset); //Compare each byte(in the 8-Byte long) to 0 //PS:x86 cpu is little-endian mode if (0L == (l & 0xFF)) { System.out.println("The 1st '0' is at position : " + (i * 8)); return; } if (0L == (l & 0xFF00L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 1)); return; } if (0L == (l & 0xFF0000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 2)); return; } if (0L == (l & 0xFF000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 3)); return; } if (0L == (l & 0xFF00000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 4)); return; } if (0L == (l & 0xFF0000000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 5)); return; } if (0L == (l & 0xFF000000000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 6)); return; } if (0L == (l & 0xFF00000000000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 7)); return; } } //If some rest bytes exists int rest = buf.length % 8; if(0 != rest) { currentOffset = currentOffset + 8; //Because the length of rest bytes < 8,we have to read them one by one for(; currentOffset < (baseOffset + buf.length); ++currentOffset) { byte b = UnSafe.getByte(buf, (long)currentOffset); if (0 == ((int) b & 0xFF)) { System.out.println("The 1st '0' is at position : " + (currentOffset - baseOffset)); return; } } } System.out.println("Not found '0'"); } public static void main(String[] args) throws SecurityException, NoSuchFieldException, IllegalArgumentException, IllegalAccessException { UnSafe = getUnsafe(); int len = 1024 * 1024 * 1024; //1G long startTime = 0L; long endTime = 0L; System.out.println("initialize data..."); byte[] byteArray1 = new byte[len]; for (int i = 0; i < len; ++i) { byteArray1[i] = (byte) (i % 128 + 1); //No byte will equal to 0 } //If you want to set one byte to 0,uncomment the below statement // byteArray1[2500] = (byte)0; System.out.println("initialize data done!"); System.out.println("use normalLookup()..."); startTime = System.nanoTime(); normalLookup(byteArray1); endTime = System.nanoTime(); System.out.println("time : " + ((endTime - startTime) / 1000) + " us."); System.out.println("use unsafeLookup_1B()..."); startTime = System.nanoTime(); unsafeLookup_1B(byteArray1); endTime = System.nanoTime(); System.out.println("time : " + ((endTime - startTime) / 1000) + " us."); System.out.println("use unsafeLookup_8B()..."); startTime = System.nanoTime(); unsafeLookup_8B(byteArray1); endTime = System.nanoTime(); System.out.println("time : " + ((endTime - startTime) / 1000) + " us."); } }

输出是：

 initialize data... initialize data done! use normalLookup()... Not found '0' time : 1271781 us. use unsafeLookup_1B()... Not found '0' time : 716898 us. use unsafeLookup_8B()... Not found '0' time : 591689 us.

结果表明，每次通过Unsafe.getByte（）读取1个字节比定期迭代byte []要快得多。读取8个字节的长度是最快的。

我认为Unsafe可以比使用常规数组访问更快地访问内存，并对每个索引执行索引检查…

范围检查可能不是因素的一个可能原因是JIT编译器的优化器。由于数组的大小永远不会改变，优化器可能会“提升”所有范围检查并在循环开始时执行一次。

相比之下，JIT编译器可能无法优化（例如内联）Unsafe.getByte（）调用。或者getByte方法可能有读屏障…）

然而这是猜测。可以肯定的是让JVM为这两种情况转储出JIT编译的本机代码，并逐个指令地比较它们。

不安全的方法可能被标记为本机，但这并不意味着它们必然是JNI。几乎所有不安全的方法都是内在的（请参阅此处的简短post： http ： //psy-lob-saw.blogspot.co.uk/2012/10/java-intrinsics-are-not-jni-calls.html ） Sun JVM将转换为单个汇编指令（在许多情况下），对于其他JVM，它们在处理内部函数时可能会或可能不会很好，并且可能将它们转换为JNI调用或普通java调用。据我所知，JRockit倾向于采用JNI方式，Android JVM也是如此。

BUSTED 如何使用sun.misc.Unsafe加快字节查找速度？

Maven IDEA插件（或eclipse插件）

unix中的java classpath

在Swing中设置JComboBox的背景

如何在java或php中解析一个非常大的xml文件并插入到mysql DB中

以编程方式计算Java对象占用的内存，包括它引用的对象

在Netty连接关闭后重新连接的最佳方法是什么

JSP页面中的UTF-8编码

静音Stanford coreNLP日志记录

Kerberos错误：GSSHeader找不到正确的标记

如何从applet打开新的applet窗口

** BUSTED **如何使用sun.misc.Unsafe加快字节查找速度？

Maven IDEA插件（或eclipse插件）

unix中的java classpath

在Swing中设置JComboBox的背景

如何在java或php中解析一个非常大的xml文件并插入到mysql DB中

以编程方式计算Java对象占用的内存，包括它引用的对象

在Netty连接关闭后重新连接的最佳方法是什么

JSP页面中的UTF-8编码

静音Stanford coreNLP日志记录

Kerberos错误：GSSHeader找不到正确的标记

如何从applet打开新的applet窗口

BUSTED 如何使用sun.misc.Unsafe加快字节查找速度？