一些运行/ JIT的故障后Java循环变慢了?

所以我想对一些基本的javafunction进行基准测试,以便为这个问题添加一些信息: 将方法声明为静态会带来什么好处 。

我知道写作基准有时并不容易,但这里发生的事情我无法解释。

请注意,我并没有考虑如何解决这个问题,而是为什么会发生这种情况*

测试类:

public class TestPerformanceOfStaticVsDynamicCalls { private static final long RUNS = 1_000_000_000L; public static void main( String [] args ){ new TestPerformanceOfStaticVsDynamicCalls().run(); } private void run(){ long r=0; long start, end; for( int loop = 0; loop<10; loop++ ){ // Benchmark start = System.currentTimeMillis(); for( long i = 0; i < RUNS; i++ ) { r += addStatic( 1, i ); } end = System.currentTimeMillis(); System.out.println( "Static: " + ( end - start ) + " ms" ); start = System.currentTimeMillis(); for( long i = 0; i < RUNS; i++ ) { r += addDynamic( 1, i ); } end = System.currentTimeMillis(); System.out.println( "Dynamic: " + ( end - start ) + " ms" ); // Do something with r to keep compiler happy System.out.println( r ); } } private long addDynamic( long a, long b ){ return a+b; } private static long addStatic( long a, long b ){ return a+b; } } 

我期待第一个循环是热身,以下循环更快。

在Eclipse中运行它会产生以下奇怪的结果:

 Static: 621 ms Dynamic: 631 ms 1000000001000000000 Static: 2257 ms Dynamic: 2501 ms 2000000002000000000 Static: 2258 ms Dynamic: 2469 ms 3000000003000000000 Static: 2231 ms Dynamic: 2464 ms 4000000004000000000 

那么wtf? 它变慢了。 要进行交叉检查,我使用java / c 7运行相同的代码:

 Static: 620 ms Dynamic: 627 ms 1000000001000000000 Static: 897 ms Dynamic: 617 ms 2000000002000000000 Static: 901 ms Dynamic: 615 ms 3000000003000000000 Static: 888 ms Dynamic: 616 ms 4000000004000000000 

所以这里只有静态调用对于后续循环变慢。 更奇怪的是,如果我重新排列代码只在最后一个循环后打印r我在Eclipse中得到它:

 Static: 620 ms Dynamic: 635 ms Static: 2285 ms Dynamic: 893 ms Static: 2258 ms Dynamic: 900 ms Static: 2280 ms Dynamic: 905 ms 4000000004000000000 

这与java / c 7:

 Static: 620 ms Dynamic: 623 ms Static: 890 ms Dynamic: 614 ms Static: 890 ms Dynamic: 616 ms Static: 886 ms Dynamic: 614 ms 4000000004000000000 

而在它改变eclipse中动态/静态基准测试的顺序时:

 Dynamic: 618 ms Static: 626 ms 1000000001000000000 Dynamic: 632 ms Static: 2524 ms 2000000002000000000 Dynamic: 617 ms Static: 2528 ms 3000000003000000000 Dynamic: 622 ms Static: 2506 ms 4000000004000000000 

在java / c 7中:

 Dynamic: 625 ms Static: 646 ms 1000000001000000000 Dynamic: 2470 ms Static: 633 ms 2000000002000000000 Dynamic: 2459 ms Static: 635 ms 3000000003000000000 Dynamic: 2464 ms Static: 645 ms 4000000004000000000 

那么这里发生了什么?

编辑:一些系统信息:

 Java version "1.7.0_55" OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1) OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode) Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz 

EDIT2:

使用Java8:

 Static: 620 ms Dynamic: 624 ms 1000000001000000000 Static: 890 ms Dynamic: 618 ms 2000000002000000000 Static: 891 ms Dynamic: 616 ms 3000000003000000000 Static: 892 ms Dynamic: 617 ms 4000000004000000000 

其他代码排序在这里产生类似的奇怪(但是更好)的结果。

序言:手动编写微基准测试几乎总是注定要失败。
有些框架已经解决了常见的基准测试问题。

  1. JIT编译单元是一种方法。 将几个基准合并到一个方法中会导致不可预测的结果。

  2. JIT严重依赖于执行配置文件,即运行时统计信息。 如果方法长时间运行第一个场景,JIT将优化生成的代码。 当方法突然切换到另一个方案时,不要指望它以相同的速度运行。

  3. JIT可能会跳过优化未执行的代码。 它将为此代码留下一个不常见的陷阱。 如果陷阱被击中,JVM将取消优化编译的方法,切换到解释器,然后用新知识重新编译代码。 例如,当你的方法run第一次在第一个热循环中编译时,JIT还不知道System.out.println 。 一旦执行到达println ,早期编译的代码很可能被去优化。

  4. 方法越大 – 为JIT编译器优化它就越困难。 例如,似乎没有足够的备用寄存器来保存所有局部变量。 这就是你的情况。

总结一下,您的基准测试似乎通过以下场景:

  1. 第一个热循环( addStatic )触发run方法的编译。 执行配置文件除了addStatic方法之外什么都不知道。
  2. System.out.println触发去优化,之后第二个热循环( addDynamic )导致重新编译run方法。
  3. 现在,执行配置文件仅包含有关addDynamic信息,因此JIT优化了第二个循环,并且第一个循环似乎有额外的寄存器溢出:

优化循环:

 0x0000000002d01054: add %rbx,%r14 0x0000000002d01057: add $0x1,%rbx ;*ladd ; - TestPerformanceOfStaticVsDynamicCalls::addDynamic@2 ; - TestPerformanceOfStaticVsDynamicCalls::run@105 0x0000000002d0105b: add $0x1,%r14 ; OopMap{rbp=Oop off=127} ;*goto ; - TestPerformanceOfStaticVsDynamicCalls::run@116 0x0000000002d0105f: test %eax,-0x1c91065(%rip) # 0x0000000001070000 ;*lload ; - TestPerformanceOfStaticVsDynamicCalls::run@92 ; {poll} 0x0000000002d01065: cmp $0x3b9aca00,%rbx 0x0000000002d0106c: jl 0x0000000002d01054 

循环使用额外的寄存器溢出:

 0x0000000002d011d0: mov 0x28(%rsp),%r11 <---- the problem is here 0x0000000002d011d5: add %r10,%r11 0x0000000002d011d8: add $0x1,%r10 0x0000000002d011dc: add $0x1,%r11 0x0000000002d011e0: mov %r11,0x28(%rsp) ;*ladd ; - TestPerformanceOfStaticVsDynamicCalls::addStatic@2 ; - TestPerformanceOfStaticVsDynamicCalls::run@33 0x0000000002d011e5: mov 0x28(%rsp),%r11 <---- the problem is here 0x0000000002d011ea: add $0x1,%r11 ; OopMap{[32]=Oop off=526} ;*goto ; - TestPerformanceOfStaticVsDynamicCalls::run@44 0x0000000002d011ee: test %eax,-0x1c911f4(%rip) # 0x0000000001070000 ;*goto ; - TestPerformanceOfStaticVsDynamicCalls::run@44 ; {poll} 0x0000000002d011f4: cmp $0x3b9aca00,%r10 0x0000000002d011fb: jl 0x0000000002d011d0 ;*ifge ; - TestPerformanceOfStaticVsDynamicCalls::run@25 

PS以下JVM选项对于分析JIT编译很有用:

 -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintAssembly -XX:CompileOnly=TestPerformanceOfStaticVsDynamicCalls 

看起来Java是向变量r添加值的方式。 我做了一些更改,添加方法run2()

 public class TestPerformanceOfStaticVsDynamicCalls { private static final long RUNS = 1_000_000_000L; public static void main(String[] args) { System.out.println("Test run 1 ================================="); new TestPerformanceOfStaticVsDynamicCalls().run(); System.out.println("Test run 2 ================================="); new TestPerformanceOfStaticVsDynamicCalls().run2(); } private void run2() { long r = 0; long start, end; for (int loop = 0; loop < 10; loop++) { // Benchmark long stat = 0; start = System.currentTimeMillis(); for (long i = 0; i < RUNS; i++) { stat += addStatic(1, i); } end = System.currentTimeMillis(); System.out.println("Static: " + (end - start) + " ms"); long dyna = 0; start = System.currentTimeMillis(); for (long i = 0; i < RUNS; i++) { dyna += addDynamic(1, i); } end = System.currentTimeMillis(); System.out.println("Dynamic: " + (end - start) + " ms"); // If you really want to have values in "r" then... r += stat + dyna; // Do something with r to keep compiler happy System.out.println(r); } } private void run() { long r = 0; long start, end; for (int loop = 0; loop < 10; loop++) { // Benchmark start = System.currentTimeMillis(); for (long i = 0; i < RUNS; i++) { r += addStatic(1, i); } end = System.currentTimeMillis(); System.out.println("Static: " + (end - start) + " ms"); start = System.currentTimeMillis(); for (long i = 0; i < RUNS; i++) { r += addDynamic(1, i); } end = System.currentTimeMillis(); System.out.println("Dynamic: " + (end - start) + " ms"); // If you really want to have values in "r" then... // Do something with r to keep compiler happy System.out.println(r); } } private long addDynamic(long a, long b) { return a + b; } private static long addStatic(long a, long b) { return a + b; } } 

结果是:

 Test run 1 ================================= Static: 582 ms Dynamic: 579 ms 1000000001000000000 Static: 2065 ms Dynamic: 2352 ms 2000000002000000000 Static: 2084 ms Dynamic: 2345 ms 3000000003000000000 Static: 2095 ms Dynamic: 2347 ms 4000000004000000000 Static: 2102 ms Dynamic: 2338 ms 5000000005000000000 Static: 2073 ms Dynamic: 2345 ms 6000000006000000000 Static: 2074 ms Dynamic: 2341 ms 7000000007000000000 Static: 2102 ms Dynamic: 2355 ms 8000000008000000000 Static: 2062 ms Dynamic: 2354 ms 9000000009000000000 Static: 2057 ms Dynamic: 2350 ms -8446744063709551616 Test run 2 ================================= Static: 584 ms Dynamic: 582 ms 1000000001000000000 Static: 587 ms Dynamic: 577 ms 2000000002000000000 Static: 577 ms Dynamic: 579 ms 3000000003000000000 Static: 577 ms Dynamic: 577 ms 4000000004000000000 Static: 578 ms Dynamic: 579 ms 5000000005000000000 Static: 578 ms Dynamic: 580 ms 6000000006000000000 Static: 577 ms Dynamic: 579 ms 7000000007000000000 Static: 578 ms Dynamic: 577 ms 8000000008000000000 Static: 580 ms Dynamic: 578 ms 9000000009000000000 Static: 576 ms Dynamic: 579 ms -8446744063709551616 

至于为什么直接添加到r ,我没有任何线索。 也许有人可以提供更多的见解,为什么在loop block访问r会使事情变慢。

只需一个注意事项。 如果我long使用ri我只能观察到这种奇怪的行为。 如果我将它们转换为int那么我得到这些时间:

 Static: 352 ms Dynamic: 353 ms Static: 348 ms Dynamic: 349 ms Static: 349 ms Dynamic: 348 ms Static: 349 ms Dynamic: 344 ms 

因此,一个可能的结论是在这些情况下避免long 。 至少在Linux / Amd64 Java7中,性能很重要。