在java中苦苦挣扎 – 将long分解为long 的bitmasks

我正在将一个长的分解为长的单个长的[]

public static int decompose(long[] buffer, long base) { int count = Long.bitCount(base); for (int i=0; i<count; i++) { base ^= (buffer[i] = Long.lowestOneBit(base)); } return count; } 

但我觉得可能有更快的方法来做到这一点,因为看起来有一些重复的步骤。 例如,计算这些位应该已经非常接近于获得填充结果所需的所有信息。

有什么建议么? 我熟悉过早的优化口头禅,因此为什么我不会在我的时间推进我的解决方案,但也许其他人之前已经看过这个或沉迷于优化…编辑:请通过测试运行任何建议马克马克提供如下 。 我的第一次暗示实际上是在坚持,我有点惊讶。

测试代码,它位于JUnit方法中:

 Random rng = new Random(); long size = 0; long[] hold = new long[Long.SIZE]; System.out.println("init:"+Long.toBinaryString(BitTwiddling.bitmask(rng.nextInt(Long.SIZE)))); //initialize BitTwiddling internals long start = System.currentTimeMillis(); for (int i=0; i<compareIterations; i++) size+= BitTwiddling.decompose(hold,rng.nextInt(Integer.MAX_VALUE)); long delta = System.currentTimeMillis() - start; double base = (double)size/delta; size = 0; start = System.currentTimeMillis(); for (int i=0; i<compareIterations; i++) size += BitTwiddling.decomposeAlt(hold, rng.nextInt(Integer.MAX_VALUE)); long delta2 = System.currentTimeMillis() - start; double base2 = (double)size/delta2; 

然后比较base和base2

这是我目前正在使用的线束。 我一直在问题(方法1)中获得代码以显着加快执行速度。 目前Mark Peters的答案通常是最快的,但是在设置-server标志的情况下,Carl的速度更快。 此外,迈克拉在服务器模式下的竞争力要大得多(尽管仍比卡尔/马克的速度慢)。

 import java.util.Random; public abstract class Harness { public static void main(String[] args) { while (true) { long[] randomData = new long[4096]; Random r = new Random(); for (int i = 0; i < randomData.length; i++) { randomData[i] = r.nextLong(); } long[] buffer = new long[64]; Harness[] versions = new Harness[] { new Control(randomData, buffer), new Carl(randomData, buffer), new MarkPeters(randomData, buffer), new MarkPetersServer64Bit(randomData, buffer), // new Rsp1(randomData, buffer), new Rsp2(randomData, buffer), new Mikera(randomData, buffer) }; for (Harness v : versions) { v.doTest(); } } } private final long[] buffer; private final long[] randomData; protected Harness(long[] randomData, long[] buffer) { this.randomData = randomData; this.buffer = buffer; } public void doTest() { long start = System.nanoTime(); long count = 0; for (int times = 0; times < 1000; times++) { for (int i = 0; i < randomData.length; i++) { count = decompose(buffer, randomData[i]); } } long end = System.nanoTime(); //verify decomposition of last item long l = 0; for ( int i = 0; i < count; i++ ) { l |= buffer[i]; } System.out.println(getClass().getSimpleName() + " took " + (end - start) / 1000000.0 + " ms - last base: " + l); } protected abstract int decompose(long[] buffer, long base); private static class Control extends Harness { protected Control(long[] randomData, long[] buffer) { super(randomData, buffer); } @Override protected int decompose(long[] buffer, long base) { return 0; } } private static class Carl extends Harness { protected Carl(long[] randomData, long[] buffer) { super(randomData, buffer); } @Override protected int decompose(long[] buffer, long base) { final int count = Long.bitCount(base); for (int i = 0; i < count; i++) { base ^= (buffer[i] = Long.lowestOneBit(base)); } return count; } } private static class Mikera extends Harness { protected Mikera(long[] randomData, long[] buffer) { super(randomData, buffer); } @Override protected int decompose(long[] buffer, long base) { int count = 0; while (base != 0) { long mask = base & (-base); base &= ~mask; buffer[count++] = mask; } return count; } } private static class Rsp1 extends Harness { protected Rsp1(long[] randomData, long[] buffer) { super(randomData, buffer); } @Override protected int decompose(long[] buffer, long base) { int count = 0; if (0 != (base & 1)) { buffer[count++] = 1; } base >>>= 1; for (long bit = 1; 0 < bit && bit <= base; ) { if (0 < (base & bit)) { buffer[count++] = (bit <<= 1); } } return count; } } private static class Rsp2 extends Harness { protected Rsp2(long[] randomData, long[] buffer) { super(randomData, buffer); } @Override protected int decompose(long[] buffer, long base) { int count = 0; for (long bit = 1; 0 != base; bit <<= 1, base >>>= 1) { if (0 < (base & 1)) { buffer[count++] = bit; } } return count; } } private static class MarkPeters extends Harness { protected MarkPeters(long[] randomData, long[] buffer) { super(randomData, buffer); } @Override protected int decompose(long[] buffer, long base) { int count = Long.bitCount(base); for (int i = 0; i < count; i++) { base -= (buffer[i] = Long.lowestOneBit(base)); } return count; } } private static class MarkPetersServer64Bit extends Harness { protected MarkPetersServer64Bit(long[] randomData, long[] buffer) { super(randomData, buffer); } @Override protected int decompose(long[] buffer, long base) { int count = 0; while ( 0 != (base ^= (buffer[count++] = Long.lowestOneBit(base)))); return count; } } } 

样本输出

哪种方法最好取决于具体情况。

非服务器,32位JVM

 Control took 41.175272 ms - last base: 0 Carl took 691.966919 ms - last base: 5852835112840111303 MarkPeters took 642.230253 ms - last base: 5852835112840111303 MarkPetersServer64Bit took 742.594626 ms - last base: 5852835112840111303 Rsp2 took 3886.203787 ms - last base: 5852835112840111303 Mikera took 1044.451494 ms - last base: 5852835112840111303 

获奖者: MarkPeters

服务器,32位JVM

 Control took 2.354383 ms - last base: 0 Carl took 508.687401 ms - last base: 338317437500027646 MarkPeters took 521.831297 ms - last base: 338317437500027646 MarkPetersServer64Bit took 727.052206 ms - last base: 338317437500027646 Rsp2 took 3811.75662 ms - last base: 338317437500027646 Mikera took 665.252599 ms - last base: 338317437500027646 

获奖者:卡尔

服务器(隐式或显式),64位JVM

 Control took 0.007186 ms - last base: 0 Carl took 543.701859 ms - last base: -8898262206218882664 MarkPeters took 439.706079 ms - last base: -8898262206218882664 MarkPetersServer64Bit took 391.831055 ms - last base: -8898262206218882664 Rsp2 took 1861.40449 ms - last base: -8898262206218882664 Mikera took 435.964319 ms - last base: -8898262206218882664 

获奖者: MarkPetersServer64Bit

注意:据我所知,没有类似的64位JVM可以在非服务器模式下运行。 看到这里 :

在64位Java中是否提供-client和-server VM模式?

目前只有Java HotSpot Server VM支持64位操作,而-server选项是隐式的,使用-d64。 这可能会在将来的版本中发生变化。

在尝试了很长时间并且难以击败原始代码之后,我找到了一个通过无耻地窃取您的代码并进行调整来持续击败您(在我的环境中)的代码。

  protected int decompose(long[] buffer, long base) { int count = Long.bitCount(base); for (int i = 0; i < count; i++) { base -= (buffer[i] = Long.lowestOneBit(base)); } return count; } 

唯一区别:使用减法而不是xor。 我认为没有任何边缘案件; 我当然没有找到任何。 不知道为什么那些没有被优化到同一个东西(假设没有边缘情况),但它确实使我的速度更快(再次,在我的机器上)。 我已将此版本添加到测试答案中。

编辑我想它无法优化的原因是因为编译器不知道操作数总是小于基数(因为我们从低位到高位)。

编辑#2设置-server标志后,Carl's再次比我的快一点。

编辑#3是的,当在64位JVM上运行时,这要快得多:

  protected int decompose(long[] buffer, long base) { int count = 0; while ( 0 != (base -= (buffer[count++] = Long.lowestOneBit(base)))); return count; } 

(或等效的xor版本),因为它可以使用本机处理器操作执行64位比较。

由于我喜欢优化*,这里有一个版本,您可以尝试:

 public static int decompose(long[] buffer, long base) { int count = 0; while (base != 0) { long mask = base & (-base); base &= ~mask; buffer[count++] = mask; } return count; } 

我做的主要事情是:

  • 内联最低一位计算以避免方法调用开销。 这可能是一个胜利(罕见但可能)的情况下编译器/ JVM不够聪明,无法为你做….
  • 传递一个long数组作为输入参数,以避免内存分配,并需要对位进行计数以确定数组大小。 函数现在返回找到的位掩码数。
  • 随着时间的推移将零点归零,这样你就可以尽早从循环中拯救出来

*因为它很有趣,无论它是否过早

我会使用一个掩码,使用shift运算符在每个循环中计算:

 for (int i= 0; i < result.length; i++) result[i]= base & (1< 

应该既清晰又快速。

这个版本怎么样?

 public static int decompose(long[] buffer, long base) { int count = 0; if (0 != (base & 1)) { buffer[count++] = 1; } base >>>= 1; for (long bit = 1; 0 < bit && bit <= base; ) { if (0 < (base & bit)) { buffer[count++] = (bit <<= 1); } } return count; } 

在这种情况下,base的符号位使结束条件复杂化,因此我们通过在开始时“调整”基值来防止这种情况发生。

或者,不是检查位掩码与基数,我们可以自行移位:

 public static int decompose(long[] buffer, long base) { int count = 0; for (long bit = 1; 0 != base; bit <<= 1, base >>>= 1) { if (0 < (base & 1)) { buffer[count++] = bit; } } return count; }