在for循环中重新创建ArrayList的最快方法

在Java中，使用以下函数为巨大的矩阵X打印其列不同的元素：

// create the list of distinct values List values = new ArrayList(); // X is n * m int[][] matrix for (int j = 0, x; j < m; j++) { values.clear(); for (int i = 0; i < n; i++) { x = X[i][j]; if (values.contains(x)) continue; System.out.println(x); values.add(x); } }

首先，我按列（索引j）和内部行（索引i）进行迭代。

对于不同的矩阵，此函数将被调用数百万次，因此应优化代码以满足性能要求。我想知道值数组。使用values = new ArrayList();会更快吗values = new ArrayList(); 或者values = null而不是values.clear() ？

更有效的是使用Set而不是列表，例如HashSet实现。 contains方法将使用列表在O（1）而不是O（n）中运行。您只需调用add方法即可保存一个调用。

至于你的具体问题，我只想在每个循环中创建一个新的Set – 对象创建并不昂贵，可能不如清除集合（由底部的基准确认 – 请参阅编辑2中最有效的版本）：

 for (int j = 0, x; j < m; j++) { Set values = new HashSet(); for (int i = 0; i < n; i++) { x = X[i][j]; if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before System.out.println(x); } }

但是，知道哪个更快（新对象与清除）的唯一方法是分析代码的这一部分并检查两个版本的性能。

编辑

我运行了一个快速基准测试，清晰版本似乎比在每个循环中创建一个集合要快一些（约20％）。您仍应检查数据集/用例哪个更好。使用我的数据集加快代码：

 Set values = new HashSet(); for (int j = 0, x; j < m; j++) { for (int i = 0; i < n; i++) { x = X[i][j]; if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before System.out.println(x); } values.clear(); }

编辑2

通过在每个循环中创建一个正确大小的新集合，可以获得更快的代码版本：

 for (int j = 0, x; j < m; j++) { Set values = new HashSet(n, 1); //right size from the beginning for (int i = 0; i < n; i++) { x = X[i][j]; if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before System.out.println(x); } }

结果摘要

在JVM热身+ JIT之后：

 Set values = new HashSet(n, 1); =====> 280 ms values.clear(); =====> 380 ms Set values = new HashSet(); =====> 450 ms

（自2015-09-04起更正，包括可复制的基准和结论）

~~当然， values.clear()比创建一个新对象更快（只是将最后一个项目索引设置为零）。~~ 几乎可以肯定， values.clear()会比创建一个新对象更快。对于最初使用的ArrayList ，只需将插入索引设置为零即可。
正如我在PD＃1中评论的那样，对于这种情况， BitSet可能是最快的方法 ，其中元素是整数（假设值的范围不是太宽。但是这可能对任何其他类型的元素都没有用。
也 ~~如上所述~~ 因为我与 Assylias的答案 相吻合 ， HashSet是一个比ArrayList更好的选择 （ 假设hashCode()提供了一个不错的分布，它不会导致我们的O（N）性能 ）。

在这个HashSet案例中，直觉还会建议clear() （基本上将HashSet#table数组的“鸽子洞”设置为null）比建立一个全新的集合（在任何情况下都需要相同的表）更快初始化/重新发布为零）。但在这种特殊情况下，情况恰好相反。 Assylias发表了他的研究结果。不幸的是，我必须自己编写我的bechmark代码，以便了解这是如何发生的。我在PD＃3中讨论了这个问题

无论如何，关于这一点的主要问题是，因为为每次迭代创建一个全新的HashSet并没有实质性的惩罚，所以这样做是有道理的（因为它更简单），除非我们必须更加关注性能和资源。
关于性能的另一个问题是I / O. 示例代码中的System.out.println()可能会为每一行执行flush() ，这会自动将瓶颈转移到console / stdout中 。解决方法可能是添加到StringBuffer 。除非有一个读者进程急切等待该输出，否则将写入延迟到循环结束可能是有意义的。

这将是我的尝试：

 Set values = new HashSet(); // (PD 1) Or new BitSet(max_x - min_x + 1); // (PD 2) Or new HashSet((int)Math.ceil(n/0.75)); StringBuffer sb = new StringBuffer(); // appends row values for printing together. for (int j = 0, x; j < m; j++) { values.clear(); sb.setLength(0); for (int i = 0; i < n; i++) { x = X[i][j]; if (! values.contains(x)){ sb.append(x+"\n"); values.add(x); } } System.out.print(sb); }

PD1。另外，如果您可以考虑使用BitSet 。它具有O（1）访问性能（即使在最坏的情况下，因为没有冲突 ）。它最适合于范围从0开始的整数（否则可能需要翻译）以及在可能的分布范围内足够密集的实际值的总体。

例如，如果检查Unicode代码点的出现，则需要139,264字节长的数组（17（平面）* 2 ¹⁶ （代码点/平面）/ 8），其中可能只使用100个字符长的40个不同字符短信，可能是一种矫枉过正。但是如果你被限制在ISO-Latin-1中的256个可能的值。（8字节bitset），这实际上是一个完美的契合。

PD2。另外，正如Assylias所说，为HashSet设置初始大小可能会有所帮助。作为threshold = (int)(capacity * loadFactor) ，您可能需要initialCapacity=(int)Math.ceil(n/0.75)以确保没有resize。 这个问题属于Assyliaspost（我没有自己使用它），并且以这种方式讨论是不合适的

PD3（2015年9月：3年后）我碰巧重新审视了这个问题，我对阿西拉斯的结果非常感兴趣，我编写了自己的微基准（我包括了，所以任何人都可以复制）。这些是我的结论：

我提出的BitSet （注意：不适合非整数和非常稀疏的分布）明显优于HashSet的所有风格（密集分布的分布大约快4倍）
对于大小为1000的高填充集的测试显示出有利于创建新集合（7.7“vs 9.8”）的轻微优势。但是， HashSet#clear() vs new HashSet()的“干运行”会产生相反的结果（9.5“vs 7.5”）。我的猜测是因为重置HashSet.table时设置了无效缓存的HashSet.table （设置null为非null ）。
另外， 预先知道最佳尺寸 （这可能并不总是可行）是一个很大的优势。 HashSet.clear()方法更具适应性，并且可以容忍更好地低估大小。过高估计并不意味着太大的差异，但如果记忆是一个问题，可能不是一个好的策略。
结果清楚地表明，现在创建一个对象和分配内存不是一个大问题 （参见Programmers.SE ）。 但是，重用对象应该仍然是一个需要考虑的选项 。例如，在drmirror中看到，即使在JDK 1.6演化之后，重用实例（CharBuffer）也会使性能提高一倍。
另外我想知道Assylias使用loadFactor==1.0f的影响是什么（HashSet不会resize直到size > table.length*loadFactor ，这与我建议的不同，但如果没有碰撞则是完美的）。大约loadFactor==0.75f需要1.33倍的表空间，以避免25％的冲突。我的测试没有显示此方案的默认设置的任何优势。

这是我用于测试的课程。对不起，如果它可能在某些方面超调而在其他方面缺乏（没有热身，只需执行足够长的时间以便实施有机会扼杀他自己的垃圾）。

 /** * Messing around this StackOverflow question: https://stackoverflow.com/questions/11740013/fastest-way-to-recreate-the-arraylist-in-a-for-loop/ . * Quite surprisingly new HashSet() (which should imply actual memory initialization) is faster than HashSet.clear() in the given scenario. * Primary goal is to test this phenomenon (new vs clear) under different scenarios. * Secondarily a bit about the BitSet and the HashSet loadFactor is tested. * @author Javier */ public class TestSetClear2 { public static interface MicroBenchmark { public String getName(); /** * * @param dataSet Data set to insert in the collection * @param initialSize Initial size for the collection. Can try to be optimal or try to fool. * @param iterations Number of times to go through the dataSet over and over */ public void run(int[] dataSet, int initialSize, int iterations); } /** Bad initial case. Based in question code */ public static class MBList implements MicroBenchmark { @Override public String getName() { return "ArrayList.clear()"; } @Override public void run(int[] data, int initialSize, int n) { // Not taking initial size into account may result in a resizing penalty in the first iteration // But will have an adequate size in following iterations, and wont be fooled by bad estimations. List values = new ArrayList(); for (int iter = 0; iter < n; iter++) { values.clear(); for (int i = 0; i < data.length; i++) { int x = data[i]; if (values.contains(x)) continue; values.add(x); } } } } /** new HashSet(N,1) for every iteration. Reported as best by assylias. */ public static class MBNewHashSetN1 implements MicroBenchmark { @Override public String getName() { return "new HashSet(N,1)"; } @Override public void run(int[] data, int initialSize, int n) { for (int iter = 0; iter < n; iter++) { Set values = new HashSet<>(initialSize, 1.0f); // 1.0 loadfactor optimal if no collisions. for (int i = 0; i < data.length; i++) { int x = data[i]; if (values.contains(x)) continue; values.add(x); } } } } // No need to implement raw new HashSet() (reported as worse). Will be enough fooling to initialize to 16 so it succumbs to resizing. /** HashsetClear for every iteration. Attempted by Assylias and Javier. Clear() does not perform as well as expected under basic tests. */ public static class MBHashSetClear implements MicroBenchmark { private float loadFactor; // Allow loadFactor to check how much 1.0 factor affects if there are collisions. private String name; public MBHashSetClear(float loadFactor) { this.loadFactor = loadFactor; name = String.format(Locale.ENGLISH, "HashSet(N,%f).clear()", loadFactor); } @Override public String getName() { return name; } @Override public void run(int[] data, int initialSize, int n) { HashSet values = new HashSet<>((int)Math.ceil(initialSize/loadFactor), loadFactor);// Just the size for loadfactor so it wont resize. for (int iter = 0; iter < n; iter++) { values.clear(); for (int i = 0; i < data.length; i++) { int x = data[i]; if (values.contains(x)) continue; values.add(x); } } } } /** Javier BitSet. Might clearly outperform HashSet, but only on the very specific constraints of the test (non negative integers, not hugely big). */ public static class MBBitSet implements MicroBenchmark { @Override public String getName() { return "BitSet.clear()"; } @Override public void run(int[] data, int distributionSize, int n) { BitSet values = new BitSet(distributionSize); for (int iter = 0; iter < n; iter++) { values.clear(); for (int i = 0; i < data.length; i++) { int x = data[i]; if (values.get(x)) continue; values.set(x); } } } } public static void main(String[] args) { final MicroBenchmark mbNew = new MBNewHashSetN1(); // Create with same loadFactor as MBNewHashSetN1. So we compare apples with apples (same size of handled table, same collisions). final MicroBenchmark mbClear = new MBHashSetClear(1.0f); final MicroBenchmark mbClear075 = new MBHashSetClear(0.75f); final MicroBenchmark mbBitset = new MBBitSet(); final MicroBenchmark mbList = new MBList(); // Will have a taste of O(N) with a not too bit dataset. // warmup. trigger the cpu high performance mode? Fill the heap with garbage? //mbNew.run(dataSetE3xE3, 1000, (int)1e5); // Using new HS might give a bit advantage? int timePerTest = 10000; int distributionSize, initialCapacity, datasetLength; // 1000 long and values 0..999 (1e3 x 1e3). Optimal initial capacity distributionSize = 1000; datasetLength = 1000; initialCapacity = 1000; final int[] dataSetE3xE3 = generateRandomSet(1000,1000); runBenchmark("E3xE3", dataSetE3xE3, distributionSize, timePerTest, initialCapacity, mbNew, mbClear, mbClear075, mbBitset); // repeat with underestimated initial size. Will incur in resizing penalty initialCapacity = 16; // Default initial runBenchmark("E3xE3+underSize", dataSetE3xE3, distributionSize, timePerTest, initialCapacity, mbNew, mbClear, mbBitset); // repeat with overestimated initial size. larger garbage and clearing. initialCapacity = 100000; // oversized will force to handle large tables filled with 0 / null. runBenchmark("E3xE3+overSize", dataSetE3xE3, distributionSize, timePerTest, initialCapacity, mbNew, mbClear, mbBitset); // Dry run (not rum). what if we focus on the new and clear operations. Just 1 item so clear() is forced to traverse the table. datasetLength = 1; distributionSize = 1000; initialCapacity = 1000; runBenchmark("E3xE3-DryRun", generateRandomSet(datasetLength, distributionSize), distributionSize, timePerTest, initialCapacity, mbNew, mbClear); // check for * 100 and / 100 sizes. distributionSize = datasetLength = initialCapacity = 10; runBenchmark("E1xE1", generateRandomSet(datasetLength, distributionSize), distributionSize, timePerTest, initialCapacity, mbNew, mbClear, mbList); distributionSize = datasetLength = initialCapacity = 100000; runBenchmark("E5xE5", generateRandomSet(datasetLength, distributionSize), distributionSize, timePerTest, initialCapacity, mbNew, mbClear); // Concentrated distributions might behave as with oversized? datasetLength=10000; distributionSize=10; initialCapacity=Math.min(datasetLength, distributionSize); runBenchmark("E4xE1", generateRandomSet(datasetLength, distributionSize), distributionSize, timePerTest, initialCapacity, mbNew, mbClear); // Sparse distributions might allow mild collision. Also adverse for BitSet. // TODO Generate a greater/known amount of collisions datasetLength=10000; distributionSize=(int)1e6; initialCapacity=Math.min(datasetLength, distributionSize); runBenchmark("E4xE6", generateRandomSet(datasetLength, distributionSize), distributionSize, timePerTest, initialCapacity, mbNew, mbClear, mbClear075); } private static void runBenchmark(String testName, int[] dataSet, int distributionSize, int timePerTest , int initialCapacity, MicroBenchmark ... testees /* not testes */) { // How many iterations? Will use first testee to callibrate. MicroBenchmark curTest = testees[0]; long t0 = System.nanoTime(); long ellapsed = 0L; final long minToCallibrate = (long)0.5e9; // half second int iterations = 1; while (ellapsed < minToCallibrate) { curTest.run(dataSet, initialCapacity, iterations); iterations *= 2; // same as <<= 1 ellapsed = System.nanoTime() - t0; } // calculation is not laser-sharp precise (actually executed iterations -1, and some extra initializations). final int nIterations = (int) ((double)iterations * timePerTest * 1e6 /* nanos/millis */ / ellapsed); // Do actual benchmark System.out.printf(Locale.ENGLISH, "dataset:{name=%s,length:%d,distrib:%d,capacity0:%d,iterations:%d}\n", testName, dataSet.length, distributionSize, initialCapacity, nIterations); for (MicroBenchmark testee : testees) { t0 = System.nanoTime(); testee.run(dataSet, initialCapacity, nIterations); ellapsed = System.nanoTime() - t0; System.out.printf(Locale.ENGLISH, "%s : %5.3f\n", testee.getName(), ellapsed/1e9 ); } } private static int[] generateRandomSet(int lengthOfSet, int distributionSize) { Random r = new Random(); int[] result = new int[lengthOfSet]; for (int i = 0; i < lengthOfSet; i++) { result[i] = r.nextInt(distributionSize); } return result; } }

这是我的结果（使用JDK 1.8.0_31 - 64位 - Windows 7）

 dataset:{name=E3xE3,length:1000,distrib:1000,capacity0:1000,iterations:514241} new HashSet(N,1) : 7.688 HashSet(N,1.000000).clear() : 9.796 HashSet(N,0.750000).clear() : 9.923 BitSet.clear() : 1.990 dataset:{name=E3xE3+underSize,length:1000,distrib:1000,capacity0:16,iterations:420572} new HashSet(N,1) : 9.735 HashSet(N,1.000000).clear() : 6.637 BitSet.clear() : 1.611 dataset:{name=E3xE3+overSize,length:1000,distrib:1000,capacity0:100000,iterations:143032} new HashSet(N,1) : 9.948 HashSet(N,1.000000).clear() : 10.377 BitSet.clear() : 0.447 dataset:{name=E3xE3-DryRun,length:1,distrib:1000,capacity0:1000,iterations:18511486} new HashSet(N,1) : 9.583 HashSet(N,1.000000).clear() : 7.523 dataset:{name=E1xE1,length:10,distrib:10,capacity0:10,iterations:76177852} new HashSet(N,1) : 9.988 HashSet(N,1.000000).clear() : 10.521 ArrayList.clear() : 7.915 dataset:{name=E5xE5,length:100000,distrib:100000,capacity0:100000,iterations:2892} new HashSet(N,1) : 9.764 HashSet(N,1.000000).clear() : 9.615 dataset:{name=E4xE1,length:10000,distrib:10,capacity0:10,iterations:170386} new HashSet(N,1) : 9.843 HashSet(N,1.000000).clear() : 9.708 dataset:{name=E4xE6,length:10000,distrib:1000000,capacity0:10000,iterations:36497} new HashSet(N,1) : 9.686 HashSet(N,1.000000).clear() : 10.079 HashSet(N,0.750000).clear() : 10.008

你可以使用ArrayList.clear(); 这样可以在内存中保存地址ArrayList，而不会对此地址产生垃圾收集器影响。

你应该使用.clear（）方法，使用它你不需要一次又一次地为你的变量分配内存。

在for循环中重新创建ArrayList的最快方法

如何比较ArrayList中的Objects属性？

比较2个ArrayLists的简单方法

java ArrayList中最常见的n – 单词

最好的使用方法包含在Java中的ArrayList中？

重复的对象被添加到列表中

ArrayList的内存分配如何工作？

为什么ArrayList的最大数组大小是Integer.MAX_VALUE – 8？

在arraylist中保存（永久）更改？

使用分隔每个条目的线条在toString中打印Arraylist

如何捕获回车键，使用扫描仪作为控制台输入？