反序列化后Hashmap变慢 – 为什么？

我有一个非常大的Hashmap（~250MB）。创建它大约需要50-55秒，所以我决定将其序列化并将其保存到文件中。从文件中读取大约需要16-17秒。

唯一的问题是查找似乎这样慢。我一直认为hashmap是从文件读入内存的，所以与我自己创建hashmap的情况相比，性能应该是相同的，对吧？这是我用来将hashmap读入文件的代码：

File file = new File("omaha.ser"); FileInputStream f = new FileInputStream(file); ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f)); omahaMap = (HashMap) s.readObject(); s.close();

当我自己创建hashmap时，3亿次查找大约需要3.1秒，当我从文件中读取相同的hashmap时大约需要8.5秒。有人知道为什么吗？我忽略了一些明显的东西吗

编辑：

我通过使用System.nanotime（）获取时间来“测量”时间，因此没有使用适当的基准测试方法。这是代码：

 public class HandEvaluationTest { public static void Test() { HandEvaluation.populate5Card(); HandEvaluation.populate9CardOmaha(); Card[] player1cards = {new Card("4s"), new Card("2s"), new Card("8h"), new Card("4d")}; Card[] player2cards = {new Card("As"), new Card("9s"), new Card("6c"), new Card("2h")}; Card[] player3cards = {new Card("9h"), new Card("7h"), new Card("Kc"), new Card("Kh")}; Card[] table = {new Card("2d"), new Card("2c"), new Card("3c"), new Card("5c"), new Card("4h")}; int j=0, k=0, l=0; long startTime = System.nanoTime(); for(int p=0; p<100000000; p++) { j = HandEvaluation.handEval9Hash(player1cards, table); k = HandEvaluation.handEval9Hash(player2cards, table); l = HandEvaluation.handEval9Hash(player3cards, table); } long estimatedTime = System.nanoTime() - startTime; System.out.println("Time needed: " + estimatedTime*Math.pow(10,-6) + "ms"); System.out.println("Handstrength Player 1: " + j); System.out.println("Handstrength Player 2: " + k); System.out.println("Handstrength Player 3: " + l); } }

大型hashmap工作在HandEvaluation.populate9CardOmaha（）中完成。 5张卡很小。大的代码：

  public static void populate9CardOmaha() { //Check if the hashmap is already there- then just read it and exit File hashmap = new File("omaha.ser"); if(hashmap.exists()) { try { File file = new File("omaha.ser"); FileInputStream f = new FileInputStream(file); ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f)); omahaMap = (HashMap) s.readObject(); s.close(); } catch(IOException ioex) {ioex.printStackTrace();} catch(ClassNotFoundException cnfex) { System.out.println("Class not found"); cnfex.printStackTrace(); return; } return; } // if it's not there, populate it yourself ... Code for populating hashmap ... // and then save it to file ( try { File file = new File("omaha.ser"); FileOutputStream f = new FileOutputStream(file); ObjectOutputStream s = new ObjectOutputStream(new BufferedOutputStream(f)); s.writeObject(omahaMap); s.close(); } catch(IOException ioex) {ioex.printStackTrace();} }

当我自己填充它（=文件不在这里）时，HandEvaluationTest.Test（）中的查找大约需要8秒而不是3.也许这只是我非常天真的测量时间的方法？

这个问题很有意思，所以我编写了自己的测试用例来validation它。我发现实时查找的速度与从序列化文件加载的速度没有区别。任何有兴趣运行它的人都可以在post的末尾找到该程序。

使用JProfiler监控方法。
序列化文件与您的文件相当。 ~ 230 MB 。
内存中的查找成本为1210毫秒，没有任何序列化

在此处输入图像描述

序列化地图并再次读取它们后，查找的成本保持不变（差不多 – 1224毫秒）

在此处输入图像描述

调整分析器以在两种情况下增加最小的开销。
这是在Java(TM) SE Runtime Environment (build 1.6.0_25-b06) / 4 CPUs running at 1.7 Ghz / 4GB Ram 800 Mhz

测量很棘手。我自己注意到你描述的8 second查询时间，但猜测发生了什么我注意到了什么。

GC活动

在此处输入图像描述

您的测量结果也可能会提高。如果单独确定Map.get()的测量值，您将看到结果具有可比性。

 public class GenericTest { public static void main(String... args) { // Call the methods as you please for a live Vs ser <-> de_ser run } private static Map generateHashMap() { Map map = new HashMap(); final Random random = new Random(); for(int counter = 0 ; counter < 10000000 ; counter++) { final int value = random.nextInt(); final long key = random.nextLong(); map.put(key, value); } return map; } private static void lookupItems(int n, Map map) { final Random random = new Random(); for(int counter = 0 ; counter < n ; counter++) { final long key = random.nextLong(); final Integer value = map.get(key); } } private static void serialize(Map map) { try { File file = new File("temp/omaha.ser"); FileOutputStream f = new FileOutputStream(file); ObjectOutputStream s = new ObjectOutputStream(new BufferedOutputStream(f)); s.writeObject(map); s.close(); } catch (Exception e) { e.printStackTrace(); } } private static Map deserialize() { try { File file = new File("temp/omaha.ser"); FileInputStream f = new FileInputStream(file); ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f)); HashMap map = (HashMap) s.readObject(); s.close(); return map; } catch (Exception e) { throw new RuntimeException(e); } } }

当我自己创建hashmap时，3亿次查找大约需要3.1秒，当我从文件中读取相同的hashmap时大约需要8.5秒。有人知道为什么吗？我忽略了一些明显的东西吗

一个可能的原因是重建的HashMap可能不具有与原始HashMap相同的容量（桶的数量），这可能增加散列冲突的频率或（如果大小增加）减少主存储器访问的局部性（导致更多缓存未命中）。要进行validation，请使用调试器检查重建之前和之后map.table的长度。如果确实如此，请尝试使用适当的loadFactor将数据复制到新的HashMap中。

至于为什么序列化不能保持容量：HashMap通过提供writeObject和readObject方法自定义其序列化格式（为每个空表元素序列化null没有意义），并忽略它在输入流中找到的容量：

 /** * Reconstitute the {@code HashMap} instance from a stream (ie, * deserialize it). */ private void readObject(java.io.ObjectInputStream s) throws IOException, ClassNotFoundException { // Read in the threshold (ignored), loadfactor, and any hidden stuff s.defaultReadObject(); reinitialize(); if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new InvalidObjectException("Illegal load factor: " + loadFactor); s.readInt(); // Read and ignore number of buckets int mappings = s.readInt(); // Read number of mappings (size) if (mappings < 0) throw new InvalidObjectException("Illegal mappings count: " + mappings); else if (mappings > 0) { // (if zero, use defaults) // Size the table using given load factor only if within // range of 0.25...4.0 float lf = Math.min(Math.max(0.25f, loadFactor), 4.0f); float fc = (float)mappings / lf + 1.0f; int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ? DEFAULT_INITIAL_CAPACITY : (fc >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : tableSizeFor((int)fc)); float ft = (float)cap * lf; threshold = ((cap < MAXIMUM_CAPACITY && ft < MAXIMUM_CAPACITY) ? (int)ft : Integer.MAX_VALUE); @SuppressWarnings({"rawtypes","unchecked"}) Node[] tab = (Node[])new Node[cap]; table = tab; // Read the keys and values, and put the mappings in the HashMap for (int i = 0; i < mappings; i++) { @SuppressWarnings("unchecked") K key = (K) s.readObject(); @SuppressWarnings("unchecked") V value = (V) s.readObject(); putVal(hash(key), key, value, false, false); } } }

我怀疑它忽略了阻止拒绝服务攻击的数量的桶，攻击者制作序列化流，并提供不切实际的高（或低）数量的桶，这将导致OutOfMemoryError（或由于哈希冲突导致过多的CPU负载）），对于接受来自不受信任来源的序列化数据的任何应用程序，这是一种廉价的拒绝服务攻击方式（ CVE-2012-2739描述了这样的问题）。

反序列化后Hashmap变慢 – 为什么？

GC活动

HashMap与LinkedHashMap在值迭代中的性能（）

为什么ArrayList和HashMap等类不同步？

HashMap与Switch语句性能

动态制作对象的副本？

HashMap碰撞是否会导致resize？

使用Jackson将JSON字符串转换为Java映射时出错

为什么我的HashMap允许重复键？

如何使HashMap与数组一起使用？

如果密钥以与equals不一致的方式实现Comparable，那么Java 8的HashMap是否会出错？这是一个错误吗？

嵌套映射或java中的组合键