How to efficiently store small byte arrays in Java?

By small byte arrays I mean byte arrays with a length from 10 to 30.

By store I mean storing them in RAM, not serializing them and persisting them to the file system.

System: macOS 10.12.6, Oracle jdk1.8.0_141 64-bit, JVM args -Xmx1g

Example: for `new byte[200 * 1024 * 1024]` the expected behavior is ≈200 MB of heap space.

```java
public static final int TARGET_SIZE = 200 * 1024 * 1024;

public static void main(String[] args) throws InterruptedException {
    byte[] arr = new byte[TARGET_SIZE];
    System.gc();
    System.out.println("Array size: " + arr.length);
    System.out.println("HeapSize: " + Runtime.getRuntime().totalMemory());
    Thread.sleep(60000);
}
```

(Screenshots: jvisualvm total heap usage and memory sample for `new byte[200 * 1024 * 1024]`.)

But for smaller arrays the math is not so simple:

```java
public static final int TARGET_SIZE = 200 * 1024 * 1024;

public static void main(String[] args) throws InterruptedException {
    final int oneArraySize = 20;
    final int numberOfArrays = TARGET_SIZE / oneArraySize;
    byte[][] arrays = new byte[numberOfArrays][];
    for (int i = 0; i < numberOfArrays; i++) {
        arrays[i] = new byte[oneArraySize];
    }
    System.gc();
    System.out.println("Arrays size: " + arrays.length);
    System.out.println("HeapSize: " + Runtime.getRuntime().totalMemory());
    Thread.sleep(60000);
}
```

(Screenshots: jvisualvm total heap usage and memory sample for 10 * 1024 * 1024 arrays of `new byte[20]`.)

Even worse:

(Screenshots: jvisualvm total heap usage and memory sample for 20 * 1024 * 1024 arrays of `new byte[10]`.)

The question is:

~~Where does this overhead come from?~~ How can small byte arrays (chunks of data) be stored and used efficiently?

Update 1

For `new byte[200*1024*1024][1]` it eats: (Screenshots: jvisualvm total heap usage and memory sample for 200 * 1024 * 1024 arrays of `new byte[1]`.)

Basic math says that a `new byte[1]` weighs 24 bytes.

Update 2

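According to "What is the memory consumption of an object in Java?", the minimum size of an object in Java is 16 bytes. From my previous "measurements": 24 bytes - 16 bytes minimum object size - 4 bytes for the int length - 1 byte of my actual data = 3 bytes of some ~~other garbage~~ padding.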
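The size of a single `byte[]` can be estimated with a small formula. This is a sketch that assumes a 64-bit HotSpot JVM with compressed class pointers (the default for heaps below 32 GB): a 12-byte object header, a 4-byte length field, and 8-byte object alignment. Under those assumptions the 24 bytes of a `new byte[1]` break down as 16 bytes of header and length, 1 byte of data and 7 bytes of alignment padding. The class and method names are hypothetical.

```java
public class ByteArrayFootprint {
    // Assumed layout (64-bit HotSpot, compressed class pointers):
    // 12-byte object header + 4-byte array length = 16 bytes before the data.
    static final int ARRAY_BASE = 16;
    // Assumed object alignment (HotSpot default: 8 bytes).
    static final int ALIGNMENT = 8;

    // Rounds size up to the next multiple of ALIGNMENT.
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated shallow size of a byte[length] under the assumptions above.
    static long byteArraySize(int length) {
        return alignUp(ARRAY_BASE + (long) length);
    }

    public static void main(String[] args) {
        for (int length : new int[] {1, 10, 20, 30}) {
            long size = byteArraySize(length);
            System.out.println("byte[" + length + "]: " + size + " bytes, "
                    + (size - ARRAY_BASE - length) + " bytes of padding");
        }
    }
}
```

Under these assumptions it prints 24, 32, 40 and 48 bytes (with 7, 6, 4 and 2 bytes of padding), so 10 to 30 byte chunks cost 32 to 48 bytes each, before counting the 4-byte reference that a `byte[][]` holds for every row.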

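Eugene's answer explains why you observe the increase in memory consumption for a large number of arrays. The question in the title, "How to efficiently store small byte arrays in Java?", may then be answered with: not at all.¹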

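However, there may be ways to achieve your goal. As usual, the "best" solution here will depend on how the data is going to be used. A very pragmatic approach would be to define an `interface` for your data structure.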

In the simplest case, this interface could just be

```java
interface ByteArray2D {
    int getNumRows();
    int getNumColumns();
    byte get(int r, int c);
    void set(int r, int c, byte b);
}
```

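offering a basic abstraction of a "2D byte array". Depending on the application case, it may be beneficial to offer additional methods here. The patterns that could be employed here are frequently found in matrix libraries that handle "2D matrices" (usually of `float` values), and they often offer methods like these: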

```java
interface Matrix {
    Vector getRow(int row);
    Vector getColumn(int column);
    ...
}
```

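But when the main purpose here is to handle a set of `byte[]` arrays, a method for accessing each array (that is, each row of the 2D array) could be sufficient: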

```java
ByteBuffer getRow(int row);
```

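Given this interface, it is simple to create different implementations. For example, you could create a simple implementation that just stores a 2D `byte[][]` array internally: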

```java
class SimpleByteArray2D implements ByteArray2D {
    private final byte array[][];
    ...
}
```

Or you could create an implementation that stores a 1D `byte[]` array internally, or analogously, a `ByteBuffer`:

```java
class CompactByteArray2D implements ByteArray2D {
    private final ByteBuffer buffer;
    ...
}
```

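This implementation then just has to compute the (1D) index when one of the methods for accessing a certain row/column of the 2D array is called.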
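The index computation is plain row-major addressing: element (r, c) lives at offset `r * cols + c`. Here is a minimal sketch of the same idea on a plain `byte[]` instead of a `ByteBuffer`. The class name `FlatByteRows` and its methods are hypothetical, and bounds checking is left to the underlying array, so a column index past `cols` silently lands in the next row.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width row storage backed by one contiguous byte[].
public class FlatByteRows {
    private final int rows;
    private final int cols;
    private final byte[] data;

    public FlatByteRows(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        // One array object in total, instead of one array object per row.
        this.data = new byte[Math.multiplyExact(rows, cols)];
    }

    public int numRows() {
        return rows;
    }

    // Row-major mapping from (r, c) to a 1D index.
    private int index(int r, int c) {
        return r * cols + c;
    }

    public byte get(int r, int c) {
        return data[index(r, c)];
    }

    public void set(int r, int c, byte b) {
        data[index(r, c)] = b;
    }

    // A view of one row that shares storage with the backing array (no copy).
    public ByteBuffer row(int r) {
        return ByteBuffer.wrap(data, r * cols, cols).slice();
    }

    public static void main(String[] args) {
        FlatByteRows storage = new FlatByteRows(3, 4);
        storage.set(1, 2, (byte) 42);
        ByteBuffer row1 = storage.row(1);
        System.out.println(storage.get(1, 2)); // 42
        System.out.println(row1.get(2));       // 42, same storage
        System.out.println(row1.capacity());   // 4
    }
}
```

Returning a slice rather than a copy keeps bulk row access cheap; writes through the returned buffer are visible in the 2D view.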

Below is an MCVE that shows this interface and the two implementations, the basic usage of the interface, and a memory footprint analysis using JOL.

The output of this program is:

```
For 10 rows and 1000 columns:
Total size for SimpleByteArray2D : 10240
Total size for CompactByteArray2D: 10088
For 100 rows and 100 columns:
Total size for SimpleByteArray2D : 12440
Total size for CompactByteArray2D: 10088
For 1000 rows and 10 columns:
Total size for SimpleByteArray2D : 36040
Total size for CompactByteArray2D: 10088
```

showing that

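- the `SimpleByteArray2D` implementation, which is based on a simple 2D `byte[][]` array, requires more memory when the number of rows increases (even if the total size of the array remains constant)
- the memory consumption of the `CompactByteArray2D` is independent of the structure of the array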

The whole program:

```java
package stackoverflow;

import java.nio.ByteBuffer;

import org.openjdk.jol.info.GraphLayout;

public class EfficientByteArrayStorage {
    public static void main(String[] args) {
        showExampleUsage();
        anaylyzeMemoryFootprint();
    }

    private static void anaylyzeMemoryFootprint() {
        testMemoryFootprint(10, 1000);
        testMemoryFootprint(100, 100);
        testMemoryFootprint(1000, 10);
    }

    private static void testMemoryFootprint(int rows, int cols) {
        System.out.println("For " + rows + " rows and " + cols + " columns:");

        ByteArray2D b0 = new SimpleByteArray2D(rows, cols);
        GraphLayout g0 = GraphLayout.parseInstance(b0);
        System.out.println("Total size for SimpleByteArray2D : " + g0.totalSize());
        //System.out.println(g0.toFootprint());

        ByteArray2D b1 = new CompactByteArray2D(rows, cols);
        GraphLayout g1 = GraphLayout.parseInstance(b1);
        System.out.println("Total size for CompactByteArray2D: " + g1.totalSize());
        //System.out.println(g1.toFootprint());
    }

    // Shows an example of how to use the different implementations
    private static void showExampleUsage() {
        System.out.println("Using a SimpleByteArray2D");
        ByteArray2D b0 = new SimpleByteArray2D(10, 10);
        exampleUsage(b0);

        System.out.println("Using a CompactByteArray2D");
        ByteArray2D b1 = new CompactByteArray2D(10, 10);
        exampleUsage(b1);
    }

    private static void exampleUsage(ByteArray2D byteArray2D) {
        // Reading elements of the array
        System.out.println(byteArray2D.get(2, 4));

        // Writing elements of the array
        byteArray2D.set(2, 4, (byte) 123);
        System.out.println(byteArray2D.get(2, 4));

        // Bulk access to rows
        ByteBuffer row = byteArray2D.getRow(2);
        for (int c = 0; c < row.capacity(); c++) {
            System.out.println(row.get(c));
        }

        // (Commented out for this MCVE: Writing one row to a file)
        /*/
        try (FileChannel fileChannel =
                new FileOutputStream(new File("example.dat")).getChannel()) {
            fileChannel.write(byteArray2D.getRow(2));
        } catch (IOException e) {
            e.printStackTrace();
        }
        //*/
    }
}

interface ByteArray2D {
    int getNumRows();
    int getNumColumns();
    byte get(int r, int c);
    void set(int r, int c, byte b);

    // Bulk access to rows, for convenience and efficiency
    ByteBuffer getRow(int row);
}

class SimpleByteArray2D implements ByteArray2D {
    private final int rows;
    private final int cols;
    private final byte array[][];

    public SimpleByteArray2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.array = new byte[rows][cols];
    }

    @Override
    public int getNumRows() {
        return rows;
    }

    @Override
    public int getNumColumns() {
        return cols;
    }

    @Override
    public byte get(int r, int c) {
        return array[r][c];
    }

    @Override
    public void set(int r, int c, byte b) {
        array[r][c] = b;
    }

    @Override
    public ByteBuffer getRow(int row) {
        return ByteBuffer.wrap(array[row]);
    }
}

class CompactByteArray2D implements ByteArray2D {
    private final int rows;
    private final int cols;
    private final ByteBuffer buffer;

    public CompactByteArray2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.buffer = ByteBuffer.allocate(rows * cols);
    }

    @Override
    public int getNumRows() {
        return rows;
    }

    @Override
    public int getNumColumns() {
        return cols;
    }

    @Override
    public byte get(int r, int c) {
        return buffer.get(r * cols + c);
    }

    @Override
    public void set(int r, int c, byte b) {
        buffer.put(r * cols + c, b);
    }

    @Override
    public ByteBuffer getRow(int row) {
        ByteBuffer r = buffer.slice();
        r.position(row * cols);
        r.limit(row * cols + cols);
        return r.slice();
    }
}
```

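Again, this is mainly intended as a sketch, to show one possible approach. The details of the interface will depend on the intended application patterns.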
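The question mentions chunks of 10 to 30 bytes, so the rows may not all have the same length. One possible variation of the compact approach, sketched under that assumption, packs all chunks back to back into a single `byte[]` and records where each chunk starts in an `int[]`. The class `PackedChunks` and its methods are hypothetical, the capacity must be known up front (no growing), and there are no bounds checks beyond the arrays' own, to keep the sketch short.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical append-only store for variable-length chunks in one byte[].
public class PackedChunks {
    private final byte[] data;   // all chunk bytes, back to back
    private final int[] offsets; // offsets[i] = start of chunk i; offsets[count] = end
    private int count;

    public PackedChunks(int maxChunks, int maxTotalBytes) {
        this.data = new byte[maxTotalBytes];
        this.offsets = new int[maxChunks + 1];
    }

    // Appends a copy of chunk and returns its index.
    // Throws an IndexOutOfBoundsException if a capacity is exceeded.
    public int add(byte[] chunk) {
        int start = offsets[count];
        System.arraycopy(chunk, 0, data, start, chunk.length);
        offsets[count + 1] = start + chunk.length;
        return count++;
    }

    public int size() {
        return count;
    }

    public int length(int i) {
        return offsets[i + 1] - offsets[i];
    }

    // Read-only view of chunk i; no copy is made.
    public ByteBuffer view(int i) {
        return ByteBuffer.wrap(data, offsets[i], length(i)).slice().asReadOnlyBuffer();
    }

    // Copy of chunk i, for callers that really need a byte[].
    public byte[] copy(int i) {
        return Arrays.copyOfRange(data, offsets[i], offsets[i + 1]);
    }

    public static void main(String[] args) {
        PackedChunks chunks = new PackedChunks(2, 64);
        chunks.add(new byte[] {1, 2, 3});
        int second = chunks.add(new byte[] {4, 5});
        System.out.println(chunks.length(second));                // 2
        System.out.println(Arrays.toString(chunks.copy(second))); // [4, 5]
        System.out.println(chunks.view(0).get(2));                // 3
    }
}
```

The per-chunk overhead drops to one 4-byte offset instead of an array header, padding and a reference. As rough arithmetic under the 64-bit compressed-oops assumptions used elsewhere on this page, 10 * 1024 * 1024 chunks of 20 bytes would need about 200 MiB of data plus 40 MiB of offsets, versus roughly 10 Mi × (40 + 4) bytes ≈ 440 MiB for a `byte[][]`. A single Java array is limited to about 2 GB, so larger data sets would need several such blocks.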

¹ A side note:

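The problem of the memory overhead is similar in other languages. For example, in C/C++, the structure that most closely resembles a "2D Java array" would be an array of manually allocated pointers: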

```cpp
char** array;
array = new char*[numRows];   // one pointer per row
array[0] = new char[numCols]; // each row is a separate allocation
...
```

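In this case, you also have an overhead that is proportional to the number of rows, namely one pointer (typically 4 or 8 bytes, depending on the platform) for each row.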

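OK, so if I understand correctly (please ask if not, I'll try to answer), there are a couple of things here. First, you need the right tool for measuring, and JOL is the only one I trust.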

Let's start:

```java
import org.openjdk.jol.info.GraphLayout;

byte[] one = new byte[1];
System.out.println(GraphLayout.parseInstance(one).toFootprint());
```

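This will show `24 bytes`: 12 bytes for the mark and class words (the object header) plus 4 bytes for the array length, 1 byte for the actual value, and 7 bytes of padding (memory is 8-byte aligned).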

With this in mind, the following output should be predictable:

```java
byte[] eight = new byte[8];
System.out.println(GraphLayout.parseInstance(eight).toFootprint()); // 24 bytes

byte[] nine = new byte[9];
System.out.println(GraphLayout.parseInstance(nine).toFootprint()); // 32 bytes
```

Now let's move on to two-dimensional arrays:

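```java
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.info.GraphLayout;

byte[][] ninenine = new byte[9][9];
System.out.println(GraphLayout.parseInstance(ninenine).toFootprint()); // 344 bytes
// Layout of the outer array object only
System.out.println(ClassLayout.parseInstance(ninenine).toPrintable());
```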

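Java has no true two-dimensional arrays: every nested array is itself an Object (`byte[]`) with a header and content. Thus a single `byte[9]` takes 32 bytes: 16 bytes of header (12 bytes of object header + 4 bytes of array length) and 16 bytes of content (9 bytes of actual content + 7 bytes of padding).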

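The `ninenine` object itself has 56 bytes in total: 16 bytes of header, 36 bytes for holding the references to the 9 nested arrays, and 4 bytes of padding. Together with the nine 32-byte rows this gives 56 + 9 × 32 = 344 bytes.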
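The same arithmetic can be reproduced for any jagged `byte[rows][cols]` with a small estimator. It is a sketch under the assumptions implied by the numbers above (64-bit HotSpot with compressed oops: 16 bytes of header and length per array, 4-byte references, 8-byte alignment); JOL remains the tool for real measurements, and the class name is hypothetical.

```java
public class JaggedFootprint {
    // Assumed: 12-byte header + 4-byte length, 4-byte (compressed) references,
    // 8-byte alignment, as on a 64-bit HotSpot JVM with compressed oops.
    static final long ARRAY_BASE = 16;
    static final long REFERENCE_SIZE = 4;
    static final long ALIGNMENT = 8;

    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // The outer byte[][]: header plus one reference per row.
    static long outerSize(int rows) {
        return alignUp(ARRAY_BASE + REFERENCE_SIZE * rows);
    }

    // One inner byte[cols]: header plus the data bytes.
    static long rowSize(int cols) {
        return alignUp(ARRAY_BASE + cols);
    }

    // Total retained size of new byte[rows][cols].
    static long footprint(int rows, int cols) {
        return outerSize(rows) + (long) rows * rowSize(cols);
    }

    public static void main(String[] args) {
        System.out.println(footprint(9, 9));      // 344
        System.out.println(footprint(10000, 10)); // 360016
        System.out.println(footprint(10, 10000)); // 100216
    }
}
```

Each extra row costs `rowSize(cols)` plus a 4-byte reference, which is why a layout with many short rows is so much larger than one with few long rows holding the same number of bytes.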

Look at the samples produced here:

```java
byte[][] left = new byte[10000][10];
System.out.println(GraphLayout.parseInstance(left).toFootprint()); // 360016 bytes

byte[][] right = new byte[10][10000];
System.out.println(GraphLayout.parseInstance(right).toFootprint()); // 100216 bytes
```

That is an increase of about 260%, so just by changing the layout (which dimension is the larger one) you can save a lot of space.

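But the deeper problem is that every Object in Java has these headers; there are no headerless objects yet. They might appear in the future under the name value types. Perhaps when they are implemented, arrays of primitives at least will not have this overhead.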
