Best way to handle a huge field with GSON JsonReader

I'm getting a java.lang.OutOfMemoryError: Java heap space, even when using GSON streaming.

{"result":"OK","base64":"JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC...."} 

The base64 value can be up to 200 MB long. GSON takes much more memory than that (3 GB), and when I try to store the base64 value in a variable I get:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
        at java.lang.StringBuilder.append(StringBuilder.java:204)
        at com.google.gson.stream.JsonReader.nextQuotedValue(JsonReader.java:1014)
        at com.google.gson.stream.JsonReader.nextString(JsonReader.java:815)

What is the best way to handle this kind of field?

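You get the OutOfMemoryError because GSON's nextString() has to return a single String, and it builds that String by aggregating the whole value in a StringBuilder. For a value this large, the only way around the problem is to process the data in intermediate pieces instead of as one string. Unfortunately, GSON offers no API for processing huge literals piece by piece.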
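To get a rough feel for why the aggregation is so expensive, the growth of the builder can be simulated. This is only a sketch under stated assumptions, not GSON or JDK code: `BuilderGrowthEstimate` is a hypothetical name, the builder is assumed to be backed by a `char[]` (2 bytes per char, as on the JVM in the stack trace above), capacity is assumed to grow to `2n + 2` or to the requested size if larger, and the value is assumed to arrive in chunks of 1024 chars.

```java
// Hypothetical estimate, not GSON or JDK code: models how a char[]-backed
// StringBuilder grows when a long value is appended chunk by chunk.
final class BuilderGrowthEstimate {

    private static final int BYTES_PER_CHAR = 2; // assumption: char[] storage

    /** Returns { final capacity in chars, peak bytes live during the largest array copy }. */
    static long[] simulate(final long totalChars, final int chunkSize) {
        long capacity = 16; // default StringBuilder capacity
        long peakBytes = capacity * BYTES_PER_CHAR;
        long length = 0;
        while (length < totalChars) {
            final long needed = Math.min(totalChars, length + chunkSize);
            if (needed > capacity) {
                // Assumed growth rule: double plus two, or exactly what is needed if larger
                final long grown = Math.max(capacity * 2 + 2, needed);
                // Old and new arrays are both live while the contents are copied
                peakBytes = Math.max(peakBytes, (capacity + grown) * BYTES_PER_CHAR);
                capacity = grown;
            }
            length = needed;
        }
        return new long[] { capacity, peakBytes };
    }

    public static void main(final String... args) {
        final long[] result = simulate(200L * 1024 * 1024, 1024);
        System.out.println("final capacity (chars): " + result[0]);
        System.out.println("peak during copy (bytes): " + result[1]);
    }
}
```

Under these assumptions the peak is always at least twice the character count in bytes, and the final toString() call copies the value once more on top of that, so a heap several times the size of the field is needed just to hold one string.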

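I'm not sure whether you can change the response payload. If you can't, you may need to implement your own JSON reader, or "hack" the existing JsonReader to make it work in a streaming manner. The example below is based on GSON 2.5 and relies heavily on reflection, because JsonReader hides its state very carefully.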

EnhancedGson25JsonReader.java

    import java.io.IOException;
    import java.io.Reader;

    import com.google.gson.stream.JsonReader;

    // Tied to GSON 2.5: the private members below are looked up by name via reflection.
    final class EnhancedGson25JsonReader
            extends JsonReader {

        // A listener to accept the internal character buffers.
        // Accepting a single string built on such buffers is total memory waste as well.
        interface ISlicedStringListener {

            void accept(char[] buffer, int start, int length)
                    throws IOException;

        }

        // The constants can be just copied

        /** @see JsonReader#PEEKED_NONE */
        private static final int PEEKED_NONE = 0;

        /** @see JsonReader#PEEKED_SINGLE_QUOTED */
        private static final int PEEKED_SINGLE_QUOTED = 8;

        /** @see JsonReader#PEEKED_DOUBLE_QUOTED */
        private static final int PEEKED_DOUBLE_QUOTED = 9;

        // Save some memory
        private static final Object[] just1 = { 1 };
        private static final Object[] justUnterminatedString = { "Unterminated string" };

        // Here is a bunch of spies made to "spy" for the parent's class state
        private final FieldSpy<Integer> peeked;
        private final MethodSpy<Integer> doPeek;
        private final MethodSpy<Integer> getLineNumber;
        private final MethodSpy<Integer> getColumnNumber;
        private final FieldSpy<char[]> buffer;
        private final FieldSpy<Integer> pos;
        private final FieldSpy<Integer> limit;
        private final MethodSpy<Character> readEscapeCharacter;
        private final FieldSpy<Integer> lineNumber;
        private final FieldSpy<Integer> lineStart;
        private final MethodSpy<Boolean> fillBuffer;
        private final MethodSpy<IOException> syntaxError;
        private final FieldSpy<Integer> stackSize;
        private final FieldSpy<int[]> pathIndices;

        private EnhancedGson25JsonReader(final Reader reader)
                throws NoSuchFieldException, NoSuchMethodException {
            super(reader);
            peeked = FieldSpy.spyField(JsonReader.class, this, "peeked");
            doPeek = MethodSpy.spyMethod(JsonReader.class, this, "doPeek");
            getLineNumber = MethodSpy.spyMethod(JsonReader.class, this, "getLineNumber");
            getColumnNumber = MethodSpy.spyMethod(JsonReader.class, this, "getColumnNumber");
            buffer = FieldSpy.spyField(JsonReader.class, this, "buffer");
            pos = FieldSpy.spyField(JsonReader.class, this, "pos");
            limit = FieldSpy.spyField(JsonReader.class, this, "limit");
            readEscapeCharacter = MethodSpy.spyMethod(JsonReader.class, this, "readEscapeCharacter");
            lineNumber = FieldSpy.spyField(JsonReader.class, this, "lineNumber");
            lineStart = FieldSpy.spyField(JsonReader.class, this, "lineStart");
            fillBuffer = MethodSpy.spyMethod(JsonReader.class, this, "fillBuffer", int.class);
            syntaxError = MethodSpy.spyMethod(JsonReader.class, this, "syntaxError", String.class);
            stackSize = FieldSpy.spyField(JsonReader.class, this, "stackSize");
            pathIndices = FieldSpy.spyField(JsonReader.class, this, "pathIndices");
        }

        static EnhancedGson25JsonReader getEnhancedGson25JsonReader(final Reader reader) {
            try {
                return new EnhancedGson25JsonReader(reader);
            } catch ( final NoSuchFieldException | NoSuchMethodException ex ) {
                throw new RuntimeException(ex);
            }
        }

        // This method has been copied and reworked from the nextString() implementation
        void nextSlicedString(final ISlicedStringListener listener)
                throws IOException {
            int p = peeked.get();
            if ( p == PEEKED_NONE ) {
                p = doPeek.get();
            }
            switch ( p ) {
            case PEEKED_SINGLE_QUOTED:
                nextQuotedSlicedValue('\'', listener);
                break;
            case PEEKED_DOUBLE_QUOTED:
                nextQuotedSlicedValue('"', listener);
                break;
            default:
                throw new IllegalStateException("Expected a string but was " + peek()
                        + " at line " + getLineNumber.get()
                        + " column " + getColumnNumber.get()
                        + " path " + getPath()
                );
            }
            peeked.accept(PEEKED_NONE);
            pathIndices.get()[stackSize.get() - 1]++;
        }

        // The following method is also a copy-paste that was patched for the "spies".
        // It's, in principle, the same as the source one, but it has one more buffer singleCharBuffer
        // in order not to add another method to the ISlicedStringListener interface (enjoy lambdas as much as possible).
        // Note that the main difference between these two methods is that this one
        // does not aggregate a single string value, but just delegates the internal
        // buffers to call-sites, so the latter ones might do anything with the buffers.

        /**
         * @see JsonReader#nextQuotedValue(char)
         */
        private void nextQuotedSlicedValue(final char quote, final ISlicedStringListener listener)
                throws IOException {
            final char[] buffer = this.buffer.get();
            final char[] singleCharBuffer = new char[1];
            while ( true ) {
                int p = pos.get();
                int l = limit.get();
                int start = p;
                while ( p < l ) {
                    final int c = buffer[p++];
                    if ( c == quote ) {
                        // Closing quote: hand over the last slice and stop
                        pos.accept(p);
                        listener.accept(buffer, start, p - start - 1);
                        return;
                    } else if ( c == '\\' ) {
                        // Escape sequence: flush what precedes it, then emit the unescaped character
                        pos.accept(p);
                        listener.accept(buffer, start, p - start - 1);
                        singleCharBuffer[0] = readEscapeCharacter.get();
                        listener.accept(singleCharBuffer, 0, 1);
                        p = pos.get();
                        l = limit.get();
                        start = p;
                    } else if ( c == '\n' ) {
                        lineNumber.accept(lineNumber.get() + 1);
                        lineStart.accept(p);
                    }
                }
                // The internal buffer is exhausted: flush the slice and refill
                listener.accept(buffer, start, p - start);
                pos.accept(p);
                if ( !fillBuffer.apply(just1) ) {
                    throw syntaxError.apply(justUnterminatedString);
                }
            }
        }

    }

FieldSpy.java

    import java.lang.reflect.Field;
    import java.util.function.Consumer;
    import java.util.function.Supplier;

    // Reads (get) and writes (accept) a single, possibly private, field of one instance.
    final class FieldSpy<T>
            implements Supplier<T>, Consumer<T> {

        private final Object instance;
        private final Field field;

        private FieldSpy(final Object instance, final Field field) {
            this.instance = instance;
            this.field = field;
        }

        static <T> FieldSpy<T> spyField(final Class<?> declaringClass, final Object instance, final String fieldName)
                throws NoSuchFieldException {
            final Field field = declaringClass.getDeclaredField(fieldName);
            field.setAccessible(true);
            return new FieldSpy<>(instance, field);
        }

        @Override
        public T get() {
            try {
                @SuppressWarnings("unchecked")
                final T value = (T) field.get(instance);
                return value;
            } catch ( final IllegalAccessException ex ) {
                throw new RuntimeException(ex);
            }
        }

        @Override
        public void accept(final T value) {
            try {
                field.set(instance, value);
            } catch ( final IllegalAccessException ex ) {
                throw new RuntimeException(ex);
            }
        }

    }

MethodSpy.java

    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.util.function.Function;
    import java.util.function.Supplier;

    // Invokes a single, possibly private, method of one instance.
    final class MethodSpy<T>
            implements Function<Object[], T>, Supplier<T> {

        private static final Object[] emptyObjectArray = {};

        private final Object instance;
        private final Method method;

        private MethodSpy(final Object instance, final Method method) {
            this.instance = instance;
            this.method = method;
        }

        static <T> MethodSpy<T> spyMethod(final Class<?> declaringClass, final Object instance, final String methodName,
                final Class<?>... parameterTypes)
                throws NoSuchMethodException {
            final Method method = declaringClass.getDeclaredMethod(methodName, parameterTypes);
            method.setAccessible(true);
            return new MethodSpy<>(instance, method);
        }

        @Override
        public T get() {
            // my javac generates useless new Object[0] if no args passed
            return apply(emptyObjectArray);
        }

        @Override
        public T apply(final Object[] arguments) {
            try {
                @SuppressWarnings("unchecked")
                final T value = (T) method.invoke(instance, arguments);
                return value;
            } catch ( final IllegalAccessException | InvocationTargetException ex ) {
                // Note: exceptions thrown by the target method arrive here wrapped
                throw new RuntimeException(ex);
            }
        }

    }

HugeJsonReaderDemo.java

Here is a demo that uses this method to read a huge JSON document and redirect its string values to another file.

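    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    import com.google.gson.stream.JsonToken;

    final class HugeJsonReaderDemo {

        public static void main(final String... args)
                throws IOException {
            try ( final EnhancedGson25JsonReader input = EnhancedGson25JsonReader.getEnhancedGson25JsonReader(
                          new InputStreamReader(new FileInputStream("./huge.json")));
                  final Writer output = new OutputStreamWriter(
                          new BufferedOutputStream(new FileOutputStream("./huge.json.STRINGS"))) ) {
                while ( input.hasNext() ) {
                    final JsonToken token = input.peek();
                    switch ( token ) {
                    case BEGIN_OBJECT:
                        input.beginObject();
                        break;
                    case NAME:
                        input.nextName();
                        break;
                    case STRING:
                        // Writer.write(char[], int, int) has the same shape as the listener
                        input.nextSlicedString(output::write);
                        break;
                    default:
                        throw new AssertionError(token);
                    }
                }
            }
        }

    }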

I could successfully extract the field above to a file. The input file was 544 MB long (570 425 371 bytes) and was made up of the following JSON chunks:

  • {"result":"OK","base64":"
  • JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC × 16777216 (2 ^ 24)
  • "}

And the result is (since I simply redirect all strings to the file):

  • OK
  • JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC × 16777216 (2 ^ 24)
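The demo writes the base64 text as-is. If the decoded bytes are what you actually need, the slices can probably be decoded on the fly as well. The sketch below is an assumption-laden illustration, not part of the original answer: `Base64SliceDecoder` and `decodeInSlices` are hypothetical names, the value is assumed to be standard base64 without line breaks, and only `java.util.Base64` from the JDK is used. Slices can end in the middle of a 4-character group (for example around a JSON escape such as `\/`), so the incomplete tail is carried over to the next slice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: decodes base64 text that arrives in arbitrary char slices.
final class Base64SliceDecoder {

    private final OutputStream out;
    private final Base64.Decoder decoder = Base64.getDecoder();
    private final byte[] carry = new byte[4]; // incomplete 4-char group from earlier slices
    private int carryLength;

    Base64SliceDecoder(final OutputStream out) {
        this.out = out;
    }

    // Same shape as ISlicedStringListener.accept(char[], int, int)
    void accept(final char[] buffer, final int start, final int length) throws IOException {
        final byte[] ascii = new byte[carryLength + length];
        System.arraycopy(carry, 0, ascii, 0, carryLength);
        for ( int i = 0; i < length; i++ ) {
            final char c = buffer[start + i];
            if ( c > 0x7F ) {
                throw new IOException("Not a base64 character: " + (int) c);
            }
            ascii[carryLength + i] = (byte) c;
        }
        // Decode the largest prefix made of whole 4-char groups, keep the rest for later
        final int usable = ascii.length - ascii.length % 4;
        if ( usable > 0 ) {
            out.write(decoder.decode(Arrays.copyOf(ascii, usable)));
        }
        carryLength = ascii.length - usable;
        System.arraycopy(ascii, usable, carry, 0, carryLength);
    }

    // Must be called once after the last slice to flush an unpadded tail, if any
    void finish() throws IOException {
        if ( carryLength > 0 ) {
            out.write(decoder.decode(Arrays.copyOf(carry, carryLength)));
            carryLength = 0;
        }
    }

    // Convenience for experiments: feeds a string in fixed-size slices
    static byte[] decodeInSlices(final String encoded, final int sliceSize) {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        final Base64SliceDecoder decoder = new Base64SliceDecoder(sink);
        final char[] chars = encoded.toCharArray();
        try {
            for ( int i = 0; i < chars.length; i += sliceSize ) {
                decoder.accept(chars, i, Math.min(sliceSize, chars.length - i));
            }
            decoder.finish();
        } catch ( final IOException ex ) {
            throw new UncheckedIOException(ex);
        }
        return sink.toByteArray();
    }

    public static void main(final String... args) {
        final String encoded = Base64.getEncoder().encodeToString("sliced base64".getBytes());
        System.out.println(new String(decodeInSlices(encoded, 5)));
    }
}
```

With the reader above, this would be wired in roughly as `input.nextSlicedString(decoder::accept)` for the `base64` field, followed by `decoder.finish()`, with `out` being a buffered FileOutputStream. Memory use then stays bounded by the reader's internal buffer rather than by the size of the field.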

I think you've run into a very interesting issue. It would be nice to get some feedback from the GSON team on a possible API enhancement.