What is causing these ParseError exceptions when reading from an AWS SQS queue in my Storm cluster?

I'm using Storm 0.8.1 to read incoming messages from an Amazon SQS queue, and I consistently get the following exception when doing so:

2013-12-02 02:21:38 executor [ERROR] java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
    at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
    at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
    at backtype.storm.daemon.executor$fn__3976$fn__4017$fn__4018.invoke(executor.clj:447)
    at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:701)
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:524)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:298)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
    at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:812)
    at com.amazonaws.services.sqs.AmazonSQSClient.receiveMessage(AmazonSQSClient.java:575)
    at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:191)
    ... 5 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
    at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
    at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
    at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
    at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:85)
    at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:41)
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:503)
    ... 10 more

I've inspected the data on the queue and everything looks fine. I can't figure out why the API's XML response would cause these problems. Any ideas?

Answering my own question here for the ages.

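There is currently a bug in how both Oracle's JDK and OpenJDK enforce the XML entity expansion limit: the expansion counter is shared across parses rather than reset per document, so after enough XML documents have been parsed it reaches the default upper bound (64000), even though no single document comes close to it.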

  1. https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
  2. https://bugs.openjdk.java.net/browse/JDK-8028111
  3. https://github.com/aws/aws-sdk-java/issues/123

Although I thought our version (6b27-1.12.6-1ubuntu0.12.04.4) wasn't affected, running the sample code from the OpenJDK bug report confirmed that we were indeed susceptible to the bug.
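For anyone who wants to run a similar check, here is a minimal sketch of the idea. It is not the code from the bug report: the class name `EntityLimitCheck`, the test document, and the document count are all made up for illustration. It parses many tiny documents, each with a single entity expansion, through one `XMLInputFactory`. On a runtime without the bug I would expect zero failures; on an affected one, failures should start once the cumulative count passes the limit.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EntityLimitCheck {
    // Hypothetical test document: declares one internal entity and expands it once.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE doc [<!ENTITY e \"x\">]>"
        + "<doc>&e;</doc>";

    // Parses the same small document `documents` times with a single factory
    // and returns how many of those parses threw an XMLStreamException.
    static int countFailures(int documents) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        int failures = 0;
        for (int i = 0; i < documents; i++) {
            try {
                XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(XML));
                while (reader.hasNext()) {
                    reader.next(); // walk the whole document so the entity is expanded
                }
                reader.close();
            } catch (XMLStreamException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int documents = 70000; // arbitrary, chosen to exceed the 64000 default limit
        System.out.println(countFailures(documents) + " of " + documents
            + " documents failed to parse");
    }
}
```

Each document is far below the limit on its own, so any failure here points at state leaking between parses rather than at the input.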

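To work around it, I needed to pass jdk.xml.entityExpansionLimit=0 to the Storm workers (a value of 0 disables the limit). Adding the following to storm.yaml across my cluster mitigated the problem: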

    supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
    worker.childopts: "-Djdk.xml.entityExpansionLimit=0"

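I should note that this technically opens you up to a denial-of-service attack, but since our XML documents only come from SQS, I'm not worried about someone forging malicious XML to kill our workers.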
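If you want to confirm that the flag actually reached a worker JVM, something like the following sketch could be called at startup, for example from a spout's setup code. The class `EntityLimitFlag` and its messages are hypothetical; the only real name here is the `jdk.xml.entityExpansionLimit` system property. It only reports what the JVM was given and does not change parser behavior.

```java
class EntityLimitFlag {
    static final String PROPERTY = "jdk.xml.entityExpansionLimit";

    // Turns the raw property value into a short human-readable status line.
    static String describe(String value) {
        if (value == null) {
            return PROPERTY + " is not set; the JDK default limit applies";
        }
        try {
            long limit = Long.parseLong(value.trim());
            if (limit <= 0) {
                // Values less than or equal to 0 mean "no limit".
                return PROPERTY + "=" + value + ": limit disabled";
            }
            return PROPERTY + "=" + value + ": limit enforced";
        } catch (NumberFormatException e) {
            return PROPERTY + "=" + value + ": not a number";
        }
    }

    public static void main(String[] args) {
        // Reads the value passed via -Djdk.xml.entityExpansionLimit=... (if any).
        System.out.println(describe(System.getProperty(PROPERTY)));
    }
}
```

Running it with `-Djdk.xml.entityExpansionLimit=0` on the command line should report the limit as disabled; without the flag it reports that the property is not set.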