Avro schema does not honor backward compatibility

I have this Avro schema:

    {
      "namespace": "xx.xxxx.xxxxx.xxxxx",
      "type": "record",
      "name": "MyPayLoad",
      "fields": [
        {"name": "filed1", "type": "string"},
        {"name": "filed2", "type": "long"},
        {"name": "filed3", "type": "boolean"},
        {
          "name": "metrics",
          "type": {
            "type": "array",
            "items": {
              "name": "MyRecord",
              "type": "record",
              "fields": [
                {"name": "min", "type": "long"},
                {"name": "max", "type": "long"},
                {"name": "sum", "type": "long"},
                {"name": "count", "type": "long"}
              ]
            }
          }
        }
      ]
    }

This is the code we use to parse the data:

    public static final MyPayLoad parseBinaryPayload(byte[] payload) {
        // Only the reader's (current) schema is known here; it is derived from the generated class.
        DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<MyPayLoad>(MyPayLoad.class);
        Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        MyPayLoad myPayLoad = null;
        try {
            myPayLoad = payloadReader.read(null, decoder);
        } catch (IOException e) {
            logger.log(Level.SEVERE, e.getMessage(), e);
        }
        return myPayLoad;
    }

Now I want to add a field to the schema, so the schema looks like this:

    {
      "namespace": "xx.xxxx.xxxxx.xxxxx",
      "type": "record",
      "name": "MyPayLoad",
      "fields": [
        {"name": "filed1", "type": "string"},
        {"name": "filed2", "type": "long"},
        {"name": "filed3", "type": "boolean"},
        {
          "name": "metrics",
          "type": {
            "type": "array",
            "items": {
              "name": "MyRecord",
              "type": "record",
              "fields": [
                {"name": "min", "type": "long"},
                {"name": "max", "type": "long"},
                {"name": "sum", "type": "long"},
                {"name": "count", "type": "long"}
              ]
            }
          }
        }
        {"name": "agentType", "type": ["null", "string"], "default": "APP_AGENT"}
      ]
    }

Note the added field, which also defines a default value. The problem is that when we receive data written with the old schema, I get this error:

    java.io.EOFException: null
        at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
        at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]

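From this document I understood that this should be backward compatible, but that does not seem to be the case. Any idea what I am doing wrong?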

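I finally got this working. I needed to give both schemas to SpecificDatumReader, so I changed the parsing as shown here: I pass both the old and the new schema to the reader, and it works like a charm.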

    public static final MyPayLoad parseBinaryPayload(byte[] payload) {
        // First argument: the writer's (old) schema; second argument: the reader's (new) schema.
        DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2);
        Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        MyPayLoad myPayLoad = null;
        try {
            myPayLoad = payloadReader.read(null, decoder);
        } catch (IOException e) {
            logger.log(Level.SEVERE, e.getMessage(), e);
        }
        return myPayLoad;
    }
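The same idea can be tried end to end with Avro's generic API. What follows is only a minimal sketch, not the poster's actual code: the class name `WriterReaderSchemaSketch`, the trimmed-down one-field schemas, and the helper methods are assumptions made for illustration, and it assumes the Avro Java library is on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

final class WriterReaderSchemaSketch {

    // Hypothetical, shortened stand-ins for the two schema versions in the question.
    static final String V1_JSON =
        "{\"type\":\"record\",\"name\":\"MyPayLoad\",\"fields\":["
      + "{\"name\":\"filed1\",\"type\":\"string\"}]}";
    static final String V2_JSON =
        "{\"type\":\"record\",\"name\":\"MyPayLoad\",\"fields\":["
      + "{\"name\":\"filed1\",\"type\":\"string\"},"
      + "{\"name\":\"agentType\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    // Two separate parsers, because a single parser will not define MyPayLoad twice.
    static final Schema SCHEMA_V1 = new Schema.Parser().parse(V1_JSON);
    static final Schema SCHEMA_V2 = new Schema.Parser().parse(V2_JSON);

    // Serialize one record using the OLD schema only.
    static byte[] writeWithV1(String filed1) throws IOException {
        GenericRecord record = new GenericData.Record(SCHEMA_V1);
        record.put("filed1", filed1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA_V1).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Deserialize with writer schema V1 and reader schema V2, so the
    // missing agentType field is filled in from its default.
    static GenericRecord readAsV2(byte[] payload) throws IOException {
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<GenericRecord>(SCHEMA_V1, SCHEMA_V2);
        return reader.read(null, DecoderFactory.get().binaryDecoder(payload, null));
    }

    public static void main(String[] args) throws IOException {
        GenericRecord record = readAsV2(writeWithV1("hello"));
        System.out.println(record.get("filed1") + " / " + record.get("agentType"));
    }
}
```

In a real system the writer schema usually has to travel with the data or be looked up somewhere (for example a version tag in front of the payload); how to do that is outside what this thread describes.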

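I am facing the same situation. Data written with the old schema fails when I try to read it with the newer schema. The newer schema only has one additional field, with a union type and a default set: "type": ["null", "string"], "doc": "", "default": null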

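Even though the default is set, the null value is not filled in automatically during reading. Both the writer schema and the reader schema have to be provided when reading. My understanding was that Avro is backward compatible and should be able to handle newer columns without needing the old schema.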
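One way to see why the old schema is needed: Avro's binary encoding writes field values back to back, with no field names or tags, so the bytes alone do not say which fields are present. The sketch below is a hand-rolled illustration of that idea using only the JDK. It is not Avro's implementation; the class `UntaggedEncodingSketch`, its methods, and the two-field layout are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;

// Illustration only: mimics an untagged, schema-driven binary layout.
final class UntaggedEncodingSketch {

    // Write a long as a zigzag-encoded variable-length integer.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    // Read a zigzag varint; throw EOFException when the bytes run out.
    static long readLong(ByteArrayInputStream in) throws EOFException {
        long n = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("no more bytes");
            }
            n |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return (n >>> 1) ^ -(n & 1);
    }

    // "Old writer": filed2 (long) followed by filed3 (boolean), nothing else.
    static byte[] writeOld(long filed2, boolean filed3) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, filed2);
        out.write(filed3 ? 1 : 0);
        return out.toByteArray();
    }

    // Reader that wrongly assumes the NEW layout was used by the writer:
    // after filed3 it expects a union branch index for agentType.
    static String readAssumingNewLayout(byte[] data) throws EOFException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long filed2 = readLong(in);
        boolean filed3 = in.read() == 1;
        long branch = readLong(in); // old data ends here, so this hits EOF
        return filed2 + "/" + filed3 + "/branch" + branch;
    }

    // Reader that knows the writer's layout: decode only what was written,
    // then take agentType from the reader-side default.
    static String readWithWriterLayout(byte[] data, String defaultAgentType) throws EOFException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long filed2 = readLong(in);
        boolean filed3 = in.read() == 1;
        return filed2 + "/" + filed3 + "/" + defaultAgentType;
    }

    public static void main(String[] args) throws EOFException {
        byte[] oldBytes = writeOld(42L, true);
        System.out.println(readWithWriterLayout(oldBytes, null));
        try {
            readAssumingNewLayout(oldBytes);
        } catch (EOFException e) {
            System.out.println("EOF while reading the union index");
        }
    }
}
```

This matches the shape of the stack trace in the question, where the failure happens in `readIndex`, i.e. while the decoder is looking for a union branch that the old writer never emitted.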

I can see two possible problems in your schema:

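  1. As far as I know, the default value here always has to be null: in this Avro version the default of a union field must match the first type listed in the union, which is "null" in your case. To specify that, you need to set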

"default": null

  2. Also, in your schema you forgot to add a comma (the field separator) between the array field and the new field. So try changing the schema to:

    {
      "namespace": "xx.xxxx.xxxxx.xxxxx",
      "type": "record",
      "name": "MyPayLoad",
      "fields": [
        {"name": "filed1", "type": "string"},
        {"name": "filed2", "type": "long"},
        {"name": "filed3", "type": "boolean"},
        {
          "name": "metrics",
          "type": {
            "type": "array",
            "items": {
              "name": "MyRecord",
              "type": "record",
              "fields": [
                {"name": "min", "type": "long"},
                {"name": "max", "type": "long"},
                {"name": "sum", "type": "long"},
                {"name": "count", "type": "long"}
              ]
            }
          }
        },
        {"name": "agentType", "type": ["null", "string"], "default": null}
      ]
    }