What is the most standard Java way to store raw binary data together with XML?

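I need to store a large amount of binary data in a file, but I also want to read/write the file's header in XML format.

Yes, I could store the binary data in some XML value and serialize it with base64 encoding, but that is not space-efficient (base64 makes the data about a third larger).

Can I "mix" XML data and raw binary data in a more or less standardized way?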

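A few options I'm considering:

  • Is there a way to do this with JAXB?

  • Or is there a way to take some existing XML data and append the binary data to it, so that the boundary can be recognized?

  • Is this the concept used by SOAP that I'm looking for?

  • Or is it what the email standards use (separating out binary attachments)?

The layout I'm trying to achieve: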

[meta-info-about-boundary][XML-data][boundary][raw-binary-data] 

Thanks!

You can use AttachmentMarshaller and AttachmentUnmarshaller. They are the bridge that JAXB/JAX-WS use to pass binary content as attachments, and you can leverage the same mechanism to do what you want.


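Proof of Concept

Below is how it can be implemented. This should work with any JAXB implementation (it works with EclipseLink JAXB (MOXy) and the reference implementation).

Message Format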

 [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN] 
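The framing itself does not depend on JAXB. As a JAXB-free sketch (the names `Framing`, `frame` and `unframe` are hypothetical, not part of this answer), every length is written as a 4-byte big-endian int with `DataOutputStream.writeInt`, so parts longer than 255 bytes survive, which a single-byte `OutputStream.write(int)` would not guarantee. The reader stops at the first missing length prefix; note that it also treats a truncated 1-3 byte prefix as the end, which a production reader should reject instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not from the answer): the length-prefixed framing on its own.
public class Framing {

    // [xml_length][xml][attach1_length][attach1]...: every length is a 4-byte big-endian int.
    public static byte[] frame(byte[] xml, List<byte[]> attachments) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(xml.length); // [xml_length]
        out.write(xml);           // [xml]
        for (byte[] attachment : attachments) {
            out.writeInt(attachment.length); // [attachX_length]
            out.write(attachment);           // [attachX]
        }
        out.flush();
        return buffer.toByteArray();
    }

    // Returns the parts in order: index 0 is the XML, the rest are the attachments.
    public static List<byte[]> unframe(byte[] message) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        List<byte[]> parts = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = in.readInt(); // throws EOFException once no length prefix is left
            } catch (EOFException endOfMessage) {
                break;
            }
            byte[] part = new byte[length];
            in.readFully(part); // unlike read(byte[]), never stops short
            parts.add(part);
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8);
        byte[] attachment = new byte[1000]; // too long for a one-byte length prefix
        List<byte[]> parts = unframe(frame(xml, Collections.singletonList(attachment)));
        System.out.println(parts.size() + " parts, attachment length " + parts.get(1).length);
    }
}
```

Running `main` prints `2 parts, attachment length 1000`.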

Here is an object with multiple byte[] properties.

    import javax.xml.bind.annotation.XmlRootElement;

    @XmlRootElement
    public class Root {

        private byte[] foo;
        private byte[] bar;

        public byte[] getFoo() {
            return foo;
        }

        public void setFoo(byte[] foo) {
            this.foo = foo;
        }

        public byte[] getBar() {
            return bar;
        }

        public void setBar(byte[] bar) {
            this.bar = bar;
        }
    }

Demo

This class demonstrates how to use MessageWriter and MessageReader:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.bind.JAXBContext;

    public class Demo {

        public static void main(String[] args) throws Exception {
            JAXBContext jc = JAXBContext.newInstance(Root.class);

            Root root = new Root();
            root.setFoo("HELLO WORLD".getBytes(StandardCharsets.UTF_8)); // 11 bytes: >= THRESHOLD, written as an attachment
            root.setBar("BAR".getBytes(StandardCharsets.UTF_8));         // 3 bytes: < THRESHOLD, stays inline as base64

            MessageWriter writer = new MessageWriter(jc);
            try (FileOutputStream outStream = new FileOutputStream("file.xml")) {
                writer.write(root, outStream);
            }

            MessageReader reader = new MessageReader(jc);
            Root root2;
            try (FileInputStream inStream = new FileInputStream("file.xml")) {
                root2 = (Root) reader.read(inStream);
            }

            System.out.println(new String(root2.getFoo(), StandardCharsets.UTF_8));
            System.out.println(new String(root2.getBar(), StandardCharsets.UTF_8));
        }
    }

MessageWriter

Responsible for writing the message in the desired format:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.OutputStream;
    import java.util.ArrayList;
    import java.util.List;
    import javax.activation.DataHandler;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;
    import javax.xml.bind.attachment.AttachmentMarshaller;

    public class MessageWriter {

        private final JAXBContext jaxbContext;

        public MessageWriter(JAXBContext jaxbContext) {
            this.jaxbContext = jaxbContext;
        }

        /**
         * Write the message in the following format:
         * [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN]
         * Every length is a 4-byte big-endian int.
         */
        public void write(Object object, OutputStream stream) {
            try {
                Marshaller marshaller = jaxbContext.createMarshaller();
                marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
                BinaryAttachmentMarshaller attachmentMarshaller = new BinaryAttachmentMarshaller();
                marshaller.setAttachmentMarshaller(attachmentMarshaller);

                // Marshal to memory first: large byte[] values are replaced by cid: references
                ByteArrayOutputStream xmlStream = new ByteArrayOutputStream();
                marshaller.marshal(object, xmlStream);
                byte[] xml = xmlStream.toByteArray();

                // writeInt writes all 4 bytes; OutputStream.write(int) keeps only the lowest 8 bits
                DataOutputStream messageStream = new DataOutputStream(stream);
                messageStream.writeInt(xml.length); // [xml_length]
                messageStream.write(xml);           // [xml]
                for (Attachment attachment : attachmentMarshaller.getAttachments()) {
                    messageStream.writeInt(attachment.getLength()); // [attachX_length]
                    messageStream.write(attachment.getData(), attachment.getOffset(), attachment.getLength()); // [attachX]
                }
                messageStream.flush();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        private static class BinaryAttachmentMarshaller extends AttachmentMarshaller {

            private static final int THRESHOLD = 10;
            private final List<Attachment> attachments = new ArrayList<>();

            public List<Attachment> getAttachments() {
                return attachments;
            }

            @Override
            public String addMtomAttachment(DataHandler data, String elementNamespace, String elementLocalName) {
                return null; // null means: keep the data inline as base64
            }

            @Override
            public String addMtomAttachment(byte[] data, int offset, int length, String mimeType,
                    String elementNamespace, String elementLocalName) {
                if (length < THRESHOLD) {
                    return null; // small values stay inline as base64
                }
                int id = attachments.size() + 1;
                attachments.add(new Attachment(data, offset, length));
                return "cid:" + id;
            }

            @Override
            public String addSwaRefAttachment(DataHandler data) {
                return null;
            }

            @Override
            public boolean isXOPPackage() {
                return true; // ask JAXB to offer binary values as XOP attachments
            }
        }

        public static class Attachment {

            private final byte[] data;
            private final int offset;
            private final int length;

            public Attachment(byte[] data, int offset, int length) {
                this.data = data;
                this.offset = offset;
                this.length = length;
            }

            public byte[] getData() {
                return data;
            }

            public int getOffset() {
                return offset;
            }

            public int getLength() {
                return length;
            }
        }
    }

MessageReader

Responsible for reading the message:

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.HashMap;
    import java.util.Map;
    import javax.activation.DataHandler;
    import javax.activation.DataSource;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.attachment.AttachmentUnmarshaller;

    public class MessageReader {

        private final JAXBContext jaxbContext;

        public MessageReader(JAXBContext jaxbContext) {
            this.jaxbContext = jaxbContext;
        }

        /**
         * Read the message from the following format:
         * [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN]
         */
        public Object read(InputStream stream) {
            try {
                DataInputStream inputStream = new DataInputStream(stream);
                int xmlLength = inputStream.readInt(); // [xml_length]
                byte[] xmlIn = new byte[xmlLength];
                inputStream.readFully(xmlIn);          // [xml]; readFully never stops short

                BinaryAttachmentUnmarshaller attachmentUnmarshaller = new BinaryAttachmentUnmarshaller();
                int id = 1;
                // available() is not a reliable end-of-stream test, so read until EOF instead
                while (true) {
                    int length;
                    try {
                        length = inputStream.readInt(); // [attachX_length]
                    } catch (EOFException endOfMessage) {
                        break; // no more attachments
                    }
                    byte[] data = new byte[length];
                    inputStream.readFully(data); // [attachX]
                    attachmentUnmarshaller.getAttachments().put("cid:" + id++, data);
                }

                Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
                unmarshaller.setAttachmentUnmarshaller(attachmentUnmarshaller);
                return unmarshaller.unmarshal(new ByteArrayInputStream(xmlIn));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        private static class BinaryAttachmentUnmarshaller extends AttachmentUnmarshaller {

            private final Map<String, byte[]> attachments = new HashMap<>();

            public Map<String, byte[]> getAttachments() {
                return attachments;
            }

            @Override
            public DataHandler getAttachmentAsDataHandler(String cid) {
                byte[] bytes = attachments.get(cid);
                return new DataHandler(new ByteArrayDataSource(bytes));
            }

            @Override
            public byte[] getAttachmentAsByteArray(String cid) {
                return attachments.get(cid);
            }

            @Override
            public boolean isXOPPackage() {
                return true;
            }
        }

        private static class ByteArrayDataSource implements DataSource {

            private final byte[] bytes;

            public ByteArrayDataSource(byte[] bytes) {
                this.bytes = bytes;
            }

            public String getContentType() {
                return "application/octet-stream";
            }

            public InputStream getInputStream() throws IOException {
                return new ByteArrayInputStream(bytes);
            }

            public String getName() {
                return null;
            }

            public OutputStream getOutputStream() throws IOException {
                throw new IOException("read-only data source");
            }
        }
    }


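I followed the concept proposed by Blaise Doughan, but without attachment marshallers:

I use an XmlAdapter that converts a byte[] into a URI reference and back; the reference points to a separate file where the raw data is stored. The XML file and all binary files are then put into a zip archive.

This is similar to the approach used by OpenOffice and the ODF format, where a document is really a zip containing a few XML files and binary files.

(The example code writes no actual binary files and creates no zip.)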

Bindings.java

    import java.net.*;
    import java.util.*;
    import javax.xml.bind.annotation.*;
    import javax.xml.bind.annotation.adapters.*;

    final class Bindings {

        static final String SCHEME = "storage";
        static final Class<?>[] ALL_CLASSES = new Class<?>[] { Root.class, RawRef.class };

        /** Replaces each byte[] with a storage:// URI on marshal, and resolves it back on unmarshal. */
        static final class RawRepository extends XmlAdapter<URI, byte[]> {

            final SortedMap<String, byte[]> map = new TreeMap<>();
            final String host;
            private int lastID = 0;

            RawRepository(String host) {
                this.host = host;
            }

            @Override
            public byte[] unmarshal(URI o) {
                if (!SCHEME.equals(o.getScheme())) {
                    throw new Error("scheme is: " + o.getScheme() + ", while expected was: " + SCHEME);
                } else if (!host.equals(o.getHost())) {
                    throw new Error("host is: " + o.getHost() + ", while expected was: " + host);
                }
                // getPath() returns "/1": strip the leading slash added in marshal()
                String key = o.getPath().substring(1);
                if (!map.containsKey(key)) {
                    throw new Error("key not found: " + key);
                }
                byte[] ret = map.get(key);
                return Arrays.copyOf(ret, ret.length);
            }

            @Override
            public URI marshal(byte[] o) {
                ++lastID;
                String key = String.valueOf(lastID);
                map.put(key, Arrays.copyOf(o, o.length));
                try {
                    return new URI(SCHEME, host, "/" + key, null);
                } catch (URISyntaxException ex) {
                    throw new Error(ex);
                }
            }
        }

        @XmlRootElement
        @XmlType
        static final class Root {
            @XmlElement
            final List<RawRef> element = new LinkedList<>();
        }

        @XmlType
        static final class RawRef {
            @XmlJavaTypeAdapter(RawRepository.class)
            @XmlElement
            byte[] raw = null;
        }
    }

Main.java

    import java.io.*;
    import javax.xml.bind.*;

    public class Main {

        public static void main(String[] args) throws Exception {
            JAXBContext context = JAXBContext.newInstance(Bindings.ALL_CLASSES);
            Marshaller marshaller = context.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            Unmarshaller unmarshaller = context.createUnmarshaller();

            // Share one adapter instance so the unmarshaller can resolve the URIs the marshaller created
            Bindings.RawRepository adapter = new Bindings.RawRepository("myZipVFS");
            marshaller.setAdapter(adapter);
            unmarshaller.setAdapter(adapter);

            Bindings.RawRef ta1 = new Bindings.RawRef();
            ta1.raw = "THIS IS A STRING".getBytes();
            Bindings.RawRef ta2 = new Bindings.RawRef();
            ta2.raw = "THIS IS AN OTHER STRING".getBytes();

            Bindings.Root root = new Bindings.Root();
            root.element.add(ta1);
            root.element.add(ta2);

            StringWriter out = new StringWriter();
            marshaller.marshal(root, out);
            System.out.println(out.toString());
        }
    }

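Output

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <root>
        <element>
            <raw>storage://myZipVFS/1</raw>
        </element>
        <element>
            <raw>storage://myZipVFS/2</raw>
        </element>
    </root>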
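The packaging step is left out above. A minimal sketch of it with the JDK's `java.util.zip` could look like the following; the class name `ZipPackage` and the entry names `content.xml` and `raw/<key>` are assumptions, chosen so that a reference such as `storage://myZipVFS/1` maps to the zip entry `raw/1`. In practice the `blobs` map would be the adapter's `map` after marshalling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical packaging step: one XML entry plus one entry per raw blob ("raw/<key>").
public class ZipPackage {

    public static byte[] pack(String xml, Map<String, byte[]> blobs) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            for (Map.Entry<String, byte[]> blob : blobs.entrySet()) {
                zip.putNextEntry(new ZipEntry("raw/" + blob.getKey()));
                zip.write(blob.getValue()); // stored as raw bytes, no base64
                zip.closeEntry();
            }
        } // closing the ZipOutputStream writes the central directory
        return buffer.toByteArray();
    }

    // Reads every entry back into memory, keyed by entry name, in archive order.
    public static Map<String, byte[]> unpack(byte[] archive) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] chunk = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(chunk)) != -1) { // -1 at the end of the current entry
                    data.write(chunk, 0, n);
                }
                entries.put(entry.getName(), data.toByteArray());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        SortedMap<String, byte[]> blobs = new TreeMap<>();
        blobs.put("1", "THIS IS A STRING".getBytes(StandardCharsets.UTF_8));
        blobs.put("2", "THIS IS AN OTHER STRING".getBytes(StandardCharsets.UTF_8));
        String xml = "<root><element><raw>storage://myZipVFS/1</raw></element></root>";
        Map<String, byte[]> entries = unpack(pack(xml, blobs));
        System.out.println(entries.keySet()); // [content.xml, raw/1, raw/2]
    }
}
```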

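JAXB does not support this natively, since the point is to avoid serializing the binary data into the XML, but it can usually be done at a higher level while still using JAXB. The way I do it is with web services (SOAP and REST) using MIME multipart/mixed messages (see the multipart specification, RFC 2046). Multipart was originally designed for email, it is well suited to sending XML together with binary data, and most web service frameworks (such as Axis or Jersey) support it almost transparently.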
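To make the format concrete without any framework, here is a minimal hand-rolled sketch of a multipart/mixed body with an XML part followed by a raw binary part. The class and method names are hypothetical, and it assumes the boundary string never occurs inside either part, which is why `main` randomizes it. Written to a file, the same bytes also give the `[XML-data][boundary][raw-binary-data]` layout from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hand-rolled multipart/mixed body: illustrates the wire format only.
public class MultipartMixed {

    private static void ascii(ByteArrayOutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.US_ASCII));
    }

    // The CRLF before each boundary delimiter belongs to the delimiter, not to the part.
    public static byte[] build(String boundary, byte[] xml, byte[] binary) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ascii(out, "--" + boundary + "\r\n");
        ascii(out, "Content-Type: application/xml\r\n\r\n");
        out.write(xml);
        ascii(out, "\r\n--" + boundary + "\r\n");
        ascii(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.write(binary); // raw bytes: no base64, no escaping
        ascii(out, "\r\n--" + boundary + "--\r\n"); // closing delimiter
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String boundary = "Boundary_" + UUID.randomUUID();
        byte[] xml = "<book><author>John</author></book>".getBytes(StandardCharsets.UTF_8);
        byte[] body = build(boundary, xml, new byte[110917]);
        System.out.println("Content-Type: multipart/mixed; boundary=" + boundary);
        System.out.println("body bytes: " + body.length);
    }
}
```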

Below is an example that sends an XML object together with a binary file to a REST web service, using Jersey with the jersey-multipart extension.

XML object

    import javax.xml.bind.annotation.XmlRootElement;

    @XmlRootElement
    public class Book {
        private String title;
        private String author;
        private int year;
        // getters and setters...
    }

Client

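    import com.sun.jersey.api.client.Client;
    import com.sun.jersey.api.client.ClientResponse;
    import com.sun.jersey.api.client.WebResource;
    import com.sun.jersey.multipart.BodyPart;
    import com.sun.jersey.multipart.MultiPart;
    import com.sun.jersey.multipart.MultiPartMediaTypes;
    import javax.ws.rs.core.MediaType;

    byte[] bin = ...; // some binary data

    Book b = new Book();
    b.setAuthor("John");
    b.setTitle("wild stuff");
    b.setYear(2012);

    // One body part per payload: the Book as XML, the bytes as a raw octet stream
    MultiPart multiPart = new MultiPart();
    multiPart.bodyPart(new BodyPart(b, MediaType.APPLICATION_XML_TYPE));
    multiPart.bodyPart(new BodyPart(bin, MediaType.APPLICATION_OCTET_STREAM_TYPE));

    WebResource service = Client.create().resource("http://localhost:8080/org.etics.test.rest.server");
    ClientResponse response = service.path("rest").path("multipart")
            .type(MultiPartMediaTypes.MULTIPART_MIXED)
            .post(ClientResponse.class, multiPart);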

Server

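    import com.sun.jersey.multipart.BodyPart;
    import com.sun.jersey.multipart.MultiPart;
    import com.sun.jersey.multipart.MultiPartMediaTypes;
    import javax.ws.rs.Consumes;
    import javax.ws.rs.POST;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    @POST
    @Consumes(MultiPartMediaTypes.MULTIPART_MIXED)
    public Response post(MultiPart multiPart) {
        for (BodyPart part : multiPart.getBodyParts()) {
            // application/xml for the Book, application/octet-stream for the binary data
            System.out.println(part.getMediaType());
        }
        return Response.status(Response.Status.ACCEPTED)
                .entity("Attachments processed successfully.")
                .type(MediaType.TEXT_PLAIN)
                .build();
    }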

I tried sending a 110917-byte file. In Wireshark you can see that the data is sent directly over HTTP, as follows:

    Hypertext Transfer Protocol
        POST /org.etics.test.rest.server/rest/multipart HTTP/1.1\r\n
        Content-Type: multipart/mixed; boundary=Boundary_1_353042220_1343207087422\r\n
        MIME-Version: 1.0\r\n
        User-Agent: Java/1.7.0_04\r\n
        Host: localhost:8080\r\n
        Accept: text/html, image/gif, image/jpeg\r\n
        Connection: keep-alive\r\n
        Content-Length: 111243\r\n
        \r\n
        [Full request URI: http://localhost:8080/org.etics.test.rest.server/rest/multipart]
    MIME Multipart Media Encapsulation, Type: multipart/mixed, Boundary: "Boundary_1_353042220_1343207087422"
        [Type: multipart/mixed]
        First boundary: --Boundary_1_353042220_1343207087422\r\n
        Encapsulated multipart part: (application/xml)
            Content-Type: application/xml\r\n\r\n
            eXtensible Markup Language
                John
                wild stuff
                2012
        Boundary: \r\n--Boundary_1_353042220_1343207087422\r\n
        Encapsulated multipart part: (application/octet-stream)
            Content-Type: application/octet-stream\r\n\r\n
            Media Type
                Media Type: application/octet-stream (110917 bytes)
        Last boundary: \r\n--Boundary_1_353042220_1343207087422--\r\n

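As you can see, the binary data is sent as an octet stream, so no space is wasted, unlike when binary data is sent inline in the XML. The only overhead is a very small MIME envelope: the whole request is 111243 bytes for a 110917-byte file. With SOAP the principle is the same (there is just a SOAP envelope as well).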
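The numbers from the capture can be checked with a little arithmetic. The sketch below (class and method names hypothetical; `java.util.Base64` needs Java 8 or later) compares the multipart overhead with what the same file would cost as padded base64 inside the XML, which produces 4 output characters for every 3 input bytes.

```java
import java.util.Base64;

// Worked numbers from the capture: multipart overhead vs. inline base64.
public class Base64Overhead {

    // Size of the padded base64 encoding of n bytes: 4 * ceil(n / 3).
    public static int base64Length(int n) {
        return Base64.getEncoder().encode(new byte[n]).length;
    }

    public static void main(String[] args) {
        int binary = 110917;        // the file that was sent
        int contentLength = 111243; // Content-Length of the whole multipart request
        int encoded = base64Length(binary);
        System.out.println("multipart request overhead: " + (contentLength - binary) + " bytes");
        System.out.println("base64 alone: " + encoded + " bytes (+" + (encoded - binary) + ")");
    }
}
```

It prints an overhead of 326 bytes for the multipart request (and those bytes also include the small XML part and the MIME headers), versus 147892 bytes (+36975) for the base64 text alone, before any XML markup around it.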

I don't think so: XML libraries are generally not designed to handle XML plus extra data.

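But you might be able to use something as simple as a special stream wrapper that splits a file in your own special "format" into an "XML" stream and a binary stream. JAXB (or any other XML library) can then consume the "XML" stream, while the binary stream stays separate.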
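One way to sketch that wrapper, assuming a simple layout of `[4-byte XML length][XML][binary data until end of stream]` (the layout and the names `SplitStream`, `BoundedInputStream`, `xmlPart` and `readAll` are hypothetical): wrap the file stream in a view that reports end-of-stream exactly where the XML ends, hand that view to the parser, then keep reading the original stream for the binary part. The view ignores `close()` because some parsers close their input when they finish, and that must not close the file before the binary part is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical layout: [4-byte XML length][XML][binary data until end of stream].
public class SplitStream {

    // A view of the underlying stream that reports end-of-stream after `remaining` bytes.
    static final class BoundedInputStream extends FilterInputStream {
        private long remaining;

        BoundedInputStream(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = in.read();
            if (b != -1) remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = in.skip(Math.min(n, remaining));
            remaining -= skipped;
            return skipped;
        }

        @Override
        public int available() throws IOException {
            return (int) Math.min(in.available(), remaining);
        }

        @Override
        public boolean markSupported() {
            return false; // mark/reset could move the position outside the XML part
        }

        @Override
        public void close() {
            // Deliberately leaves the underlying stream open: the binary part follows.
        }
    }

    // Reads the length prefix and returns a stream that ends exactly where the XML ends.
    public static InputStream xmlPart(DataInputStream in) throws IOException {
        return new BoundedInputStream(in, in.readInt());
    }

    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<header version=\"1\"/>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        out.writeInt(xml.length);
        out.write(xml);
        out.write(new byte[] {0, 1, 2, 3}); // the raw binary payload
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file.toByteArray()));
        InputStream xmlStream = xmlPart(in);
        // An XML parser or JAXB Unmarshaller would consume xmlStream here; draining it
        // afterwards also skips anything the parser left unread, such as trailing whitespace.
        String header = new String(readAll(xmlStream), StandardCharsets.UTF_8);
        byte[] binary = readAll(in);
        System.out.println(header + " + " + binary.length + " binary bytes");
    }
}
```

Running `main` prints `<header version="1"/> + 4 binary bytes`.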

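Also remember the difference between "binary" and "text" files: read and write the file as a byte stream, not through a Reader/Writer, or character decoding can corrupt the binary part.

Happy coding.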