How to ignore DTD validation but keep the Doctype when writing an XML file?

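I am working on a system that should be able to read any (or at least any well-formed) XML file, manipulate a few nodes and write them back into that same file. I want my code to be as generic as possible, and I don't want:

- hardcoded references to schema/Doctype information anywhere in my code. The doctype information is in the source document, and I want to keep exactly that doctype information rather than provide it again from my code. If a document has no DocType, I won't add one. Apart from my few nodes, I do not care about the form or content of these files at all.
- custom EntityResolvers or StreamFilters to omit or otherwise manipulate the source information (it is already a pity that namespace information seems to be inaccessible from the document file in which it is declared, but I can manage by using uglier XPaths).
- DTD validation. I don't have the referenced DTDs, I don't want to include them, and node manipulation is perfectly possible without knowing about them.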

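The aim is to leave the source file entirely unchanged except for the changed nodes, which are retrieved via XPath. I would like to get by with the standard javax.xml APIs.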
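To make the kind of edit concrete, here is a minimal sketch of changing nodes found via XPath without binding any namespace prefixes, using `local-name()` (the "uglier XPath" route). The class name, the helper names and the `urn:example` namespace are made up for illustration, and the helper assumes the local name contains no quote characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathEditSketch {

    /** Parses an XML string into a namespace-aware DOM document. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Replaces the text content of every element with the given local name
     * and returns how many elements were changed.
     * Assumption: localName contains no quote characters.
     */
    public static int setTextByLocalName(Document doc, String localName, String newText)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() matches on the unprefixed name, so no NamespaceContext is needed.
        String expr = "//*[local-name()='" + localName + "']";
        NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        for (int i = 0; i < hits.getLength(); i++) {
            hits.item(i).setTextContent(newText);
        }
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a:root xmlns:a='urn:example'><a:item>old</a:item></a:root>");
        int changed = setTextByLocalName(doc, "item", "new");
        System.out.println(changed + " node(s) changed");
    }
}
```

Matching on `local-name()` ignores the namespace entirely, so it will also hit same-named elements from other namespaces; that is acceptable here only because the documents are treated generically.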

My progress so far:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setAttribute("http://xml.org/sax/features/namespaces", true);
    factory.setAttribute("http://xml.org/sax/features/validation", false);
    factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
    factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

    factory.setNamespaceAware(true);
    factory.setIgnoringElementContentWhitespace(false);
    factory.setIgnoringComments(false);
    factory.setValidating(false);

    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(inStream));

This successfully loads the XML source into an org.w3c.dom.Document, ignoring DTD validation. I can do my replacements, and then I use

    Source source = new DOMSource(document);
    Result result = new StreamResult(getOutputStream(getPath()));

    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.transform(source, result);

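to write it back. This is nearly perfect. But the Doctype tag is gone, no matter what I do. While debugging, I saw that after parsing there is a DeferredDoctypeImpl [log4j:configuration: null] object in the Document object, but it is somehow wrong, empty or ignored. The file I tested with starts like this (but it is the same for other file types):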

[…]

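I think there are a lot of (easy?) ways that involve hacks or pulling additional JARs into the project. But I would rather do it with the tools I already use.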
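For reference, one way that stays within javax.xml is to copy the public and system identifiers from the parsed DocumentType onto the Transformer's output properties, so nothing is hardcoded. The following is only a sketch under some assumptions: the class and method names are invented, the parser is assumed to be Xerces-based (as the JDK default is) so that the `load-external-dtd` feature is recognised, and an internal DTD subset would not be carried over. A doctype that has only a public identifier may also not be written, since the XML output method generally needs a system identifier to emit a DOCTYPE.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeCopySketch {

    /** Parses XML without validating and without fetching the external DTD. */
    public static Document parseWithoutDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Xerces-specific feature; assumed to be supported by the parser in use.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes the document, re-emitting the DOCTYPE identifiers found in the source. */
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            // Copy whatever the source document declared; nothing is hardcoded here.
            if (doctype.getPublicId() != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE config SYSTEM \"config.dtd\"><config><item>a</item></config>";
        System.out.println(serialize(parseWithoutDtd(xml)));
    }
}
```

If the document has no doctype, `getDoctype()` returns null and no DOCTYPE is added, which matches the requirement above.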

Sorry, I got it working just now by using an XMLSerializer instead of the Transformer…

Here is how it could be done using the LSSerializer found in the JDK:

    private void writeDocument(Document doc, String filename) throws IOException {
        Writer writer = null;
        try {
            /*
             * Could extract "ls" to an instance attribute, so it can be reused.
             */
            DOMImplementationLS ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            writer = new OutputStreamWriter(new FileOutputStream(filename));
            LSOutput lsout = ls.createLSOutput();
            lsout.setCharacterStream(writer);
            /*
             * If "doc" has been constructed by parsing an XML document, we
             * should keep its encoding when serializing it; if it has been
             * constructed in memory, its encoding has to be decided by the
             * client code.
             */
            lsout.setEncoding(doc.getXmlEncoding());
            LSSerializer serializer = ls.createLSSerializer();
            serializer.write(doc, lsout);
        } catch (Exception e) {
            throw new IOException(e);
        } finally {
            if (writer != null) writer.close();
        }
    }

Required imports:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import org.w3c.dom.Document;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSOutput;
    import org.w3c.dom.ls.LSSerializer;
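A quick way to check whether the DOCTYPE survives with this approach is an in-memory round trip using `LSSerializer.writeToString`. This is a sketch with invented class and method names; it assumes the JDK's built-in DOM implementation and a Xerces-based parser that accepts the `load-external-dtd` feature, and the result may differ with another DOM implementation on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class LsRoundTripSketch {

    /** Parses the XML without touching its DTD and serializes it again with LSSerializer. */
    public static String roundTrip(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Xerces-specific feature; assumed to be supported by the parser in use.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSSerializer serializer = ls.createLSSerializer();
        // writeToString always declares UTF-16 in the XML declaration of the result.
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE config SYSTEM \"config.dtd\"><config><item>a</item></config>";
        System.out.println(roundTrip(xml));
    }
}
```

Because `writeToString` produces a UTF-16 declaration, writing to a file is better done through an `LSOutput` with an explicit encoding, as in `writeDocument` above.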

I know this is an old question that has already been answered, but I think the technical details might help someone.

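I tried using the LSSerializer library and could not get it to retain the Doctype. This is the solution that Stephan probably used. Note: this is in Scala but uses a Java library, so just convert it to your own code.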

    import java.io.{File, FileOutputStream, OutputStreamWriter}
    import org.w3c.dom.Element
    import com.sun.org.apache.xml.internal.serialize.{OutputFormat, XMLSerializer}

    def transformXML(root: Element, file: String): Unit = {
      val doc = root.getOwnerDocument
      val format = new OutputFormat(doc)
      format.setIndenting(true)
      val writer = new OutputStreamWriter(new FileOutputStream(new File(file)))
      try {
        val serializer = new XMLSerializer(writer, format)
        serializer.serialize(doc)
      } finally writer.close() // flush and release the file handle
    }
