Pretty-printing XML in Java 8

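I have an XML file stored as a DOM Document and I would like to pretty-print it to the console, preferably without using an external library. I am aware that this question has been asked many times on this site, but none of the previous answers have worked for me. I am using Java 8, so perhaps that is where my code differs from the earlier questions? I have also tried setting the transformer manually using code found on the web, but that only caused a "not found" error.

Here is my code, which currently just outputs each XML element on a new line at the left edge of the console.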

    import java.io.*;
    import javax.xml.parsers.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    public class Test {
        public Test() {
            try {
                //java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");
                DocumentBuilderFactory dbFactory;
                DocumentBuilder dBuilder;
                Document original = null;
                try {
                    dbFactory = DocumentBuilderFactory.newInstance();
                    dBuilder = dbFactory.newDocumentBuilder();
                    original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
                } catch (SAXException | IOException | ParserConfigurationException e) {
                    e.printStackTrace();
                }
                StringWriter stringWriter = new StringWriter();
                StreamResult xmlOutput = new StreamResult(stringWriter);
                TransformerFactory tf = TransformerFactory.newInstance();
                //tf.setAttribute("indent-number", 2);
                Transformer transformer = tf.newTransformer();
                transformer.setOutputProperty(OutputKeys.METHOD, "xml");
                transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
                transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
                transformer.setOutputProperty(OutputKeys.INDENT, "yes");
                transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                transformer.transform(new DOMSource(original), xmlOutput);
                java.lang.System.out.println(xmlOutput.getWriter().toString());
            } catch (Exception ex) {
                throw new RuntimeException("Error converting to String", ex);
            }
        }

        public static void main(String[] args) {
            new Test();
        }
    }

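I guess the problem is related to blank text nodes (i.e. text nodes containing only whitespace) in the original file. You should try removing them programmatically right after parsing, using the following code. If you don't remove them, the Transformer will preserve them.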

    original.getDocumentElement().normalize();
    XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
    NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);

    // Detach every whitespace-only text node from its parent.
    for (int i = 0; i < blankTextNodes.getLength(); i++) {
        blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
    }
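The same cleanup can also be done without XPath by walking the tree recursively. The following is only a sketch of that alternative, not the method from the answer above; the class name `BlankTextStripper` and its helper `parse` are hypothetical names chosen for illustration. Note that `String.trim()` treats every character up to U+0020 as whitespace, which is slightly broader than XPath's `normalize-space`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class BlankTextStripper {

    // Recursively removes text nodes that contain nothing but whitespace.
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first, because removeChild detaches 'child'.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root>\n  <a>x</a>\n  <b/>\n</root>");
        stripBlankText(doc);
        // Only the two element children of <root> should remain.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

After the call, the document can be handed to the Transformer exactly as in the question.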

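In reply to Espinosa's comment, here is a solution for the case where "the original xml is not already (partially) indented or contains new lines".

Background

Excerpt from the article that inspired this solution (see References below):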

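According to the DOM specification, whitespace outside the tags is perfectly valid and is properly preserved. To remove it, we can use XPath's normalize-space to locate all the whitespace-only nodes and remove them first.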
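To see that the parser really does keep this whitespace, one can count the whitespace-only text nodes directly under the root element. This is a minimal sketch; the class name `WhitespaceNodesDemo`, the method name and the sample documents are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodesDemo {

    // Counts the direct children of the root element that are whitespace-only text nodes.
    static int countBlankTextChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        int blank = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                blank++;
            }
        }
        return blank;
    }

    public static void main(String[] args) throws Exception {
        // Compact input: no text nodes between the elements.
        System.out.println(countBlankTextChildren("<root><a/><b/></root>"));
        // Indented input: the line breaks and spaces become text nodes of their own.
        System.out.println(countBlankTextChildren("<root>\n  <a/>\n  <b/>\n</root>"));
    }
}
```

With a default `DocumentBuilderFactory`, the indented input yields one blank text node before `<a/>`, one between the two elements and one after `<b/>`, and these are the nodes the Transformer then writes back out unchanged.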

Java code

    public static String toPrettyString(String xml, int indent) {
        try {
            // Turn xml string into a document
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

            // Remove whitespaces outside tags
            document.normalize();
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                          document,
                                                          XPathConstants.NODESET);

            for (int i = 0; i < nodeList.getLength(); ++i) {
                Node node = nodeList.item(i);
                node.getParentNode().removeChild(node);
            }

            // Setup pretty print options
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            transformerFactory.setAttribute("indent-number", indent);
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");

            // Return pretty print xml string
            StringWriter stringWriter = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
            return stringWriter.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

Sample usage

    String xml = "" + //
                 "\n " + //
                 "\nCoco Puff" + //
                 "\n 10 ";
    System.out.println(toPrettyString(xml, 4));

Output

  Coco Puff 10
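A self-contained sketch of how such a call might look end to end. The element names (`root`, `item`, `total`) and the class name `PrettyPrintDemo` are assumptions chosen for illustration, not part of the original sample. The indent width is set here through the `{http://xml.apache.org/xslt}indent-amount` output property, which is understood by the JDK's built-in transformer; other implementations may ignore it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrettyPrintDemo {

    static String toPrettyString(String xml, int indent) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Drop whitespace-only text nodes so the serializer is free to indent.
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space()='']", document, XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node node = blanks.item(i);
            node.getParentNode().removeChild(node);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                                      Integer.toString(indent));

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, unevenly indented input.
        String xml = "<root>\n   <item>Coco Puff</item>\n        <total>10</total>    </root>";
        System.out.println(toPrettyString(xml, 4));
    }
}
```

On a stock JDK this should print the two child elements on their own lines, each indented by four spaces inside `root`, regardless of how the input was indented.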

References

Java: Properly Indenting XML String, posted on MyShittyCode
Save new XML node to file

This works in Java 8:

    public static void main(String[] args) throws Exception {
        String xmlString = "ME";
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
        pretty(document, System.out, 2);
    }

    private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        if (indent > 0) {
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
        }
        Result result = new StreamResult(outputStream);
        Source source = new DOMSource(document);
        transformer.transform(source, result);
    }

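I wrote a simple class for removing whitespace from documents. It supports the command line and does not use DOM / XPath.

Edit: Come to think of it, the project also contains a pretty-printer that handles existing whitespace: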

 PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

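I don't like any of the common XML formatting solutions, because they all collapse runs of more than one consecutive newline character (for some reason, removing spaces/tabs and removing newlines are treated as inseparable...). Here is my solution. It was actually written for XHTML, but it should do the job for XML as well: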

    public String GenerateTabs(int tabLevel) {
        char[] tabs = new char[tabLevel * 2];
        Arrays.fill(tabs, ' ');
        //Or:
        //char[] tabs = new char[tabLevel];
        //Arrays.fill(tabs, '\t');
        return new String(tabs);
    }

    public String FormatXHTMLCode(String code) {
        // Split on new lines.
        String[] splitLines = code.split("\\n", 0);
        int tabLevel = 0;

        // Go through each line.
        for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
            String currentLine = splitLines[lineNum];

            if (currentLine.trim().isEmpty()) {
                // Keep blank lines, but strip their whitespace.
                splitLines[lineNum] = "";
            } else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
                // Line ends with an opening tag: indent it, then go one level deeper.
                splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
                ++tabLevel;
            } else if (currentLine.matches(".*</[^<>]+?>")) {
                // Line ends with a closing tag: go one level up, then indent it.
                --tabLevel;
                if (tabLevel < 0) {
                    tabLevel = 0;
                }
                splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
            } else if (currentLine.matches("[^<>]*?/>")) {
                // Line finishes a self-closing tag that was opened on an earlier line.
                splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
                --tabLevel;
                if (tabLevel < 0) {
                    tabLevel = 0;
                }
            } else {
                splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
            }
        }

        return String.join("\n", splitLines);
    }

It makes one assumption: there are no < or > characters other than the ones that delimit XML/XHTML tags.

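Create the XML file:

    new FileInputStream("xml Store - Copy.xml"); // the resulting XML file is formatted incorrectly!

Instead, parse the content of the given input source as an XML document, which returns a new DOM Document object.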

    Document original = null;
    ...
    original = dBuilder.parse("data.xml"); // parse the input source as an XML document
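As a rough sketch of that idea, the file can be handed to `DocumentBuilder.parse(File)` directly, so that the parser reads the raw bytes and picks the encoding from the XML declaration, instead of going through an `InputStreamReader` that uses the platform default charset. The class name `ParseFileDemo`, the method `parseFile` and the temporary file are assumptions made for illustration.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseFileDemo {

    // Parses the file as XML; the parser detects the encoding itself.
    static Document parseFile(File file) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(file);
    }

    public static void main(String[] args) throws Exception {
        // Write a small UTF-8 document to a temporary file (stand-in for a real data file).
        File tmp = File.createTempFile("data", ".xml");
        tmp.deleteOnExit();
        String content = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><a>caf\u00e9</a></root>";
        Files.write(tmp.toPath(), content.getBytes(StandardCharsets.UTF_8));

        Document original = parseFile(tmp);
        System.out.println(original.getDocumentElement().getNodeName());
    }
}
```

The returned Document can then be cleaned of blank text nodes and passed to the Transformer as in the other answers.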
