How do I remove whitespace-only text nodes from a DOM before serialization?

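I have some Java (5.0) code that constructs a DOM from various (cached) data sources, then removes certain element nodes that are not required, and finally serializes the result into an XML string using: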

    // Serialize DOM back into a string
    Writer out = new StringWriter();
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "no");
    tf.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();

However, because I'm removing several element nodes, the final serialized document ends up with a lot of extra whitespace.

Is there a simple way to remove or collapse the extraneous whitespace in the DOM before (or while) serializing it to a String?
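To see where the extra whitespace comes from, here is a minimal sketch (the class and method names are hypothetical). A default `DocumentBuilder` keeps the indentation between elements as whitespace-only text nodes, and `removeChild` removes only the element itself, so the text nodes on either side of it end up next to each other.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class: removing elements leaves their surrounding
// whitespace-only text nodes behind in the DOM.
public class LeftoverWhitespaceDemo {

    // Removes every element with the given tag name, and nothing else.
    public static void removeElements(Document doc, String tagName) {
        NodeList hits = doc.getElementsByTagName(tagName);
        // getElementsByTagName returns a live list, so iterate from the end
        for (int i = hits.getLength() - 1; i >= 0; i--) {
            Node hit = hits.item(i);
            hit.getParentNode().removeChild(hit);
        }
    }

    // Same serializer settings as in the question.
    public static String serialize(Document doc) throws TransformerException {
        StringWriter out = new StringWriter();
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "no");
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Parsing `<root>\n  <keep/>\n  <drop/>\n  <keep/>\n</root>` and removing `drop` leaves two whitespace text nodes side by side, which serialize as a line containing only indentation.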

You can find the empty (whitespace-only) text nodes with XPath and then remove them programmatically, like this:

    XPathFactory xpathFactory = XPathFactory.newInstance();
    // XPath to find empty text nodes.
    XPathExpression xpathExp = xpathFactory.newXPath().compile(
            "//text()[normalize-space(.) = '']");
    NodeList emptyTextNodes = (NodeList)
            xpathExp.evaluate(doc, XPathConstants.NODESET);

    // Remove each empty text node from document.
    for (int i = 0; i < emptyTextNodes.getLength(); i++) {
        Node emptyTextNode = emptyTextNodes.item(i);
        emptyTextNode.getParentNode().removeChild(emptyTextNode);
    }

This approach can be useful if you want more control over which nodes are removed than an XSL template easily gives you.
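As one illustration of that extra control, here is a sketch that adds a predicate to the same XPath expression so that whitespace inside a particular element survives. It assumes your documents contain an element (here called `pre`) whose whitespace is significant; the element name and the class name are hypothetical.

```java
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SelectiveWhitespaceRemover {

    // Whitespace-only text nodes, except those anywhere inside a <pre> element.
    // "pre" is a hypothetical element name; adjust the predicate to your schema.
    private static final String EXPR =
            "//text()[normalize-space(.) = ''][not(ancestor::pre)]";

    public static void removeWhitespaceNodes(Document doc) throws XPathExpressionException {
        XPathExpression xpathExp = XPathFactory.newInstance().newXPath().compile(EXPR);
        NodeList nodes = (NodeList) xpathExp.evaluate(doc, XPathConstants.NODESET);
        // Remove each matched text node from its parent
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            node.getParentNode().removeChild(node);
        }
    }
}
```

For `<a> <pre> </pre> <b/> </a>` this leaves `<a>` with just `<pre>` and `<b/>` as children, while the single space inside `<pre>` is kept.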

Try serializing the DOM through an XSL stylesheet that uses the strip-space element, as described here:

http://helpdesk.objects.com.au/java/how-do-i-remove-whitespace-from-an-xml-document
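The linked page's stylesheet is not reproduced here; the following is a minimal sketch of the general idea, assuming a plain XSLT 1.0 identity transform. `<xsl:strip-space elements="*"/>` asks the processor to drop whitespace-only text nodes from every element of the source tree, and the identity template then copies everything else unchanged. The class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class StripSpaceSerializer {

    // Identity transform: copies every attribute and node as-is, while
    // xsl:strip-space removes whitespace-only text nodes from all elements.
    private static final String STRIP_SPACE_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no' encoding='UTF-8'/>"
          + "<xsl:strip-space elements='*'/>"
          + "<xsl:template match='@*|node()'>"
          + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    public static String serialize(Document doc) throws TransformerException {
        // Compile the stylesheet instead of using the no-argument identity Transformer
        Transformer tf = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_SPACE_XSL)));
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

This leaves the DOM itself untouched; only the serialized string loses the whitespace, which fits the case where the cleanup is needed just for output.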

The code below removes comment nodes and whitespace-only text nodes. If a text node has some non-whitespace value, that value is trimmed.

    public static void clean(Node node) {
        NodeList childNodes = node.getChildNodes();
        // Walk backwards so removals don't shift the indices still to be visited
        for (int n = childNodes.getLength() - 1; n >= 0; n--) {
            Node child = childNodes.item(n);
            short nodeType = child.getNodeType();
            if (nodeType == Node.ELEMENT_NODE) {
                clean(child);
            } else if (nodeType == Node.TEXT_NODE) {
                String trimmedNodeVal = child.getNodeValue().trim();
                if (trimmedNodeVal.length() == 0) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(trimmedNodeVal);
                }
            } else if (nodeType == Node.COMMENT_NODE) {
                node.removeChild(child);
            }
        }
    }

Reference: http://www.sitepoint.com/removing-useless-nodes-from-the-dom/

Another possible approach is to remove the adjacent whitespace at the same time as you remove the target nodes:

    private void removeNodeAndTrailingWhitespace(Node node) {
        List<Node> exiles = new ArrayList<Node>();
        exiles.add(node);
        for (Node whitespace = node.getNextSibling();
                whitespace != null
                        && whitespace.getNodeType() == Node.TEXT_NODE
                        && whitespace.getTextContent().matches("\\s*");
                whitespace = whitespace.getNextSibling()) {
            exiles.add(whitespace);
        }
        for (Node exile : exiles) {
            exile.getParentNode().removeChild(exile);
        }
    }

This has the advantage of leaving the rest of the existing formatting untouched.
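One caveat with removing the trailing whitespace: when the removed node is the last child, the newline before the parent's closing tag goes with it, so `<root>\n  <keep/>\n  <drop/>\n</root>` becomes `<root>\n  <keep/>\n  </root>`. A possible variant (a hypothetical helper, not part of the original answer) removes the whitespace *preceding* the node instead, which for that layout gives `<root>\n  <keep/>\n</root>`. Which side works better depends on how your documents are indented.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Node;

public class PrecedingWhitespaceRemover {

    // Hypothetical variant: removes the node together with the whitespace-only
    // text siblings immediately before it, rather than after it.
    public static void removeNodeAndPrecedingWhitespace(Node node) {
        List<Node> exiles = new ArrayList<Node>();
        exiles.add(node);
        for (Node ws = node.getPreviousSibling();
                ws != null
                        && ws.getNodeType() == Node.TEXT_NODE
                        && ws.getTextContent().matches("\\s*");
                ws = ws.getPreviousSibling()) {
            exiles.add(ws);
        }
        // Collect first, then remove, so sibling traversal is not disturbed
        for (Node exile : exiles) {
            exile.getParentNode().removeChild(exile);
        }
    }
}
```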

The following code works:

    public String getSoapXmlFormatted(String pXml) {
        try {
            if (pXml != null) {
                DocumentBuilderFactory tDbFactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder tDBuilder = tDbFactory.newDocumentBuilder();
                Document tDoc = tDBuilder.parse(new InputSource(new StringReader(pXml)));
                removeWhitespaces(tDoc);

                final DOMImplementationRegistry tRegistry = DOMImplementationRegistry.newInstance();
                final DOMImplementationLS tImpl =
                        (DOMImplementationLS) tRegistry.getDOMImplementation("LS");
                final LSSerializer tWriter = tImpl.createLSSerializer();
                tWriter.getDomConfig().setParameter("format-pretty-print", Boolean.FALSE);
                tWriter.getDomConfig().setParameter("element-content-whitespace", Boolean.TRUE);
                pXml = tWriter.writeToString(tDoc);
            }
        } catch (RuntimeException | ParserConfigurationException | SAXException | IOException
                | ClassNotFoundException | InstantiationException | IllegalAccessException tE) {
            tE.printStackTrace();
        }
        return pXml;
    }

    public void removeWhitespaces(Node pRootNode) {
        if (pRootNode != null) {
            NodeList tList = pRootNode.getChildNodes();
            if (tList != null && tList.getLength() > 0) {
                List<Node> tRemoveNodeList = new ArrayList<Node>();
                for (int i = 0; i < tList.getLength(); i++) {
                    Node tChildNode = tList.item(i);
                    if (tChildNode.getNodeType() == Node.TEXT_NODE) {
                        if (tChildNode.getTextContent() == null
                                || "".equals(tChildNode.getTextContent().trim())) {
                            tRemoveNodeList.add(tChildNode);
                        }
                    } else {
                        removeWhitespaces(tChildNode);
                    }
                }
                for (Node tRemoveNode : tRemoveNodeList) {
                    pRootNode.removeChild(tRemoveNode);
                }
            }
        }
    }

    transformer.setOutputProperty(OutputKeys.INDENT, "yes");

With this output property set, the transformer indents the serialized XML, so the output stays readable.
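A sketch of that combination, assuming the JDK's built-in transformer: strip the whitespace-only text nodes first (here with a small recursive helper similar to the ones above), then let `INDENT` re-indent the whole tree, so the old indentation and the transformer's own cannot mix. The `{http://xml.apache.org/xslt}indent-amount` property is specific to Xalan-derived transformers such as the JDK's; other processors may ignore it. Class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReindentingSerializer {

    // Recursively removes whitespace-only text nodes (mutates the DOM).
    static void stripWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards: the NodeList is live and shrinks on removal
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
        }
    }

    public static String serializeIndented(Document doc) throws TransformerException {
        stripWhitespaceNodes(doc);
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tf.setOutputProperty(OutputKeys.INDENT, "yes");
        // Xalan/JDK-specific extension property; other processors may ignore it
        tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```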
