How to get the text content contained in a CDATA section from XML in Java

I have the following XML:

    
application/xml
local-C++
200 <![CDATA[]]>

I want to parse the following text out of the content node:

<![CDATA[]]>

Note that the content is wrapped in a CDATA section. How can I do this in Java? Any approach is fine.

Here is my code:

    @Test
    public void testGetDoOrchResponse() throws IOException {
        String path = "/Users/haddad/Git/Tools/ContentUtils/src/test/resources/testdata/doOrch_testfiles/doOrch_response.xml";
        File f = new File(path);
        String response = FileUtils.readFileToString(f);
        String content = getDoOrchResponse(response, "content");
        System.out.println("Content: " + content);
    }

    // Output: Content: blank

    // Returns the text of the first <tagFragment> element under each <response>.
    static String getDoOrchResponse(String xml, String tagFragment) throws FileNotFoundException {
        String content = new String();
        try {
            Document doc = getDocumentXML(xml);
            NodeList nlNodeExplanationList = doc.getElementsByTagName("response");
            for (int i = 0; i < nlNodeExplanationList.getLength(); i++) {
                Node explanationNode = nlNodeExplanationList.item(i);
                List<String> titleList = getTextValuesByTagName((Element) explanationNode, tagFragment);
                content = titleList.get(0);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return content;
    }

    // Collects the text of every descendant element named tagName; empty text becomes "blank".
    static List<String> getTextValuesByTagName(Element element, String tagName) {
        NodeList nodeList = element.getElementsByTagName(tagName);
        ArrayList<String> list = new ArrayList<String>();
        for (int i = 0; i < nodeList.getLength(); i++) {
            String textValue = getTextValue(nodeList.item(i));
            if (textValue.equalsIgnoreCase("")) {
                textValue = "blank";
            }
            list.add(textValue);
        }
        return list;
    }

    // Concatenates the values of the node's direct TEXT_NODE children.
    static String getTextValue(Node node) {
        StringBuffer textValue = new StringBuffer();
        int length = node.getChildNodes().getLength();
        for (int i = 0; i < length; i++) {
            Node c = node.getChildNodes().item(i);
            if (c.getNodeType() == Node.TEXT_NODE) {
                textValue.append(c.getNodeValue());
            }
        }
        return textValue.toString().trim();
    }

    // Parses the XML string into a DOM Document; returns null if parsing fails.
    static Document getDocumentXML(String xml) throws FileNotFoundException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db;
        Document doc = null;
        try {
            db = dbf.newDocumentBuilder();
            doc = db.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
            doc.getDocumentElement().normalize();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        }
        return doc;
    }

What exactly am I doing wrong? Why is the output "blank"? I just can't see it...

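If you want to extract the content of an Element node, use the getTextContent() method. If you really need or want the CDATA section markup itself, you have to serialize the node with LSSerializer or something similar: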

    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    docFactory.setNamespaceAware(true);
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
    Document doc = docBuilder.parse(new File("doc1.xml"));

    Element content = (Element) doc.getElementsByTagNameNS("http://comResponse.engine/response", "content").item(0);
    if (content != null) {
        // Plain text of the element, without the CDATA markup
        System.out.println(content.getTextContent());

        // Serialize the first child node to get the markup as well
        LSSerializer ser = ((DOMImplementationLS) doc.getImplementation()).createLSSerializer();
        if (content.getFirstChild() != null) {
            System.out.println(ser.writeToString(content.getFirstChild()));
        }
    }
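As for why the question's code prints "blank": with a default (non-coalescing) DocumentBuilderFactory, a CDATA section is reported as a child of type Node.CDATA_SECTION_NODE, not Node.TEXT_NODE, so the loop in getTextValue skips it. The following is a minimal sketch of a variant that accepts both node types, next to getTextContent(). The class name, the sample XML, and the contentText helper are made up for illustration and are not from the original post.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CdataTextExample {

    // Hypothetical sample document; element names and payload are illustrative only.
    static final String SAMPLE =
        "<response><content><![CDATA[<b>hello</b>]]></content></response>";

    // Like the question's getTextValue, but CDATA section children are collected too.
    static String getTextValue(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            short type = c.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Hypothetical helper: parses xml and returns the text of the first <content> element.
    static String contentText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node content = doc.getElementsByTagName("content").item(0);
            return content == null ? "" : getTextValue(content);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints the raw payload of the CDATA section, without the CDATA markup.
        System.out.println(contentText(SAMPLE));
    }
}
```

Another option, if the distinction between text and CDATA does not matter to you, is to call setCoalescing(true) on the factory, which asks the parser to convert CDATA sections to text nodes before your code sees them.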

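That is the theory, at least. For me, with Java JRE 1.8, the output is missing the closing marker of the CDATA section, so it looks like LSSerializer does not work correctly when given a single CDATA section node.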
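If the serializer output is unreliable for a lone CDATA node, one possible workaround is to rebuild the markup by hand from the node's data. This is only a sketch, not something the answer above proposes: the class name, the sample XML, and the helper methods are hypothetical, and it relies only on the standard CDATASection.getData() accessor.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class CdataMarkupExample {

    // Hypothetical sample document; element names and payload are illustrative only.
    static final String SAMPLE =
        "<response><content><![CDATA[<b>hello</b>]]></content></response>";

    // Re-wraps every CDATA section child of the node in its original delimiters.
    static String cdataMarkup(Node node) {
        StringBuilder sb = new StringBuilder();
        if (node == null) {
            return sb.toString();
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append("<![CDATA[")
                  .append(((CDATASection) c).getData())
                  .append("]]>");
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: parses xml and returns the CDATA markup of the first <content> element.
    static String contentMarkup(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return cdataMarkup(doc.getElementsByTagName("content").item(0));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints the CDATA section including its opening and closing markers.
        System.out.println(contentMarkup(SAMPLE));
    }
}
```

This sidesteps the serializer entirely, at the cost of ignoring any non-CDATA children of the element.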