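JAXP DOM: validate XML using XSD, a catalog resolver, and XSLT

Background

I am using JDK 6 to load XML files into a DOM. Each XML file must be validated against an XSD. The XSD file location differs depending on the running environment. Ensuring that the XML can be validated against the XSD, regardless of directory structure, requires a catalog resolver. Once the XML is validated, it can be transformed.

My understanding is that DocumentBuilderFactory can be used to configure such validation. This is accomplished by using a DocumentBuilder with an XMLCatalogResolver to find the XSD file (and any included files) associated with the XML file.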

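Questions about validating XML documents using a catalog-derived XSD include:

  • JAXP - debug XSD catalog look up
  • Java XML Schema validator with custom resource resolver fails to resolve elements
  • Can XMLCatalog be used for schema imports?
  • How to load XMLCatalog from classpath resources (inside a jar), reliably?
  • XMLSchema validation on Catalog.xml file for entity resolving
  • Resolving type definitions from imported schema in XJC fails
  • Find items that can be repeated in xml schema using Java
  • Java servlet: xml validation against xsd

Most of these questions and answers reference a hard-coded XSD file path, use SAX to perform validation, pertain to DTDs, require a JDOM dependency, or perform no transformation.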

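Problem

There is no canonical solution that describes how to use XML catalogs for XSD validation with JAXP DOM, followed by a transformation via XSLT. There are a number of snippets, but no complete, standalone example that compiles and runs (under JDK 6).

I posted an answer that appears to work technically, but it is overly verbose.

What is the canonical way (using JDK 1.6 libraries) to validate and transform an XML document? Here is one possible algorithm:

  1. Create a catalog resolver.
  2. Create an XML parser.
  3. Associate the resolver with the parser.
  4. Parse an XML document that contains an XSD reference.
  5. Terminate on validation errors.
  6. Transform the validated XML using an XSL template.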
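To make the shape of that algorithm concrete, here is a minimal sketch that uses only JDK classes. It is not the catalog-based solution being asked for: the CATALOG map, the http://example.com/note.xsd URI, the inline schema, and the inline stylesheet are all hypothetical stand-ins, and the map lookup merely takes the place of a real catalog resolver (steps 1 and 3).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ValidateThenTransform {
  // Hypothetical stand-in for a catalog: schema URI -> XSD text.
  static final Map<String, String> CATALOG = new HashMap<String, String>();
  static {
    CATALOG.put("http://example.com/note.xsd",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note'><xs:complexType><xs:sequence>"
        + "<xs:element name='date' type='xs:date'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>");
  }

  // Hypothetical stylesheet that writes the note's date as plain text.
  static final String XSL =
      "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/note'><xsl:value-of select='date'/></xsl:template>"
      + "</xsl:stylesheet>";

  static String run(String xml, String schemaUri) throws Exception {
    // Steps 2 and 4: create a namespace-aware parser and parse into a DOM.
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    Document doc = dbf.newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));

    // Steps 1 and 3 (stand-in): look the schema up by URI.
    SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = sf.newSchema(
        new StreamSource(new StringReader(CATALOG.get(schemaUri))));

    // Step 5: with no ErrorHandler set, validate() throws on the first error.
    schema.newValidator().validate(new DOMSource(doc));

    // Step 6: transform the validated DOM.
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSL)));
    StringWriter out = new StringWriter();
    t.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
        run("<note><date>2014-08-30</date></note>", "http://example.com/note.xsd"));
  }
}
```

Passing a note whose date is 20140830 makes run throw a SAXException before any transformation happens, which is the behaviour step 5 asks for.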

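Source Files

The source files include a catalog manager properties file, Java source code, catalog files, XML data, XSL files, and XSD files. All files are relative to the current working directory (./).

Catalog Manager Properties File

This properties file is read by the CatalogResolver class; save as ./CatalogManager.properties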

    catalogs=catalog.xml
    relative-catalogs=yes
    verbosity=99
    prefer=system
    static-catalog=yes
    allow-oasis-xml-catalog-pi=yes
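As a quick sanity check on the properties file, the sketch below loads the same key/value text with java.util.Properties and lists the catalog entries. It assumes, per my reading of the resolver documentation, that the catalogs value is a semicolon-delimited list; the CatalogManagerProps class and its catalogFiles method are hypothetical helpers, not part of the resolver API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class CatalogManagerProps {
  // Returns the entries of the "catalogs" property, assumed to be
  // separated by semicolons; blank entries are skipped.
  static List<String> catalogFiles(String propertiesText) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(propertiesText));

    List<String> files = new ArrayList<String>();
    for (String entry : props.getProperty("catalogs", "").split(";")) {
      String trimmed = entry.trim();
      if (trimmed.length() > 0) {
        files.add(trimmed);
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    String text = "catalogs=catalog.xml\nrelative-catalogs=yes\nverbosity=99\n";
    System.out.println(catalogFiles(text));
  }
}
```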

TestXSD.java

This is the main application; save as ./src/TestXSD.java

    package src;

    import java.io.*;
    import java.net.URI;
    import java.util.*;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    import javax.xml.parsers.*;
    import javax.xml.xpath.*;
    import javax.xml.XMLConstants;

    import org.w3c.dom.*;
    import org.xml.sax.*;

    import org.apache.xml.resolver.tools.CatalogResolver;
    import org.apache.xerces.util.XMLCatalogResolver;

    import static org.apache.xerces.jaxp.JAXPConstants.JAXP_SCHEMA_LANGUAGE;
    import static org.apache.xerces.jaxp.JAXPConstants.W3C_XML_SCHEMA;

    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Schema;
    import javax.xml.validation.Validator;

    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    /**
     * Download http://xerces.apache.org/xml-commons/components/resolver/CatalogManager.properties
     */
    public class TestXSD {
      private final static String ENTITY_RESOLVER =
        "http://apache.org/xml/properties/internal/entity-resolver";

      /**
       * This program reads an XML file, performs validation, reads an XSL
       * file, transforms the input XML, and then writes the transformed document
       * to standard output.
       *
       * args[0] - The XSL file used to transform the XML file
       * args[1] - The XML file to transform using the XSL file
       */
      public static void main( String args[] ) throws Exception {
        // For validation error messages.
        ErrorHandler errorHandler = new DocumentErrorHandler();

        // Read the CatalogManager.properties file.
        CatalogResolver resolver = new CatalogResolver();
        XMLCatalogResolver xmlResolver = createXMLCatalogResolver( resolver );

        logDebug( "READ XML INPUT SOURCE" );
        // Load an XML document in preparation to transform it.
        InputSource xmlInput = new InputSource( new InputStreamReader(
          new FileInputStream( args[1] ) ) );

        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setAttribute( JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA );
        dbFactory.setNamespaceAware( true );

        DocumentBuilder builder = dbFactory.newDocumentBuilder();
        builder.setEntityResolver( xmlResolver );
        builder.setErrorHandler( errorHandler );

        logDebug( "PARSE XML INTO DOCUMENT MODEL" );
        Document xmlDocument = builder.parse( xmlInput );

        logDebug( "CONVERT XML DOCUMENT MODEL INTO DOMSOURCE" );
        DOMSource xml = new DOMSource( xmlDocument );

        logDebug( "GET XML SCHEMA DEFINITION" );
        String schemaURI = getSchemaURI( xmlDocument );
        logDebug( "SCHEMA URI: " + schemaURI );

        if( schemaURI != null ) {
          logDebug( "CREATE SCHEMA FACTORY" );
          // Create a Schema factory to obtain a Schema for XML validation...
          SchemaFactory sFactory = SchemaFactory.newInstance( W3C_XML_SCHEMA );
          sFactory.setResourceResolver( xmlResolver );

          logDebug( "CREATE XSD INPUT SOURCE" );
          String xsdFileURI = xmlResolver.resolveURI( schemaURI );

          logDebug( "CREATE INPUT SOURCE XSD FROM: " + xsdFileURI );
          InputSource xsd = new InputSource(
            new FileInputStream( new File( new URI( xsdFileURI ) ) ) );

          logDebug( "CREATE SCHEMA OBJECT FOR XSD" );
          Schema schema = sFactory.newSchema( new SAXSource( xsd ) );

          logDebug( "CREATE VALIDATOR FOR SCHEMA" );
          Validator validator = schema.newValidator();

          logDebug( "VALIDATE XML AGAINST XSD" );
          validator.validate( xml );
        }

        logDebug( "READ XSL INPUT SOURCE" );
        // Load an XSL template for transforming XML documents.
        InputSource xslInput = new InputSource( new InputStreamReader(
          new FileInputStream( args[0] ) ) );

        logDebug( "PARSE XSL INTO DOCUMENT MODEL" );
        Document xslDocument = builder.parse( xslInput );

        transform( xmlDocument, xslDocument, resolver );
        System.out.println();
      }

      private static void transform(
        Document xml, Document xsl, CatalogResolver resolver ) throws Exception {
        if( versionAtLeast( xsl, 2 ) ) {
          useXSLT2Transformer();
        }

        logDebug( "CREATE TRANSFORMER FACTORY" );
        // Create the transformer used for the document.
        TransformerFactory tFactory = TransformerFactory.newInstance();
        tFactory.setURIResolver( resolver );

        logDebug( "CREATE TRANSFORMER FROM XSL" );
        Transformer transformer = tFactory.newTransformer( new DOMSource( xsl ) );

        logDebug( "CREATE RESULT OUTPUT STREAM" );
        // This enables writing the results to standard output.
        Result out = new StreamResult( new OutputStreamWriter( System.out ) );

        logDebug( "TRANSFORM THE XML AND WRITE TO STDOUT" );
        // Transform the document using a given stylesheet.
        transformer.transform( new DOMSource( xml ), out );
      }

      /**
       * Answers whether the given XSL document version is greater than or
       * equal to the given required version number.
       *
       * @param xsl The XSL document to check for version compatibility.
       * @param version The version number to compare against.
       *
       * @return true iff the XSL document version is greater than or equal
       * to the version parameter.
       */
      private static boolean versionAtLeast( Document xsl, float version ) {
        Element root = xsl.getDocumentElement();
        float docVersion = Float.parseFloat( root.getAttribute( "version" ) );

        return docVersion >= version;
      }

      /**
       * Enables Saxon9's XSLT2 transformer for XSLT2 files.
       */
      private static void useXSLT2Transformer() {
        System.setProperty("javax.xml.transform.TransformerFactory",
          "net.sf.saxon.TransformerFactoryImpl");
      }

      /**
       * Creates an XMLCatalogResolver based on the file names found in
       * the given CatalogResolver. The resulting XMLCatalogResolver will
       * contain the absolute path to all the files known to the given
       * CatalogResolver.
       *
       * @param resolver The CatalogResolver to examine for catalog file names.
       * @return An XMLCatalogResolver instance with the same number of catalog
       * files as found in the given CatalogResolver.
       */
      private static XMLCatalogResolver createXMLCatalogResolver(
        CatalogResolver resolver ) {
        int index = 0;
        List files = resolver.getCatalog().getCatalogManager().getCatalogFiles();
        String catalogs[] = new String[ files.size() ];
        XMLCatalogResolver xmlResolver = new XMLCatalogResolver();

        for( Object file : files ) {
          catalogs[ index ] = (new File( file.toString() )).getAbsolutePath();
          index++;
        }

        xmlResolver.setCatalogList( catalogs );

        return xmlResolver;
      }

      private static String[] parseNameValue( String nv ) {
        Pattern p = Pattern.compile( "\\s*(\\w+)=\"([^\"]*)\"\\s*" );
        Matcher m = p.matcher( nv );
        String result[] = new String[2];

        if( m.find() ) {
          result[0] = m.group(1);
          result[1] = m.group(2);
        }

        return result;
      }

      /**
       * Retrieves the XML schema definition using an XSD.
       *
       * @param node The document (or child node) to traverse seeking processing
       * instruction nodes.
       * @return null if no XSD is present in the XML document.
       * @throws IOException Never thrown (uses StringReader).
       */
      private static String getSchemaURI( Node node ) throws IOException {
        String result = null;

        if( node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE ) {
          ProcessingInstruction pi = (ProcessingInstruction)node;

          logDebug( "NODE IS PROCESSING INSTRUCTION" );

          if( "xml-model".equals( pi.getNodeName() ) ) {
            logDebug( "PI IS XML MODEL" );

            // Hack to get the attributes.
            String data = pi.getData();

            if( data != null ) {
              final String attributes[] = pi.getData().trim().split( "\\s+" );

              String type = parseNameValue( attributes[0] )[1];
              String href = parseNameValue( attributes[1] )[1];

              // TODO: Schema should = http://www.w3.org/2001/XMLSchema
              //String schema = attributes.getNamedItem( "schematypens" );

              if( "application/xml".equalsIgnoreCase( type ) && href != null ) {
                result = href;
              }
            }
          }
        }
        else {
          // Try to get the schema type information.
          NamedNodeMap attrs = node.getAttributes();

          if( attrs != null ) {
            // TypeInfo.toString() returns values of the form:
            // schemaLocation="uri schemaURI"
            // The following loop extracts the schema URI.
            for( int i = 0; i < attrs.getLength(); i++ ) {
              Attr attribute = (Attr)attrs.item( i );
              TypeInfo typeInfo = attribute.getSchemaTypeInfo();
              String attr[] = parseNameValue( typeInfo.toString() );

              if( "schemaLocation".equalsIgnoreCase( attr[0] ) ) {
                result = attr[1].split( "\\s" )[1];
                break;
              }
            }
          }

          // Look deeper for the schema URI.
          if( result == null ) {
            NodeList list = node.getChildNodes();

            for( int i = 0; i < list.getLength(); i++ ) {
              result = getSchemaURI( list.item( i ) );

              if( result != null ) {
                break;
              }
            }
          }
        }

        return result;
      }

      /**
       * Writes a message to standard output.
       */
      private static void logDebug( String s ) {
        System.out.println( s );
      }
    }
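The "hack" in getSchemaURI splits the processing instruction data on whitespace and assumes the first pseudo-attribute is the type and the second is the href. One possible alternative, sketched below, collects every name="value" pair into a map so that the order no longer matters and single quotes are accepted as well. The PseudoAttributes class is a hypothetical helper, and the sample URI in main is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PseudoAttributes {
  // Matches name="value" or name='value'; group 2 or 3 holds the value.
  private static final Pattern PAIR =
      Pattern.compile("([\\w.:-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

  // Parses processing-instruction data into an ordered name -> value map.
  static Map<String, String> parse(String data) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    Matcher m = PAIR.matcher(data);

    while (m.find()) {
      String value = m.group(2) != null ? m.group(2) : m.group(3);
      result.put(m.group(1), value);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> attrs = parse(
        "href=\"http://example.com/notes.xsd\" type=\"application/xml\"");
    System.out.println(attrs.get("href"));
  }
}
```

With such a map, the lookup in getSchemaURI could read attrs.get("type") and attrs.get("href") instead of relying on array positions.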

Error Handler

This is the code for human-friendly error messages; save as ./src/DocumentErrorHandler.java

    package src;

    import java.io.PrintStream;

    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.SAXException;

    /**
     * Handles error messages during parsing and validating XML documents.
     */
    public class DocumentErrorHandler implements ErrorHandler {
      private final static PrintStream OUTSTREAM = System.err;

      private void log( String type, SAXParseException e ) {
        OUTSTREAM.println( "SAX PARSE EXCEPTION " + type );
        OUTSTREAM.println( " Public ID: " + e.getPublicId() );
        OUTSTREAM.println( " System ID: " + e.getSystemId() );
        OUTSTREAM.println( " Line : " + e.getLineNumber() );
        OUTSTREAM.println( " Column : " + e.getColumnNumber() );
        OUTSTREAM.println( " Message : " + e.getMessage() );
      }

      @Override
      public void error( SAXParseException e ) throws SAXException {
        log( "ERROR", e );
      }

      @Override
      public void fatalError( SAXParseException e ) throws SAXException {
        log( "FATAL ERROR", e );
      }

      @Override
      public void warning( SAXParseException e ) throws SAXException {
        log( "WARNING", e );
      }
    }
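Note that the handler above only logs, so a parse that uses it continues after an error. If step 5 of the algorithm (terminate on validation errors) should also apply while parsing, one option is a handler that rethrows. The sketch below is an assumption about how that could look; StrictErrorHandler is a hypothetical class name, not part of the code in this question.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class StrictErrorHandler implements ErrorHandler {
  // Warnings are reported but do not stop processing.
  public void warning(SAXParseException e) {
    System.err.println("WARNING at line " + e.getLineNumber()
        + ": " + e.getMessage());
  }

  // Recoverable errors (including schema violations) abort processing.
  public void error(SAXParseException e) throws SAXException {
    throw e;
  }

  // Fatal errors (such as malformed XML) abort processing.
  public void fatalError(SAXParseException e) throws SAXException {
    throw e;
  }
}
```

It would be installed the same way as DocumentErrorHandler, for example with builder.setErrorHandler( new StrictErrorHandler() ).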

Catalog File

Save as ./catalog.xml

            

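XML Data

The different test cases include XSDs referenced in either a processing instruction or the root node.

Schema: Processing Instruction

The schema can be provided using an xml-model processing instruction (PI). Save as ./Tests/good-notes2.xml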

     Shopping List 2014-08-30 headlight fluid, flamgrabblit, exhaust coil  

Schema: Root Node

The schema can be provided in attributes of the document's root node. Save as ./Tests/good-notes3.xml

    Shopping List 2014-08-30 Eggs, Milk, Carrots  
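For the root-node case, getSchemaURI parses the string form of TypeInfo, which depends on the parser implementation. A possibly simpler route, sketched here under the assumption that the document uses the standard xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes, is to read them with getAttributeNS. The SchemaLocationReader class and the example.com URIs are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class SchemaLocationReader {
  private static final String XSI = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;

  // Returns the schema URI declared on the root element, or null if none.
  static String schemaUri(String xml) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true); // required for getAttributeNS to match
    Element root = dbf.newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement();

    // getAttributeNS returns an empty string when the attribute is absent.
    String noNamespace = root.getAttributeNS(XSI, "noNamespaceSchemaLocation").trim();
    if (noNamespace.length() > 0) {
      return noNamespace;
    }

    // schemaLocation holds "namespace location" pairs; take the first location.
    String pairs = root.getAttributeNS(XSI, "schemaLocation").trim();
    if (pairs.length() == 0) {
      return null;
    }
    String[] tokens = pairs.split("\\s+");
    return tokens.length >= 2 ? tokens[1] : null;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(schemaUri(
        "<note xmlns='http://example.com/notes'"
        + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
        + " xsi:schemaLocation='http://example.com/notes"
        + " http://example.com/schemas/notes.xsd'/>"));
  }
}
```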

Fail Validation

The following should fail validation (the date needs hyphens); save as ./Tests/bad-note1.xml

      Shopping List 20140830 headlight fluid, flamgrabblit, exhaust coil  

Transformation

Save as ./Tests/note-to-html.xsl

      

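Arbitrary Folder

The arbitrary folder represents a path to files that can be located anywhere on the computer's file system. The location of these files may differ, for example, between production, development, and the repository.

Catalog

Save this file as ./ArbitraryFolder/catalog.xml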

       

Notes

There are two files in this example for transforming the notes: notes.xsl and note-body.xsl. The first includes the second.
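Because one stylesheet includes the other, the transformer has to locate note-body.xsl at compile time; in TestXSD that job falls to the CatalogResolver passed to setURIResolver. The sketch below illustrates the same hook with a plain in-memory map instead of a catalog. The inline stylesheets, the [..] output format, and the IncludeDemo class are all hypothetical and only loosely mirror the files in this question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class IncludeDemo {
  static final String NS = " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

  // Stand-in for notes.xsl: includes the body stylesheet by relative href.
  static final String MAIN = "<xsl:stylesheet version='1.0'" + NS + ">"
      + "<xsl:output method='text'/>"
      + "<xsl:include href='note-body.xsl'/>"
      + "<xsl:template match='/note'><xsl:apply-templates select='body'/></xsl:template>"
      + "</xsl:stylesheet>";

  // Stand-in for note-body.xsl.
  static final String BODY = "<xsl:stylesheet version='1.0'" + NS + ">"
      + "<xsl:template match='body'>[<xsl:value-of select='.'/>]</xsl:template>"
      + "</xsl:stylesheet>";

  static String transform(String xml) throws Exception {
    final Map<String, String> sheets = new HashMap<String, String>();
    sheets.put("note-body.xsl", BODY);

    TransformerFactory factory = TransformerFactory.newInstance();
    // Called for xsl:include and xsl:import while compiling the stylesheet.
    factory.setURIResolver(new URIResolver() {
      public Source resolve(String href, String base) {
        String text = sheets.get(href);
        // Returning null lets the processor fall back to its default lookup.
        return text == null ? null : new StreamSource(new StringReader(text), href);
      }
    });

    Transformer t = factory.newTransformer(new StreamSource(new StringReader(MAIN)));
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(transform("<note><body>Eggs, Milk, Carrots</body></note>"));
  }
}
```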

Notes Stylesheet

Save as ./ArbitraryFolder/XSL/notes/notes.xsl

        A Note        

Note Body Stylesheet

Save as ./ArbitraryFolder/XSL/notes/note-body.xsl

    

Schema

The last file is the schema; save it as ./schemas/notes/notes.xsd

             

Build

This section details how to build the test application.

Libraries

You will need Saxon 9 (for XSLT 2.0 documents), Xerces, Xalan, and the Resolver API:

    jaxen-1.1.6.jar
    resolver.jar
    saxon9he.jar
    serializer.jar
    xalan.jar
    xercesImpl.jar
    xml-apis.jar
    xsltc.jar

Scripts

Save as ./build.sh

    #!/bin/bash
    javac -d bin -cp .:lib/* src/TestXSD.java

Save as ./run.sh

    #!/bin/bash
    java -cp .:bin:lib/* src.TestXSD Tests/note-to-html.xsl $1

Compile the code using ./build.sh.

Run Output

Run using:

 ./run.sh filename.xml 

Good Test

Test that the good note passes validation:

    ./run.sh Tests/good-notes2.xml

No errors.

Bad Test

Test that the bad note's date fails validation:

 ./run.sh Tests/bad-note1.xml 

As expected, this produces the desired error:

    Exception in thread "main" org.xml.sax.SAXParseException; cvc-datatype-valid.1.2.1: '20140830' is not a valid value for 'date'.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.error(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.xs.XMLSchemaValidator$XSIErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.xs.XMLSchemaValidator.reportSchemaError(Unknown Source)
        at org.apache.xerces.impl.xs.XMLSchemaValidator.elementLocallyValidType(Unknown Source)
        at org.apache.xerces.impl.xs.XMLSchemaValidator.processElementContent(Unknown Source)
        at org.apache.xerces.impl.xs.XMLSchemaValidator.handleEndElement(Unknown Source)
        at org.apache.xerces.impl.xs.XMLSchemaValidator.endElement(Unknown Source)
        at org.apache.xerces.jaxp.validation.DOMValidatorHelper.finishNode(Unknown Source)
        at org.apache.xerces.jaxp.validation.DOMValidatorHelper.validate(Unknown Source)
        at org.apache.xerces.jaxp.validation.DOMValidatorHelper.validate(Unknown Source)
        at org.apache.xerces.jaxp.validation.ValidatorImpl.validate(Unknown Source)
        at javax.xml.validation.Validator.validate(Validator.java:124)
        at src.TestXSD.main(TestXSD.java:103)