Enterprise Java Technologies Tech Tips
Tips, Techniques, and Sample Code

Welcome to the Enterprise Java Technologies Tech Tips for October 25, 2005. Here you'll get tips on using enterprise Java technologies and APIs, such as those in Java 2 Platform, Enterprise Edition (J2EE).

This issue covers:

* The Schema Validation Framework
* More About the Sun Java Streaming XML Parser

These tips were developed using the Java 2, Enterprise Edition, v 1.4 SDK. You can download the SDK at http://java.sun.com/j2ee/1.4/download.html.

You can view this issue of the Tech Tips on the Web at http://java.sun.com/developer/EJTechTips/2005/tt1025.html.

You can download the sample archive for the JAXP 1.3 tip at http://java.sun.com/developer/EJTechTips/download/ttoct2005validationframework.zip.

You can download the sample archive for the Sun Java Streaming XML Parser tip at http://java.sun.com/developer/EJTechTips/download/ttoct2005xmlparser.zip.

Any use of this code and/or information below is subject to the license terms at http://developers.sun.com/dispatcher.jsp?uid=6910008.

See the Subscribe/Unsubscribe note at the end of this newsletter to subscribe to Tech Tips that focus on technologies and products in other Java platforms.

For more Java technology content, visit these sites:

java.sun.com - The latest Java platform releases, tutorials, and newsletters.
java.net - A web forum for collaborating and building solutions together.
java.com - Hot games, cool apps -- Experience the power of Java technology.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

THE SCHEMA VALIDATION FRAMEWORK
by Neeraj Bajaj

The Java API for XML Processing (JAXP) 1.3 was initially introduced in Java 2 Platform, Standard Edition (J2SE) 5.0 and is also now available in the Java Web Services Developer Pack (Java WSDP) (http://java.sun.com/webservices/downloads/webservicespack.html).
JAXP 1.3 adds a new Schema Validation Framework (SVF), also called the Validation API, which offers advanced capabilities to efficiently validate XML against a schema. SVF is also much faster than the schema validation approach in JAXP 1.2.

Before examining SVF, let's look at the earlier schema validation approach. Here's a code snippet that demonstrates that approach for SAX parsing:

    // JAXP 1.2 property names for schema validation
    final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    SAXParserFactory sf = SAXParserFactory.newInstance();
    sf.setNamespaceAware(true);
    sf.setValidating(true);
    SAXParser sp = sf.newSAXParser();
    sp.setProperty(
        SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
    // schema: the schema source; xml: the document path;
    // dh: a DefaultHandler that receives the SAX events
    sp.setProperty(SCHEMA_SOURCE, schema);
    sp.parse(new File(xml), dh);

The basic steps are:

1. Create a SAXParserFactory object.
2. Configure the SAXParserFactory object to produce parsers that support XML namespaces, and that validate documents as they are parsed.
3. Create a SAX parser.
4. Set SAX parser properties for the schema language and the schema source. In this example, the schema language is W3C XML Schema.
5. Parse the document.

Notice that this process couples validation and XML processing. By comparison, in the SVF approach, XML document validation against a schema is decoupled from XML processing.

The first step in the SVF approach is to compile the schema:

    final String sl = XMLConstants.W3C_XML_SCHEMA_NS_URI;
    SchemaFactory factory = SchemaFactory.newInstance(sl);
    StreamSource ss = new StreamSource("mySchema.xsd");
    Schema schema = factory.newSchema(ss);

SchemaFactory is a schema compiler. It reads the given schema, checks the schema syntax and semantics according to the constraints imposed by the specified schema language, and returns a Schema object that is an immutable in-memory representation of the schema. Immutable here means that the set of constraints does not change once the Schema object is created. An application that validates the same document twice against the same Schema object must always get the same result.
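To make the compile-once idea concrete, here is a minimal sketch that compiles a schema a single time and reuses the resulting Schema object for two documents. The class name SchemaReuseSketch, the inline schema, and the helper methods are hypothetical and are not part of the sample archive; the sketch assumes the JAXP 1.3 classes that ship with the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaReuseSketch {
    // Hypothetical inline schema: a single <note> element holding a string
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // Compile the schema once; the returned Schema is immutable
    static Schema compile(String xsd) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema did not compile", e);
        }
    }

    // Validate one document; with no ErrorHandler set, the Validator
    // throws a SAXException on the first validation error
    static boolean isValid(Schema schema, String xml) {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(XSD);   // compiled once, reused below
        System.out.println(isValid(schema, "<note>hello</note>")); // true
        System.out.println(isValid(schema, "<memo>hello</memo>")); // false
    }
}
```

The design point is that the cost of reading and checking the schema is paid in compile(), not in each call to isValid().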
Next, you validate an XML document against the schema. There are three approaches to choose from depending upon your requirements:

o Set the Schema instance on a DocumentBuilderFactory or SAXParserFactory
o Create a Validator
o Create a ValidatorHandler (to validate a SAX stream)

All three approaches guarantee that the XML document is validated only against the schema from which the Schema instance was obtained.

Let's look at the first approach, setting the Schema instance on a factory:

    SAXParserFactory spf = SAXParserFactory.newInstance();
    // XML Schema validation requires a namespace-aware parser
    spf.setNamespaceAware(true);
    spf.setSchema(schema);
    SAXParser parser = spf.newSAXParser();
    // xml and dh: the document and DefaultHandler, as in the
    // earlier example
    parser.parse(new File(xml), dh);

Here, the same Schema instance is passed to all the SAXParser instances created from this SAXParserFactory. The SAXParser object parses the XML document and simultaneously validates it against the Schema instance. Because the SAXParser does not reload the schema for every XML document that needs to be validated, this approach considerably improves the performance of the overall schema validation process. Compare this to the previous approach, where the specified schema is loaded again for every XML document that needs to be validated.

After you load a Schema object into memory, you can take the second approach, that is, use a Validator to validate an XML document against that Schema object. First you create a Validator object from the Schema object. Then you call the validate() method in the Validator object to do the validation:

    Validator v = schema.newValidator();
    v.validate(new StreamSource(xml));

The Validator object accepts javax.xml.transform.Source as input. Besides the StreamSource shown here, this means that it can accept an event-based SAX source (SAXSource) or an object-based Document Object Model (DOM) source (DOMSource). By accepting DOMSource as input, the Validator is capable of validating an in-memory DOM Document or node against the given Schema object.
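As an illustration of the DOMSource case, the following sketch builds a small DOM tree in memory and validates it without serializing it first. The class name DomValidationSketch, the inline schema, and the helper methods are assumptions made for this example only. The tree is built with createElementNS because schema validation expects namespace-aware DOM nodes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class DomValidationSketch {
    // Hypothetical inline schema: a <count> element holding an integer
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema did not compile", e);
        }
    }

    // Build a one-element document entirely in memory
    static Document singleElementDocument(String name, String text) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(null, name);
            root.setTextContent(text);
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Validate the in-memory tree by wrapping it in a DOMSource
    static boolean isValid(Schema schema, Document doc) {
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compileSchema();
        System.out.println(isValid(schema, singleElementDocument("count", "42")));   // true
        System.out.println(isValid(schema, singleElementDocument("count", "many"))); // false
    }
}
```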
In its shortest form, validating a DOM node looks like this:

    Validator v = schema.newValidator();
    // node: the DOM Document or node to validate
    v.validate(new DOMSource(node));

You might consider the Validator approach if your requirement is to validate a DOM node, or you are given a SAXSource. This approach works even if the implementation of the SAX driver is from a different vendor.

The third approach is to create a specially-designed javax.xml.validation.ValidatorHandler to validate SAX events:

    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setNamespaceAware(true);
    XMLReader reader = spf.newSAXParser().getXMLReader();
    ValidatorHandler vh = schema.newValidatorHandler();
    // The key is to set the ValidatorHandler as the
    // ContentHandler so that SAX events can be validated
    reader.setContentHandler(vh);
    reader.parse(xml);

Notice that to validate the SAX events, you need to set the ValidatorHandler as the ContentHandler.

Using a ValidatorHandler, you can also validate a JDOM document against the schema. In fact, any object model (such as XOM and DOM4J) that can be built on top of a SAX stream or can produce SAX events can be used with the SVF to validate an XML document against a schema. This is possible because the ValidatorHandler can validate a SAX stream. Here is a code snippet that illustrates how a JDOM document can be validated against a schema. It assumes that you obtained a ValidatorHandler as shown in the previous example:

    SAXOutputter so = new SAXOutputter(vh);
    so.output(jdomDocument);

The SAXOutputter object fires SAX events for the JDOM document. The SAX events are then validated by the ValidatorHandler.

There are other things you can do using the SVF, such as validate XML after transformation or obtain schema type information. For more information about using the SVF, see the article "Easy and Efficient XML Processing: Upgrade to JAXP 1.3" (http://java.sun.com/developer/technicalArticles/xml/jaxp1-3/).

Running the Sample Code

A sample package accompanies this tip.
The code in the sample package includes code examples and demonstrates the techniques covered in the tip. There are additional samples in this package. For example, one of the samples compares the schema validation performance using the new SVF and the previous approach of setting two schema properties. Another sample shows how to validate the output of a Transformer against a schema.

To install and run the sample:

1. Download the sample file (http://java.sun.com/developer/EJTechTips/download/ttoct2005validationframework.zip) and extract its contents. You should now see the newly extracted directory as \ValidationFramework. For example, if you extracted the contents to C:\ on a Windows machine, then your newly created directory should be at C:\ValidationFramework. The extracted contents include a README file, which contains instructions to run the samples. You can run the samples using JAXP 1.3 in J2SE 5.0 or in Java WSDP 1.6. You can also download the standalone JAXP 1.3 implementation (https://jaxp.dev.java.net/binaryDrops.html) from the JAXP project page on java.net.

2. Execute the ant targets in the ValidationFramework directory. To compile, execute the following command:

       ant compile

   In response, you should see something like this:

       Buildfile: build.xml

       init:
           [mkdir] Created dir: C:\ValidationFramework\build
           [mkdir] Created dir: C:\ValidationFramework\build\classes

       compile:
           [echo] C:\Program Files\Java\jdk1.5.0\jre
           ...
       BUILD SUCCESSFUL

   To run the samples, issue the ant command against the appropriate target, for example:

       ant ValidateSAXStream

   In response, you should see output that includes the following lines:

       [java] startElement: personnel
       [java] startElement: person
       [java] startElement: name
       [java] startElement: family
       [java] characters: Boss
       [java] endElement: family
       ...
       [java] startElement: email
       [java] characters: five@foo.com
       [java] endElement: email
       [java] startElement: link
       [java] endElement: link
       [java] endElement: person
       [java] endElement: personnel

       BUILD SUCCESSFUL

   If you run the samples using J2SE 5.0, set the 'endorsed' property to the location of the JAXP jars. For example:

       ant -Dendorsed=/space/jaxp/jaxp-1_3/dist/ Validate

About the Author

Neeraj Bajaj is a Member of Technical Staff in Web Technology and Standards at Sun Microsystems. He is the architect of the Sun Java Streaming XML Parser and the co-lead of the JAXP 1.4 specification. In addition to his contributions to StAX and JAXP 1.4, Neeraj has contributed to the development of JAXP 1.2 (JSR 60) and JAXP 1.3 (JSR 206).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

MORE ABOUT THE SUN JAVA STREAMING XML PARSER
by Kim LiChong

The February 22, 2005 Tech Tip "Introducing the Sun Java Streaming XML Parser" (http://java.sun.com/developer/EJTechTips/2005/tt0222.html#2) outlined the differences between the Sun Java Streaming XML Parser (SJSXP) and two API libraries for working with XML: the Simple API for XML (SAX) and the Document Object Model (DOM) libraries. Briefly, SJSXP is an implementation of the Streaming API for XML (StAX), JSR 173. As an implementation of StAX, SJSXP enables XML infosets to be transmitted and parsed serially during an application's runtime. SJSXP allows you to "pull" nodes from the XML document rather than having them "pushed" from the parser to the application. Consequently, SJSXP is very fast. A whitepaper titled "Streaming APIs for XML Parsers" (http://java.sun.com/performance/reference/whitepapers/StAX-1_0.pdf) shows how the performance of SJSXP compares to other StAX implementations.

StAX is part of JAXP 1.4. You can download a standalone JAXP 1.4 implementation from the jaxp project page (https://jaxp.dev.java.net).

SJSXP supports both of the APIs defined by StAX for XML processing: cursor and iterator.
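As a quick orientation, here is a minimal sketch that performs the same task, counting start tags, once with each API. The class and method names are hypothetical. The sketch assumes only that some StAX implementation is available through javax.xml.stream (Java SE 6 and later bundle one), not SJSXP specifically.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

class TwoStaxApisSketch {
    // Cursor API: next() advances the cursor and returns an int event code
    static int countWithCursor(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    // Iterator API: nextEvent() returns an XMLEvent object for each item
    static int countWithIterator(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b/><c>x</c></a>";
        System.out.println(countWithCursor(xml));   // 3
        System.out.println(countWithIterator(xml)); // 3
    }
}
```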
The previous Tech Tip on SJSXP included code examples that showed how to use the cursor API to parse and write XML documents. This Tech Tip focuses on the iterator API. It provides code examples that show how to use the iterator API, and general guidelines that describe when to use the iterator API.

Comparing the Cursor and Iterator APIs

The cursor API is the lower-level of the two APIs. A cursor points to one infoset item at a time, moving from the beginning to the end of the document. The cursor always moves forward through the document. To get this cursor-based, forward-only access to XML, you use two interfaces: XMLStreamReader and XMLStreamWriter.

Different methods in XMLStreamReader allow you to pull data from where the cursor is pointing. As mentioned in the earlier tip on SJSXP, there are a number of get methods in XMLStreamReader that you can use to obtain the contents of the XML item that the cursor is pointing to. For example, the following method:

    public int getEventType()

returns an integer code that identifies the type of event that the parser found under the cursor. An example of an event is the start of an XML element or the end of the document. The following method:

    public String getText();

gets text from the XML item that the cursor is pointing to. All of the XML information retrieved is returned as strings, much like information retrieved by SAX.

Each event is represented by an integer constant. For example, the constant for the start of an XML element is XMLStreamConstants.START_ELEMENT, and the constant for the end of an XML element is XMLStreamConstants.END_ELEMENT. The application needs to call a relevant method to get the information about the respective event. For example:

    // parser is an XMLStreamReader
    while (parser.hasNext()) {
        int eventType = parser.next();
        switch (eventType) {
            case XMLStreamConstants.START_ELEMENT:
                // Do something
                break;
            case XMLStreamConstants.END_ELEMENT:
                // Do something
                break;
            // And so on ...
        }
    }

Different methods in XMLStreamWriter allow you to write node information to an XML document. For example, XMLStreamWriter.writeStartElement writes a start tag, and XMLStreamWriter.writeCharacters writes text.

By comparison, the iterator API represents the XML document as a set of discrete event objects that you pull in the order in which they are read. These event objects are immutable and persistent, and they encapsulate all the associated information about the particular event. There is some overhead in creating each event object, so this approach is not as efficient as using the cursor API.

As is the case for the cursor API, the iterator API has two interfaces for reading and writing: XMLEventReader and XMLEventWriter. For parsing, you can access a node by calling the XMLEventReader.nextEvent() method. This method returns an XMLEvent object. The XMLEvent interface has 13 subinterfaces that represent the different event types:

o StartDocument
o StartElement
o EndElement
o Characters
o Comment
o EndDocument
o Attribute
o Namespace
o DTD
o EntityReference
o ProcessingInstruction
o EntityDeclaration
o NotationDeclaration

Note that the DTD, EntityReference, EntityDeclaration, and NotationDeclaration events are only created if the document contains a DTD.

Using XMLEventReader

Let's look at some code examples that use XMLEventReader. In these examples the target XML document is named HockeyTeams.xml.
Here's the content of HockeyTeams.xml:

    <HockeyTeams xmlns="http://www.myhockey.net">
      <Team>
        <City>Toronto</City>
        <Nickname>Maple Leafs</Nickname>
        <Coach>Pat Quinn</Coach>
        <Captain>Mats Sundin</Captain>
        <Wins year="2003">45</Wins>
        <MarketValue currency="USD">280</MarketValue>
      </Team>
    </HockeyTeams>

You can use the following code to parse the XML document:

    URL url = Class.forName("MyClassName").getResource(
        "HockeyTeams.xml");
    InputStream in = url.openStream();
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLEventReader r = factory.createXMLEventReader(in);

You can iterate through the events with a construct like this:

    while (r.hasNext()) {
        XMLEvent e = r.nextEvent();
        System.out.println(e.toString());
    }

If you print each returned XMLEvent from HockeyTeams.xml, the output should look like this:

    <<['http://www.myhockey.net']::HockeyTeams xmlns:='http://www.myhockey.net'>
    <['http://www.myhockey.net']::Team>
    <['http://www.myhockey.net']::City>
    Toronto
    <['http://www.myhockey.net']::Nickname>
    Maple Leafs
    <['http://www.myhockey.net']::Coach>
    Pat Quinn
    <['http://www.myhockey.net']::Captain>
    Mats Sundin
    <['http://www.myhockey.net']::Wins year='2003'>
    45
    <['http://www.myhockey.net']::MarketValue currency='USD'>
    280
    ENDDOCUMENT

Each XMLEvent encapsulates all of the information about that particular event. You can use the getEventType() method to get an integer code that specifies the type of event. Then you can get specific information about the event by converting it to a particular subtype, like this:

    // event is an XMLEvent returned by nextEvent()
    if (event.getEventType() == XMLStreamConstants.CHARACTERS) {
        Characters chars = event.asCharacters();
        System.out.println("chars " + chars.getData());
    }

Or you can get element names or attributes from the event, like this:

    if (event.getEventType() == XMLStreamConstants.START_ELEMENT) {
        StartElement startE = event.asStartElement();
        System.out.println("start " + startE.getName());
        Iterator it = startE.getAttributes();
        while (it.hasNext()) {
            System.out.println(" attributes " + it.next());
        }
    }

Note that each StartElement object has information about the node: the local name of the start tag, its prefix, namespace URI, attributes, and namespace declarations.
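The StartElement accessors mentioned above can be combined into a small, self-contained sketch. The class name StartElementSketch and the describeStartElements() helper are hypothetical, and the input used in main() is a shortened stand-in for HockeyTeams.xml rather than the file from the sample archive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StartElementSketch {
    // Returns one line per start tag: {namespaceURI}localName attr=value ...
    static List<String> describeStartElements(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                QName name = start.getName();
                StringBuilder line = new StringBuilder();
                line.append('{').append(name.getNamespaceURI()).append('}')
                    .append(name.getLocalPart());
                // Namespace declarations are not reported here; they are
                // available separately through getNamespaces()
                Iterator<?> attributes = start.getAttributes();
                while (attributes.hasNext()) {
                    Attribute attribute = (Attribute) attributes.next();
                    line.append(' ')
                        .append(attribute.getName().getLocalPart())
                        .append('=')
                        .append(attribute.getValue());
                }
                lines.add(line.toString());
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<Team xmlns='http://www.myhockey.net'>"
                   + "<Wins year='2003'>45</Wins></Team>";
        for (String line : describeStartElements(xml)) {
            System.out.println(line);
        }
    }
}
```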
Specifically, StartElement.getName() returns a QName, that is, a qualified name as defined in the XML specifications. You can query the QName object to get the local part and the namespace URI. Typically, you get the StartElement before accessing secondary events such as attributes or namespaces. However, Attribute and Namespace events can also be reported standalone (that is, without first getting the StartElement). Namespace events are accessible from either the StartElement or the corresponding EndElement. The StartDocument object contains information that includes the encoding, the XML version, and the standalone property.

Using XMLEventWriter

The XMLEventWriter interface is used for writing. It contains the method XMLEventWriter.add(XMLEvent) to add an event to the output stream. Here is a snippet of code that directs output to an XML document using XMLEventWriter:

    XMLEventFactory eventFactory = XMLEventFactory.newInstance();
    XMLOutputFactory output = XMLOutputFactory.newInstance();
    XMLEventWriter xmlwriter =
        output.createXMLEventWriter(System.out);

Notice the difference between using the cursor and iterator APIs. In the cursor API, you use different methods in XMLStreamWriter to write attributes, characters, or elements (by sending the arguments as String objects). However, for XMLEventWriter in the iterator API, you must first create the objects as XMLEvents by using the utility factory class XMLEventFactory. You create the XMLEventWriter by specifying an output stream, and then add the XMLEvents to the XMLEventWriter. In this example, the output stream is System.out.

In the first line of the following code example, the XMLEventFactory creates an object that implements the StartDocument interface, which, in turn, extends the XMLEvent interface. The two arguments specify the encoding and XML version. The methods createStartElement() and createAttribute() are overloaded, so there are different ways to create an XMLEvent.
For example, in the example below, the attributes and namespaces are added as Iterator objects.

    xmlwriter.add(
        eventFactory.createStartDocument("UTF-8", "1.0"));

    // for attributes
    Attribute att = eventFactory.createAttribute("year", "2003");
    ArrayList<Attribute> attArr = new ArrayList<Attribute>();
    attArr.add(att);

    // for namespaces
    Namespace namespace =
        eventFactory.createNamespace("foo", "http://www.foo.org");
    ArrayList<Namespace> nameArr = new ArrayList<Namespace>();
    nameArr.add(namespace);

    // argument order: namespace URI, local name, prefix
    QName qname = new QName(
        "http://www.foo.org", "HockeyTeam", "foo");

    // now create the start element
    xmlwriter.add(eventFactory.createStartElement(
        qname, attArr.iterator(), nameArr.iterator()));
    xmlwriter.add(eventFactory.createCharacters(
        "Los Angeles Kings"));
    xmlwriter.add(eventFactory.createEndElement(
        qname, nameArr.iterator()));
    xmlwriter.add(eventFactory.createEndDocument());
    xmlwriter.flush();
    xmlwriter.close();

Deciding When to Use the Cursor or Iterator API

The Streaming API for XML chapter in the Java Web Services Developer Pack 1.6 Tutorial (http://java.sun.com/webservices/tutorial.html) lists considerations in deciding between the cursor and iterator APIs. Here is a summary of those considerations:

o You can make smaller, faster, and more efficient code with the cursor API.
o You can pass XMLEvent objects in arrays, lists, and maps in your applications even after the parser has moved to subsequent events. That's because these objects are immutable.
o You can create subtypes of XMLEvent that are either completely new information items or extensions of existing items, but with additional methods.
o If you need to modify the event stream, handle pluggable processing of the event stream, or create XML processing pipelines, use the iterator API.
o In general, if you do not have a strong preference and memory constraints and speed are not factors in your decision, use the iterator API. It's more flexible and extensible.
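Two of these considerations, keeping events after the parser has moved on and modifying the event stream, can be illustrated together in a minimal sketch. The class name EventPipelineSketch and the stripComments() helper are hypothetical. The exact serialized form, such as the XML declaration, may differ between StAX implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class EventPipelineSketch {
    // Reads a document, drops Comment events, and writes the rest back out
    static String stripComments(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));

            // Event objects remain usable after the reader advances,
            // so they can be collected in an ordinary list
            List<XMLEvent> kept = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.getEventType() != XMLStreamConstants.COMMENT) {
                    kept.add(event);
                }
            }
            reader.close();

            // Replay the retained events into a writer
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out);
            for (XMLEvent event : kept) {
                writer.add(event);
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not process document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            stripComments("<a><!-- secret --><b>text</b></a>"));
    }
}
```

Doing the same with the cursor API would mean copying each item's data out of the reader by hand before calling next(), which is the trade-off the list above describes.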
For more information on SJSXP, see Chapter 3, "Streaming API for XML", in the Java Web Services Developer Pack 1.6 Tutorial (http://java.sun.com/webservices/docs/1.6/tutorial/doc/SJSXP.html#wp69937). Also see the Sun Java Streaming XML Parser release notes (http://java.sun.com/webservices/docs/1.6/sjsxp/ReleaseNotes.html).

Running the Sample Code

A sample package accompanies this tip. The code in the sample package includes some (but not all) of the code examples in the tip, and demonstrates some of the techniques covered in the tip. To install and run the sample:

1. If you haven't already done so, install Java Web Services Developer Pack (Java WSDP) 1.6 from the Java WSDP Downloads page (http://java.sun.com/webservices/downloads/webservicespack.html).

2. Download the sample file (http://java.sun.com/developer/EJTechTips/download/techtip.zip), and extract its contents to the $JWSDP_HOME/sjsxp/samples directory. You should now see the newly extracted directory as $JWSDP_HOME/sjsxp/samples/techtip. For example, if you install Java WSDP on a Windows machine at C:\Sun\JWSDP-1.6, then your newly created directory should be at C:\Sun\JWSDP-1.6\sjsxp\samples\techtip.

   It is important to extract this zip file to this location because there is a build.xml file for the sample code that depends on the build.xml file packaged with the Java WSDP bundle. The classpath and execute properties are already defined in the build.xml file so that you can compile and execute the appropriate targets.

3. Execute the ant targets in the ${JWSDP_HOME}/sjsxp/samples/techtip directory. The ant binary is located in the ${JWSDP_HOME}/apache-ant/bin directory.

4. Run the following command to compile:

       ant compile

   In response, you should see something like this:

       compile:
           [mkdir] Created dir: C:\Sun\jwsdp-1.6\sjsxp\samples\build\classes
           [javac] Compiling 8 source files to C:\Sun\jwsdp-1.6\sjsxp\samples\build\classes
           ...
       BUILD SUCCESSFUL

5. Run the following command to execute the event parse example:

       ant techtip.Parse

   In response, you should see output that begins like this:

       techtip.Parse:
           [echo] Current directory is C:\Sun\jwsdp-1.6\sjsxp\samples
           [echo] Running EventParseExample Sample.
           [java] XMLEvent is
           [java] Its corresponding EventType Integer is 7 :
           [java] START_DOCUMENT
           [java] Line Number 1
           [java] ----------------------------------------
           [java] XMLEvent is
           [java] Its corresponding EventType Integer is 5 : COMMENT
           [java] Line Number 9
           [java] ----------------------------------------

6. Run the following command to execute the event output example:

       ant techtip.Output

   When you run the output example, it will generate a SampleOutput.xml file in the $JWSDP_HOME/sjsxp/samples/techtip directory. The content of the SampleOutput.xml file should look like this:

       - - San Jose Sharks Daryl Sutter testerspace 35

NOTE: If you do not run these samples with the provided build.xml, include the sjsxp.jar and jsr173_api.jar that are located in the ${JWSDP_HOME}/sjsxp/lib directory. You will need these jar files to compile and run the provided code.

About the Author

Kim LiChong is a Member of Technical Staff in the Java Web Services Performance Engineering Group at Sun Microsystems. His interests include XML parsing performance, and he is one of the authors of the StAX 1.0 White Paper: Streaming APIs for XML Parsers (http://java.sun.com/performance/reference/whitepapers/StAX-1_0.pdf). He is also a co-owner of xmltest, a microbenchmark used to measure XML parsing performance. For more information about xmltest, see the xmltest project page (https://xmltest.dev.java.net/).

. . . . . . . . . . . . . . . . . . . . . . .

Please read our Terms of Use and Licensing policies:
http://www.sun.com/share/text/termsofuse.html
http://developers.sun.com/dispatcher.jsp?uid=6910008

PRIVACY STATEMENT: Sun respects your online time and privacy (http://sun.com/privacy). You have received this based on your e-mail preferences.
If you would prefer not to receive this information, please follow the steps at the bottom of this message to unsubscribe.

* FEEDBACK
  Comments? Send your feedback on the Enterprise Java Technologies Tech Tips to: http://developers.sun.com/contact/feedback.jsp?category=sdn

* SUBSCRIBE/UNSUBSCRIBE
  Subscribe to other Java developer Tech Tips:

  - Core Java Technologies Tech Tips. Get tips on using core Java technologies and APIs, such as those in the Java 2 Platform, Standard Edition (J2SE).
  - Wireless Developer Tech Tips. Get tips on using wireless Java technologies and APIs, such as those in the Java 2 Platform, Micro Edition (J2ME).

  To subscribe to these and other JDC publications:

  - Go to the Sun Developer Network - Subscriptions page (https://softwarereg.sun.com/registration/developer/en_US/subscriptions), choose the newsletters you want to subscribe to, and click "Submit".
  - To unsubscribe, go to the Subscriptions page (https://softwarereg.sun.com/registration/developer/en_US/subscriptions), uncheck the appropriate checkbox, and click "Submit".

* ARCHIVES
  You'll find the Enterprise Java Technologies Tech Tips archives at: http://java.sun.com/developer/EJTechTips/index.html

* COPYRIGHT
  Copyright 2005 Sun Microsystems, Inc. All rights reserved. 901 San Antonio Road, Palo Alto, California 94303 USA.

  This document is protected by copyright. For more information, see: http://java.sun.com/developer/copyright.html

Trademark Information: http://www.sun.com/suntrademarks/
Java, J2SE, J2EE, J2ME, and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.