  Welcome to the Enterprise Java Technologies Tech Tips.
Enterprise Java Technologies
TECHNICAL TIPS
October 25, 2005
In this Issue
 
Here you'll get tips on using enterprise Java technologies and APIs, such as those in Java 2 Platform, Enterprise Edition (J2EE) and Java Platform, Enterprise Edition (Java EE).

This issue covers:

» The Schema Validation Framework
» More About the Sun Java Streaming XML Parser

These tips were developed using the Java 2, Enterprise Edition, v 1.4 SDK. You can download the SDK at http://java.sun.com/j2ee/1.4/download.html.

You can download the sample archive for the JAXP 1.3 tip.
You can download the sample archive for the Sun Java Streaming XML Parser tip.

Any use of this code and/or information below is subject to the license terms.

See the Subscribe/Unsubscribe note at the end of this newsletter to subscribe to Tech Tips that focus on technologies and products in other Java platforms.

For more Java technology content, visit these sites:

java.sun.com - The latest Java platform releases, tutorials, and newsletters.

java.net - A web forum where enthusiasts of Java technology can collaborate and build solutions together.

java.com - Hot games, cool apps -- Experience the power of Java technology.

THE SCHEMA VALIDATION FRAMEWORK
 

by Neeraj Bajaj

The Java API for XML Processing (JAXP) 1.3 was first introduced in Java 2 Platform, Standard Edition (J2SE) 5.0 and is now also available in the Java Web Services Developer Pack (Java WSDP). JAXP 1.3 adds a new Schema Validation Framework (SVF), also called the Validation API, which offers advanced capabilities to efficiently validate XML against a schema. SVF also provides much faster performance than the schema validation approach in JAXP 1.2.

Before examining SVF, let's look at the earlier schema validation approach. Here's a code snippet that demonstrates that approach for SAX parsing:

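   // Standard JAXP 1.2 property names
   final String SCHEMA_LANGUAGE =
     "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
   final String SCHEMA_SOURCE =
     "http://java.sun.com/xml/jaxp/properties/schemaSource";

   SAXParserFactory sf = SAXParserFactory.newInstance();
   sf.setNamespaceAware(true);
   sf.setValidating(true);
   SAXParser sp = sf.newSAXParser();
   sp.setProperty(
     SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
   sp.setProperty(SCHEMA_SOURCE, schema);
   sp.parse(new File(xml), dh);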

The basic steps are:

  1. Create a SAXParserFactory object.
  2. Configure the SAXParserFactory object to produce parsers that support XML namespaces, and that validate documents as they are parsed.
  3. Create a SAX parser.
  4. Set SAX parser properties for the schema language and the schema source. In this example, the schema language is W3C XML Schema.
  5. Parse the document.

Notice that this process couples validation and XML processing.
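For comparison with the SVF examples that follow, here is a self-contained sketch of the JAXP 1.2 approach. The two property URIs are the standard JAXP 1.2 schemaLanguage and schemaSource properties; the personnel schema, the class name Jaxp12Validation, and the error-collecting handler are illustrative assumptions, not part of the tip's sample package. Because DefaultHandler ignores validation errors by default, the handler overrides error() to record them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch; the class name, schema, and inputs are hypothetical.
class Jaxp12Validation {
    // Standard JAXP 1.2 property names for schema validation.
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical schema: <personnel> holds one or more <person> strings.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='personnel'><xs:complexType><xs:sequence>"
        + "<xs:element name='person' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Parses and validates in a single pass; returns the validation errors found.
    static List<String> parseAndValidate(String xml) throws Exception {
        SAXParserFactory sf = SAXParserFactory.newInstance();
        sf.setNamespaceAware(true);
        sf.setValidating(true);
        SAXParser sp = sf.newSAXParser();
        sp.setProperty(SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The schema is attached to this particular parser as a property.
        sp.setProperty(SCHEMA_SOURCE, new InputSource(new StringReader(XSD)));

        final List<String> errors = new ArrayList<>();
        sp.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // DefaultHandler would silently ignore it
            }
        });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseAndValidate("<personnel><person>Boss</person></personnel>"));
        System.out.println(parseAndValidate("<personnel><employee>Boss</employee></personnel>"));
    }
}
```

Calling parseAndValidate() with a document that uses an undeclared element such as employee should return a non-empty list, while a conforming document returns an empty one.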

By comparison, in the SVF approach, XML document validation against a schema is decoupled from XML processing. The first step in the SVF approach is to compile the schema:

   final String sl = XMLConstants.W3C_XML_SCHEMA_NS_URI;
   SchemaFactory factory = SchemaFactory.newInstance(sl);
   StreamSource ss = new StreamSource("mySchema.xsd");
   Schema schema = factory.newSchema(ss);

SchemaFactory is a schema compiler. It reads the given schema, checks the schema syntax and semantics according to the constraints imposed by the specified schema language, and returns a Schema object that is an immutable in-memory representation of the schema. Immutable here means that the set of constraints does not change once the Schema object is created. As a result, an application that validates the same document twice against the same Schema object must always get the same result.
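To see the compile-time checking described above, the following sketch compiles a schema held in a string. The class name and both schemas are hypothetical. A schema that refers to an undefined type should be rejected by SchemaFactory.newSchema() itself, before any instance document is involved.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Illustrative sketch; the class name and schemas are hypothetical.
class SchemaCompiler {
    static final String GOOD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person' type='xs:string'/></xs:schema>";
    // 'xs:nosuchtype' is not a built-in type, so this schema is semantically wrong.
    static final String BAD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person' type='xs:nosuchtype'/></xs:schema>";

    // Compiles the schema text into an immutable Schema object.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // With no ErrorHandler set, schema errors are thrown as SAXExceptions.
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    static boolean compiles(String xsd) {
        try {
            compile(xsd);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles(GOOD)); // the good schema compiles
        System.out.println(compiles(BAD));  // the bad one is rejected up front
    }
}
```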

Next, you validate an XML document against the schema. There are three approaches to choose from depending upon your requirements:

  • Set the Schema instance on a DocumentBuilderFactory or SAXParserFactory
  • Create a Validator
  • Create a ValidatorHandler (to validate a SAX stream)

All three approaches guarantee that the XML document is validated only against the schema from which the Schema instance was obtained.

Let's look at the first approach, setting the Schema instance on a factory:

   SAXParserFactory spf = SAXParserFactory.newInstance();
   spf.setNamespaceAware(true);
   spf.setSchema(schema);
   SAXParser parser = spf.newSAXParser();
   parser.parse(new File(xml), dh);

Here, the same Schema instance is passed to all the SAXParser instances created from this SAXParserFactory. The SAXParser object parses the XML document and simultaneously validates it against the Schema instance. Because the SAXParser does not reload the schema for every XML document that needs to be validated, this approach considerably improves the performance of the overall schema validation process. Compare this to the previous approach, where the specified schema is loaded again for every XML document that needs to be validated.
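The following sketch, under the same kind of assumptions (a hypothetical personnel schema and class name), compiles the schema once, sets it on a SAXParserFactory, and then validates several documents with separate parsers from that factory. Note that setValidating(true) is deliberately not called: that flag requests DTD validation, which is separate from the Schema set with setSchema().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch; the class name, schema, and inputs are hypothetical.
class SharedSchemaParsing {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='personnel'><xs:complexType><xs:sequence>"
        + "<xs:element name='person' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates each document with a fresh parser; all parsers share one Schema.
    static List<String> validateAll(String... docs) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD))); // compiled once
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setSchema(schema); // no setValidating(true): that would request DTD validation

        List<String> results = new ArrayList<>();
        for (String doc : docs) {
            SAXParser parser = spf.newSAXParser();
            final int[] errors = {0};
            parser.parse(new InputSource(new StringReader(doc)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors[0]++; // schema violations arrive as recoverable errors
                }
            });
            results.add(errors[0] == 0 ? "valid" : "invalid");
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validateAll(
            "<personnel><person>Boss</person></personnel>",
            "<personnel><employee>Boss</employee></personnel>"));
    }
}
```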

After you load a Schema object into memory, you can take the second approach, that is, use a Validator to validate an XML document against that Schema object. First you create a Validator object from the Schema object. Then you call the validate() method in the Validator object to do the validation:

   Validator v = schema.newValidator();
   v.validate(new StreamSource(xml));
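By default, Validator.validate() throws a SAXException at the first error. The sketch below (hypothetical class name and schema) installs an ErrorHandler so that schema errors are collected instead, while malformed XML still stops validation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Illustrative sketch; the class name, schema, and inputs are hypothetical.
class ValidatorErrors {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='personnel'><xs:complexType><xs:sequence>"
        + "<xs:element name='person' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns the schema errors found instead of stopping at the first one.
    static List<String> validate(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        Validator v = schema.newValidator();
        final List<String> problems = new ArrayList<>();
        // Without an ErrorHandler, validate() throws on the first error.
        v.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) {
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // malformed XML cannot be validated any further
            }
        });
        v.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<personnel><employee>Boss</employee></personnel>"));
    }
}
```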

The Validator object accepts javax.xml.transform.Source as input. This means that, in addition to the StreamSource used above, it can accept an event-based SAX source (SAXSource) or an object-based Document Object Model (DOM) source (DOMSource). By accepting DOMSource as input, the Validator is capable of validating an in-memory DOM Document or node against the given Schema object.

   Validator v = schema.newValidator();
   v.validate(new DOMSource(<DOM NODE>));

You might consider the Validator approach if your requirement is to validate a DOM node, or you are given a SAXSource. This approach works even if the implementation of the SAX driver is from a different vendor.
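To illustrate validating an in-memory DOM, the sketch below parses a document, appends a new child element programmatically, and validates the modified tree through a DOMSource without serializing it first. The schema, class name, and element names are hypothetical. The DocumentBuilderFactory is made namespace-aware because the validator relies on the namespace information (local names) of DOM nodes.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative sketch; the class name, schema, and element names are hypothetical.
class DomValidation {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='personnel'><xs:complexType><xs:sequence>"
        + "<xs:element name='person' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Appends <childName> to the root element in memory, then validates the tree.
    static boolean validAfterAppending(String xml, String childName) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // the Validator expects namespace-aware DOM nodes
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // Edit the tree in memory; it is never written back out as text.
        Element added = doc.createElementNS(null, childName);
        added.setTextContent("New hire");
        doc.getDocumentElement().appendChild(added);

        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        Validator v = schema.newValidator();
        try {
            v.validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false; // default error handling throws on the first error
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<personnel><person>Boss</person></personnel>";
        System.out.println(validAfterAppending(xml, "person"));
        System.out.println(validAfterAppending(xml, "employee"));
    }
}
```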

The third approach is to create a specially designed javax.xml.validation.ValidatorHandler to validate SAX events:

   SAXParserFactory spf = SAXParserFactory.newInstance();
   spf.setNamespaceAware(true);
   XMLReader reader = spf.newSAXParser().getXMLReader();
   ValidatorHandler vh =  schema.newValidatorHandler();
   //key is to set "ValidatorHandler" as ContentHandler 
   //so that SAX event can be validated
   reader.setContentHandler(vh);
   reader.parse(xml);  

Notice that to validate the SAX events, you need to set the ValidatorHandler as the ContentHandler.
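A ValidatorHandler can also pass events on to your own handler: a handler registered with ValidatorHandler.setContentHandler() receives the SAX events after the validator has seen them. The sketch below (hypothetical class name and schema) chains the parser, the validator, and an application handler that records element names, and counts validation errors through an ErrorHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch; the class name, schema, and inputs are hypothetical.
class ValidatorHandlerChain {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='personnel'><xs:complexType><xs:sequence>"
        + "<xs:element name='person' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns the element names the application saw, followed by "errors=N".
    static List<String> run(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler vh = schema.newValidatorHandler();

        final List<String> seen = new ArrayList<>();
        final int[] errors = {0};
        // Application handler: receives events after the validator has checked them.
        vh.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(localName);
            }
        });
        // Count schema violations instead of throwing on the first one.
        vh.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;
            }
        });

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(vh); // parser -> validator -> application handler
        reader.parse(new InputSource(new StringReader(xml)));
        seen.add("errors=" + errors[0]);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<personnel><person>Boss</person></personnel>"));
    }
}
```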

Using a ValidatorHandler, you can also validate a JDOM document against the schema. In fact, any object model (such as XOM and DOM4J) that can be built on top of a SAX stream or can produce SAX events can be used with the SVF to validate an XML document against a schema. This is possible because the ValidatorHandler can validate a SAX stream. Here is a code snippet that illustrates how a JDOM document can be validated against a schema. It assumes that you obtained a ValidatorHandler as shown in the previous example:

   SAXOutputter so = new SAXOutputter(vh);
   so.output(jdomDocument);

The SAXOutputter object fires SAX events for the JDOM document. The SAX events are then validated by the ValidatorHandler.

There are other things you can do using the SVF, such as validate XML after transformation or obtain schema type information. For more information about using the SVF, see the article "Easy and Efficient XML Processing: Upgrade to JAXP 1.3."
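One way to validate XML after transformation is to send a Transformer's output straight into a ValidatorHandler through a SAXResult, so the result is never serialized and reparsed. In this sketch the stylesheet, schema, and class name are hypothetical; the schema allows at most two person elements, so an input with three p elements should produce an invalid result.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch; the class name, stylesheet, and schema are hypothetical.
class TransformThenValidate {
    // Result schema: <personnel> with one or two <person> children.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='personnel'><xs:complexType><xs:sequence>"
        + "<xs:element name='person' type='xs:string' maxOccurs='2'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Turns <people><p>..</p></people> into <personnel><person>..</person></personnel>.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/people'><personnel>"
        + "<xsl:for-each select='p'><person><xsl:value-of select='.'/></person></xsl:for-each>"
        + "</personnel></xsl:template></xsl:stylesheet>";

    // Returns the number of schema errors found in the transformation result.
    static int errorsInOutput(String input) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler vh = schema.newValidatorHandler();
        final int[] errors = {0};
        vh.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;
            }
        });
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        // The transformer emits SAX events directly into the validator.
        t.transform(new StreamSource(new StringReader(input)), new SAXResult(vh));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(errorsInOutput("<people><p>A</p><p>B</p></people>"));
    }
}
```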

Running the Sample Code

A sample package accompanies this tip. The code in the sample package includes code examples and demonstrates the techniques covered in the tip. There are additional samples in this package. For example, one of the samples compares the schema validation performance using the new SVF and the previous approach of setting two schema properties. Another sample shows how to validate the output of a Transformer against a schema. To install and run the sample:

  1. Download the sample file and extract its contents. You should now see the newly extracted directory as <install_dir>\ValidationFramework. For example, if you extracted the contents to C:\ on a Windows machine, then your newly created directory should be at C:\ValidationFramework. The extracted contents include a README file, which contains instructions to run the samples. You can run the samples using JAXP 1.3 in J2SE 5.0 or in Java WSDP 1.6. You can also download the standalone JAXP 1.3 implementation from the JAXP project page on java.net.

  2. Execute the ant targets in the ValidationFramework directory. To compile, execute the following command:
            ant compile
    
    In response, you should see something like this:
            Buildfile: build.xml
            
            init:
            
                [mkdir] Created dir: C:\ValidationFramework\build
                [mkdir] Created dir:
                C:\ValidationFramework\build\classes
                
            compile:
            
               [echo] C:\Program Files\Java\jdk1.5.0\jre
               ...
               
            BUILD SUCCESSFUL
    
    To run the samples, issue the ant command against the appropriate target, for example:
             ant ValidateSAXStream
    
    In response, you should see output that includes the following lines:
         [java] startElement: personnel
         [java] startElement: person
         [java] startElement: name
         [java] startElement: family
         [java] characters: Boss
         [java] endElement: family
         ...
         
         [java] startElement: email
         [java] characters: five@foo.com
         [java] endElement: email
         [java] startElement: link
         [java] endElement: link
         [java] endElement: person
         [java] endElement: personnel
         
         BUILD SUCCESSFUL
    
    If you run the samples using J2SE 5.0, override the 'endorsed' property to the location of the JAXP jars. For example:
         ant -Dendorsed=/space/jaxp/jaxp-1_3/dist/ Validate
    

About the Author

Neeraj Bajaj is a Member of Technical Staff in Web Technology and Standards at Sun Microsystems. He is the architect of the Sun Java Streaming XML Parser and the co-lead of the JAXP 1.4 specification. In addition to his contributions to StAX and JAXP 1.4, Neeraj has contributed to the development of JAXP 1.2 (JSR 60) and JAXP 1.3 (JSR 206).


MORE ABOUT THE SUN JAVA STREAMING XML PARSER
 

by Kim LiChong

The February 22, 2005 Tech Tip Introducing the Sun Java Streaming XML Parser outlined the differences between the Sun Java Streaming XML Parser (SJSXP) and two API libraries for working with XML: the Simple API for XML (SAX) and the Document Object Model (DOM). Briefly, SJSXP is an implementation of the Streaming API for XML (StAX), defined by JSR 173. As an implementation of StAX, SJSXP enables XML infosets to be transmitted and parsed serially during an application's runtime. SJSXP allows you to "pull" nodes from the XML document rather than having them "pushed" from the parser to the application. SJSXP is also very fast: a whitepaper titled Streaming APIs for XML Parsers shows how the performance of SJSXP compares to other StAX implementations. StAX is part of JAXP 1.4. You can download a standalone JAXP 1.4 implementation from the JAXP project page.

SJSXP supports both of the APIs defined by StAX for XML processing: cursor and iterator. The previous Tech Tip on SJSXP included code examples that showed how to use the cursor API to parse and write XML documents. This Tech Tip focuses on the iterator API. It provides code examples that show how to use the iterator API, and general guidelines that describe when to use the iterator API.

Comparing the Cursor and Iterator APIs

The cursor API provides low-level access to XML with SJSXP. It uses a cursor that points to one infoset item at a time, moving from the beginning to the end of the document. The cursor always moves forward. To get this cursor-based, forward-only access to XML, you use two interfaces: XMLStreamReader and XMLStreamWriter. Different methods in XMLStreamReader allow you to pull data from where the cursor is pointing. As mentioned in the earlier tip on SJSXP, there are a number of get methods in XMLStreamReader that you can use to obtain the contents of the XML item that the cursor is pointing to. For example, the following method:

   public int getEventType()

returns an integer code that identifies the type of event that the parser found under the cursor. An example of an event is the start of an XML element or the end of the document. The following method:

   public String getText();  

gets text from the XML item that the cursor is pointing to.

All of the XML information retrieved is returned as strings, much like information retrieved by SAX. Each event type is represented by an integer constant. For example, the constant for the start of an XML element is XMLStreamConstants.START_ELEMENT, and the constant for the end of an XML element is XMLStreamConstants.END_ELEMENT. The application calls the relevant method to get the information about each event. For example:

   while (parser.hasNext()) {
       int eventType = parser.next();
       switch (eventType) {
           case XMLStreamConstants.START_ELEMENT:
               // Do something
               break;
           case XMLStreamConstants.END_ELEMENT:
               // Do something
               break;
           // And so on ...
       }
   }
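Here is a self-contained version of the loop above. The class name is hypothetical, and it reads from a String rather than a file. It records start tags, end tags, and non-whitespace text using only XMLStreamReader methods. IS_COALESCING is set so that each run of text is reported as a single CHARACTERS event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch; the class name and input are hypothetical.
class CursorEvents {
    static List<String> events(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data so each text node arrives as one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader parser = factory.createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (parser.hasNext()) {
            int eventType = parser.next();
            switch (eventType) {
                case XMLStreamConstants.START_ELEMENT:
                    out.add("start " + parser.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (!parser.isWhiteSpace()) {
                        out.add("text " + parser.getText());
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.add("end " + parser.getLocalName());
                    break;
                default:
                    break; // comments, END_DOCUMENT, and so on are ignored here
            }
        }
        parser.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(events("<Team><City>Toronto</City></Team>"));
    }
}
```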

Different methods in XMLStreamWriter allow you to write node information to an XML document. For example, XMLStreamWriter.writeStartElement writes a start tag, and XMLStreamWriter.writeCharacters writes text.
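A minimal sketch of the writer side of the cursor API, writing to a StringWriter (the class name is hypothetical). Each call writes one piece of markup directly from String arguments, and writeCharacters() escapes characters such as &.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustrative sketch; the class name and element names are hypothetical.
class StreamWriterDemo {
    static String writeTeam(String city, String nickname) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        w.writeStartElement("Team");     // start tag stays open for attributes
        w.writeAttribute("city", city);
        w.writeCharacters(nickname);     // closes the start tag, escapes text
        w.writeEndElement();
        w.flush();
        w.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeTeam("Toronto", "Maple Leafs"));
    }
}
```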

By comparison, the iterator API represents the XML document as a set of discrete object events that you pull in the order in which they are read. These event objects are immutable and persistent, and they encapsulate all the associated information about the particular event. There is some overhead in creating each event object, so this approach is not as efficient as using the cursor API.

As is the case for the cursor API, the iterator API has two interfaces, one for reading and one for writing: XMLEventReader and XMLEventWriter. For parsing, you access a node by calling the XMLEventReader.nextEvent() method, which returns an XMLEvent object. The XMLEvent interface has 13 subinterfaces that represent the different event types:

  • StartDocument
  • StartElement
  • EndElement
  • Characters
  • Comment
  • EndDocument
  • Attribute
  • Namespace
  • DTD
  • EntityReference
  • ProcessingInstruction
  • EntityDeclaration
  • NotationDeclaration

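Note that the DTD, EntityReference, EntityDeclaration, and NotationDeclaration events are only created if the document contains a DTD, because that is where entities and notations are declared. ProcessingInstruction events, by contrast, can appear in any document that contains processing instructions.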

Using XMLEventReader

Let's look at some code examples that use XMLEventReader. In these examples the target XML document is named HockeyTeams.xml. Here's the content of HockeyTeams.xml:

   <HockeyTeams xmlns="http://www.myhockey.net">
     <Team>
     <City>Toronto</City>
     <Nickname>Maple Leafs</Nickname>
     <Coach>Pat Quinn</Coach>
     <Captain>Mats Sundin</Captain>
     <Wins year="2003">45</Wins>
     <MarketValue currency="USD">280</MarketValue>   
     </Team>
   </HockeyTeams>

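You can use the following code to create an XMLEventReader for the document. MyClassName is a placeholder for one of your own classes; because the resource name is relative, HockeyTeams.xml must be in the same package directory as that class.

   URL url = Class.forName("MyClassName").getResource(
          "HockeyTeams.xml");
   InputStream in = url.openStream();
   XMLInputFactory factory = XMLInputFactory.newInstance();
   XMLEventReader r = factory.createXMLEventReader(in);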

You can then iterate through the events with a construct like this:

    while(r.hasNext()) {
         XMLEvent e = r.nextEvent();
         System.out.println(e.toString());
    }
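Because XMLEvent objects are immutable and persistent, you can keep them after the reader has moved on. The sketch below stores every event in a list and inspects the list after the reader is closed. The class name is hypothetical, and it reads from a String instead of the HockeyTeams.xml class-path resource so that it is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Illustrative sketch; the class name and input are hypothetical.
class EventListDemo {
    // Reads every event into a list; the events stay valid after close().
    static List<XMLEvent> readAll(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader r = factory.createXMLEventReader(new StringReader(xml));
        List<XMLEvent> events = new ArrayList<>();
        while (r.hasNext()) {
            events.add(r.nextEvent());
        }
        r.close();
        return events;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (XMLEvent e : readAll("<City>Toronto</City>")) {
            System.out.println(e.getEventType() + " " + e);
        }
    }
}
```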

If you print each returned XMLEvent from HockeyTeams.xml, the output should look like this:

 <<['http://www.myhockey.net']::
   HockeyTeams xmlns:='http://www.myhockey.net'>
   <['http://www.myhockey.net']::Team>
   <['http://www.myhockey.net']::City>
   Toronto
   </['http://www.myhockey.net']::City>
   <['http://www.myhockey.net']::Nickname>
   Maple Leafs
   </['http://www.myhockey.net']::Nickname>      
   <['http://www.myhockey.net']::Coach>
   Pat Quinn
   </['http://www.myhockey.net']::Coach>    
   <['http://www.myhockey.net']::Captain>
   Mats Sundin
   </['http://www.myhockey.net']::Captain>    
   <['http://www.myhockey.net']::Wins year='2003'>
   45
   </['http://www.myhockey.net']::Wins>               
   <['http://www.myhockey.net']::MarketValue currency='USD'>
   280
   </['http://www.myhockey.net']::MarketValue>
   </['http://www.myhockey.net']::Team>
   </['http://www.myhockey.net']::HockeyTeams>
   ENDDOCUMENT

Each XMLEvent encapsulates all of the information about that particular event. You can use the getEventType() method to get an integer code that specifies the type of event. Then you can get specific information about the event, such as a particular subtype, like this:

   if (event.getEventType() == XMLStreamConstants.CHARACTERS) {
      Characters chars = event.asCharacters();
      System.out.println("chars " + chars.getData());
   }

Or you can get element names or attributes from the event, like this:

   if (event.getEventType() == XMLStreamConstants.START_ELEMENT) {
      StartElement startE = event.asStartElement();
      System.out.println("start " + startE.getName());
      Iterator it = startE.getAttributes();
      while (it.hasNext()) {
         System.out.println(" attributes " + it.next());
      }
   }
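Putting the two checks together, here is a runnable sketch (hypothetical class name and input) that walks the event stream and describes each start element with its attributes, plus any non-whitespace text. It uses the isStartElement() and isCharacters() convenience methods of XMLEvent instead of comparing getEventType() codes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Illustrative sketch; the class name and input are hypothetical.
class EventInspector {
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader r = f.createXMLEventReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (r.hasNext()) {
            XMLEvent event = r.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                StringBuilder sb = new StringBuilder(start.getName().getLocalPart());
                Iterator<?> it = start.getAttributes();
                while (it.hasNext()) {
                    Attribute a = (Attribute) it.next();
                    sb.append(" @").append(a.getName().getLocalPart())
                      .append('=').append(a.getValue());
                }
                out.add(sb.toString());
            } else if (event.isCharacters() && !event.asCharacters().isWhiteSpace()) {
                out.add("text " + event.asCharacters().getData());
            }
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(describe("<Team><Wins year=\"2003\">45</Wins></Team>"));
    }
}
```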

Note that each StartElement object has information about the node: the local name of the start tag, its prefix, namespace URI, attributes, and namespace declarations. Specifically, StartElement.getName() returns a QName, that is, a qualified name as defined in the Namespaces in XML specification. You can query the QName object to get the local part, the namespace URI, and the prefix. Typically, you get the StartElement before accessing secondary events such as attributes or namespaces. However, Attribute and Namespace events can also be reported standalone (that is, without first getting the StartElement). Namespace events are accessible from either the StartElement or the corresponding EndElement. The StartDocument object contains information that includes the encoding, the XML version, and the standalone property.
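The following sketch (hypothetical class name and input) shows the QName and namespace information described above. It reads the XML version from the StartDocument event, then reports the namespace URI and local part of the root element and each namespace declaration on that element. The order in which declarations are reported is not something the sketch depends on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.StartDocument;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Illustrative sketch; the class name and input are hypothetical.
class NamespaceInspector {
    static List<String> rootInfo(String xml) throws XMLStreamException {
        XMLEventReader r = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (r.hasNext()) {
            XMLEvent e = r.nextEvent();
            if (e.isStartDocument()) {
                out.add("version " + ((StartDocument) e).getVersion());
            } else if (e.isStartElement()) {
                StartElement start = e.asStartElement();
                QName name = start.getName();
                out.add("uri " + name.getNamespaceURI());
                out.add("local " + name.getLocalPart());
                Iterator<?> it = start.getNamespaces();
                while (it.hasNext()) {
                    Namespace ns = (Namespace) it.next();
                    out.add(ns.isDefaultNamespaceDeclaration()
                        ? "default " + ns.getNamespaceURI()
                        : "prefix " + ns.getPrefix() + " " + ns.getNamespaceURI());
                }
                break; // only the root element is inspected
            }
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(rootInfo("<?xml version=\"1.0\"?>"
            + "<HockeyTeams xmlns=\"http://www.myhockey.net\"><Team/></HockeyTeams>"));
    }
}
```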

Using XMLEventWriter

The XMLEventWriter interface is used for writing. It contains the method XMLEventWriter.add(XMLEvent) to add an event to the output stream. Here is a snippet of code that directs output to an XML document using XMLEventWriter.

   XMLEventFactory eventFactory = XMLEventFactory.newInstance();
   XMLOutputFactory output = XMLOutputFactory.newInstance();
   XMLEventWriter xmlwriter = 
     output.createXMLEventWriter(System.out);

Notice the difference between the cursor and iterator APIs. In the cursor API, you use different methods in XMLStreamWriter to write attributes, characters, or elements (passing the arguments as String objects). With XMLEventWriter in the iterator API, however, you first create each item as an XMLEvent by using the utility factory class XMLEventFactory, and then add the XMLEvents to an XMLEventWriter that was created for a particular output stream. In this example, the output stream is System.out.

In the first line of the following code example, the XMLEventFactory creates an object that implements the StartDocument interface, which, in turn, extends the XMLEvent interface. The two arguments specify the encoding and XML version. The methods createStartElement() and createAttribute() are overloaded, so there are different ways to create these XMLEvents. In the example below, the attributes and namespaces are passed to createStartElement() as Iterator objects.

   xmlwriter.add(eventFactory.createStartDocument("UTF-8", "1.0"));
   // for attributes
   Attribute att = eventFactory.createAttribute("year", "2003");
   List<Attribute> attArr = new ArrayList<Attribute>();
   attArr.add(att);
   // for namespaces
   Namespace namespace =
      eventFactory.createNamespace("foo", "http://www.foo.org");
   List<Namespace> nameArr = new ArrayList<Namespace>();
   nameArr.add(namespace);
   // argument order: namespace URI, local name, prefix
   QName qname = new QName(
         "http://www.foo.org", "HockeyTeam", "foo");

   // now create the start element
   xmlwriter.add(eventFactory.createStartElement(
         qname, attArr.iterator(), nameArr.iterator()));
   xmlwriter.add(eventFactory.createCharacters(
         "Los Angeles Kings"));
   xmlwriter.add(eventFactory.createEndElement(
         qname, nameArr.iterator()));
   xmlwriter.add(eventFactory.createEndDocument());
   xmlwriter.flush();
   xmlwriter.close();
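For a self-contained variant of this code, the sketch below sends the same events to a StringWriter instead of System.out, so the result can be inspected as a string (the class name is hypothetical). It assumes the factory's default, non-repairing namespace mode, so the foo namespace declaration is written only because it is passed explicitly to createStartElement().

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Namespace;

// Illustrative sketch; the class name is hypothetical.
class EventWriterDemo {
    static String writeTeam() throws XMLStreamException {
        XMLEventFactory ef = XMLEventFactory.newInstance();
        StringWriter sw = new StringWriter();
        XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(sw);

        List<Attribute> atts = new ArrayList<>();
        atts.add(ef.createAttribute("year", "2003"));
        List<Namespace> nss = new ArrayList<>();
        nss.add(ef.createNamespace("foo", "http://www.foo.org"));
        // argument order: namespace URI, local name, prefix
        QName qname = new QName("http://www.foo.org", "HockeyTeam", "foo");

        w.add(ef.createStartDocument("UTF-8", "1.0"));
        w.add(ef.createStartElement(qname, atts.iterator(), nss.iterator()));
        w.add(ef.createCharacters("Los Angeles Kings"));
        w.add(ef.createEndElement(qname, nss.iterator()));
        w.add(ef.createEndDocument());
        w.flush();
        w.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeTeam());
    }
}
```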

Deciding When to Use the Cursor or Iterator API

The Streaming API for XML chapter in the Java Web Services Developer Pack 1.6 Tutorial lists considerations for deciding between the cursor and iterator APIs. Here is a summary of those considerations:

  • You can make smaller, faster, and more efficient code with the cursor API.
  • With the iterator API, you can pass objects created from the XMLEvent subclasses in arrays, lists, and maps in your applications even after the parser has moved on to subsequent events. That's because these objects are immutable.
  • With the iterator API, you can create subtypes of XMLEvent that are either completely new information items or extensions of existing items with additional methods.
  • If you need to modify the event stream, handle pluggable processing of the event stream, or create XML processing pipelines, use the iterator API.
  • In general, if you do not have a strong preference and memory constraints and speed are not factors in your decision, use the iterator API. It's more flexible and extensible.
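As one example of pluggable processing with the iterator API, XMLInputFactory.createFilteredReader() wraps an XMLEventReader with an EventFilter, and the filtered reader can be used anywhere an XMLEventReader is expected. In this sketch (hypothetical class name and input), the filter passes only StartElement events, so comments, text, and end tags never reach the consumer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Illustrative sketch; the class name and input are hypothetical.
class FilterPipeline {
    // A pluggable processing step: only StartElement events pass through.
    static final EventFilter ELEMENTS_ONLY = new EventFilter() {
        public boolean accept(XMLEvent event) {
            return event.isStartElement();
        }
    };

    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        XMLEventReader raw = f.createXMLEventReader(new StringReader(xml));
        XMLEventReader filtered = f.createFilteredReader(raw, ELEMENTS_ONLY);
        List<String> names = new ArrayList<>();
        while (filtered.hasNext()) {
            names.add(filtered.nextEvent().asStartElement().getName().getLocalPart());
        }
        filtered.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames(
            "<HockeyTeams><!-- c --><Team><City>Toronto</City></Team></HockeyTeams>"));
    }
}
```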

For more information on SJSXP, see Chapter 3, "Streaming API for XML," in the Java Web Services Developer Pack 1.6 Tutorial. Also see the Sun Java Streaming XML Parser release notes.

Running the Sample Code

A sample package accompanies this tip. The code in the sample package includes some (but not all) of the code examples in the tip, and demonstrates some of the techniques covered in the tip. To install and run the sample:

  1. If you haven't already done so, install Java Web Services Developer Pack (Java WSDP) 1.6 from the Java WSDP Downloads page.

  2. Download the sample file, and extract its contents to the $JWSDP_HOME/sjsxp/samples directory. You should now see the newly extracted directory as $JWSDP_HOME/sjsxp/samples/techtip. For example, if you install Java WSDP on a Windows machine at C:\Sun\JWSDP-1.6, then your newly created directory should be at C:\Sun\JWSDP-1.6\sjsxp\samples\techtip. It is important to extract this zip file to this location because there is a build.xml file for the sample code that depends on the build.xml file packaged with the Java WSDP bundle. The classpath and execute properties are already defined in the build.xml file so that you can compile and execute the appropriate targets.

  3. Execute the ant targets in the ${JWSDP_HOME}/sjsxp/samples/techtip directory. The ant binary is located in the ${JWSDP_HOME}/apache-ant/bin directory.

  4. Run the following command to compile:
         ant compile
    
    In response, you should see something like this:
         compile:
             [mkdir] Created dir: 
             C:\Sun\jwsdp-1.6\sjsxp\samples\build\classes
             [javac] Compiling 8 source files to 
             C:\Sun\jwsdp-1.6\sjsxp\samples\build\classes
             ...
         
         BUILD SUCCESSFUL     
    
  5. Run the following command to execute the event parse example:
         ant techtip.Parse
    
    In response, you should see output that begins like this:
         techtip.Parse:
         [echo] Current directory is C:\Sun\jwsdp-1.6\sjsxp\samples
         [echo]  Running EventParseExample Sample.
         [java] XMLEvent is <?xml version="1.0" encoding='UTF-8' 
         [java] standalone='no'?>
         [java] Its corresponding EventType Integer is 7 : 
         [java] START_DOCUMENT
         [java] Line Number 1
         [java] ----------------------------------------
         [java] XMLEvent is <!--
         [java]     Document   : HockeyTeams.xml
         [java]     Created on : September 19, 2005, 5:11 PM
         [java]     Author     : Administrator
         [java]     Description:
         [java]         Purpose of the document follows.
         [java] -->
         [java] Its corresponding EventType Integer is 5 : COMMENT
         [java] Line Number 9
         [java] ---------------------------------------- 
    
  6. Run the following command to execute the event output example:
         ant techtip.Output   
    
    When you run the output example, it will generate a SampleOutput.xml file in the $JWSDP_HOME/sjsxp/samples/techtip directory. The content of the SampleOutput.xml file should look like this:
         <!--
         This document will be a simplified version of the input 
         xml File to parse
         -->
         <foo:MyHockeyTeams>
         <foo:HockeyTeam>
         <foo:City>San Jose</foo:City>
         <NickName>Sharks</NickName>
         <Coach>Daryl Sutter</Coach>
         testerspace
         <myxml:MarketValue year="2003">35</myxml:MarketValue>
         </foo:HockeyTeam>
         </foo:MyHockeyTeams>
    

NOTE: If you do not run these samples with the provided build.xml, add sjsxp.jar and jsr173_api.jar, located in the ${JWSDP_HOME}/sjsxp/lib directory, to your classpath. You need these jar files to compile and run the provided code.

About the Author

Kim LiChong is a Member of Technical Staff in the Java Web Services Performance Engineering Group at Sun Microsystems. His interests include XML parsing performance, and he is one of the authors of the StAX 1.0 White Paper: Streaming APIs for XML Parsers. He is also a co-owner of xmltest, a microbenchmark used to measure XML parsing performance. For more information about xmltest, see the xmltest project page.


Comments? Send your feedback on the Tech Tips: http://developers.sun.com/contact/feedback.jsp?category=newslet

Subscribe to the following newsletters for the latest information about technologies and products in other Java platforms:
  • Core Java Technologies Tech Tips. Get tips on using core Java technologies and APIs, such as those in the Java 2 Platform, Standard Edition (J2SE).
  • Wireless Developer Tech Tips. Get tips on using wireless Java technologies and APIs, such as those in the Java 2 Platform, Micro Edition (J2ME).
You can subscribe to these and other Java technology developer newsletters or manage your current newsletter subscriptions on the Sun Developer Network Subscriptions page

IMPORTANT: Please read our Terms of Use, Privacy, and Licensing policies:
http://www.sun.com/share/text/termsofuse.html
http://www.sun.com/privacy/
http://developer.java.sun.com/berkeley_license.html

ARCHIVES: You'll find the Enterprise Java Technologies Tech Tips archives at:
http://java.sun.com/developer/EJTechTips/index.html

© 2005 Sun Microsystems, Inc. All Rights Reserved. For information on Sun's trademarks see: http://sun.com/suntrademarks
Java, J2EE, J2SE, J2ME, and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.