By Robert Hustead; Reprinted from JavaWorld
September 2000
XML is hot. Because XML is a form of
self-describing data, it can be used to encode rich data models. It's easy to see XML's utility as a
data exchange medium between very dissimilar systems. Data can be easily exposed or published as XML
from all kinds of systems: legacy COBOL programs, databases, C++ programs, and so on.
However, using XML to build systems poses two challenges. First, while generating XML is a
straightforward procedure, the inverse operation, using XML data from within a program, is not.
Second, current XML technologies are easy to misapply, which can leave a programmer with a slow,
memory-hungry system. Indeed, heavy memory requirements and slow speeds can prove problematic for
systems that use XML as their primary data exchange format.
Some standard tools currently available for working with XML are better than others. The SAX API in
particular has some important runtime features for performance-sensitive code. In this article, we
will develop some patterns for applying the SAX API. You will be able to create fast XML-to-Java
mapping code with a minimum memory footprint, even for fairly complex XML structures (with the
exception of recursive structures).
In Part 2 of this series, we will cover applying the SAX API to recursive XML structures in which
some of the XML elements represent lists of lists. We will also develop a class library that manages
the navigational aspects of the SAX API. This library simplifies writing XML mapping code based on
SAX.
Mapping Code is Similar to Compiling Code
Writing programs that use XML data is like writing a compiler. That is, most compilers convert
source code into a runnable program in three steps. First, a lexer module groups characters
into words or tokens that the compiler recognizes -- a process known as tokenizing. A second module,
called the parser, analyzes groups of tokens in order to recognize legal language
constructs. Last, a third module, the code generator, takes a set of legal language
constructs and generates executable code. Sometimes, parsing and code generation are intermixed.
To use XML data in a Java program, we must undergo a similar process. First, we analyze every
character in the XML text in order to recognize legal XML tokens such as start tags, attributes, end
tags, and CDATA sections.
Second, we verify that the tokens form legal XML constructs. If an XML document consists entirely of
legal constructs per the XML 1.0 specification, it is well-formed. At the most basic level,
we need to make sure that, for instance, all of the tagging has matching opening and closing tags,
and the attributes are properly structured in the opening tag.
Also, if a DTD is available, we have the option to make sure that the XML constructs found during
parsing are legal in terms of the DTD, as well as being well-formed XML.
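As a rough illustration of the well-formedness check described above, the following sketch hands a string to whichever SAX parser the JDK's JAXP factory supplies and reports whether parsing succeeded. The class name WellFormedCheck and the helper isWellFormed are hypothetical, and obtaining the parser through JAXP (javax.xml.parsers) is an assumption made only to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Hypothetical helper: true if the text parses as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler ignores content events and rethrows fatal errors.
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports a well-formedness violation as a SAXException.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Matching open and close tags: well-formed.
        System.out.println(isWellFormed("<simple><name> Bob </name></simple>"));
        // The name element is never closed: not well-formed.
        System.out.println(isWellFormed("<simple><name> Bob </simple>"));
    }
}
```

Note that this checks well-formedness only; validating against a DTD is a separate, optional step.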
Finally, we use the data contained in the XML document to accomplish something useful -- I call this
mapping XML into Java.
XML Parsers
Fortunately, there are off-the-shelf components -- XML parsers -- that perform some of these
compiler-related tasks for us: they handle all of the lexical analysis and parsing.
Many currently available Java-based XML parsers support two popular parsing standards: the SAX and
DOM APIs.
The availability of an off-the-shelf XML parser may make it seem that the hard part of using XML in
Java has been done for you. In reality, applying an off-the-shelf XML parser is an involved task.
SAX and DOM APIs
The SAX API is event-based. XML parsers that implement the SAX API generate events that correspond
to different features found in the parsed XML document. By responding to this stream of SAX events
in Java code, you can write programs driven by XML-based data.
The DOM API is an object-model-based API. XML parsers that implement DOM create a generic object
model in memory that represents the contents of the XML
document. Once the XML parser has completed parsing, the memory contains a tree of DOM objects that
offers information about both the structure and contents of the XML document.
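To make the tree of DOM objects concrete, here is a minimal sketch that parses a small document with the JDK's JAXP DocumentBuilder and counts the node objects the parser created. The class name DomTreeSize and its helpers are hypothetical, and the use of JAXP rather than a particular parser is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeSize {

    // Counts a node plus all of its descendants.
    // Attributes are not child nodes, so they are not counted here.
    public static int countNodes(Node node) {
        int count = 1;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countNodes(children.item(i));
        }
        return count;
    }

    // Builds the complete in-memory tree for the given XML text.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<simple date=\"7/7/2000\"><name> Bob </name>"
                + "<location> New York </location></simple>");
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
        System.out.println("Nodes in tree: " + countNodes(doc));
    }
}
```

Even this tiny document yields a separate object for the document, for each element, and for each run of text.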
The DOM concept grew out of the HTML browser world, where a common document object model represents
the HTML document loaded in the browser. This HTML DOM then becomes available for scripting
languages like JavaScript. HTML DOM has been very successful in this application.
Dangers of DOM
At first glance, the DOM API seems to be more feature-rich, and therefore better, than the SAX API.
However, DOM has serious efficiency problems that can hurt performance-sensitive applications.
The current group of XML parsers that support DOM implement the in-memory object model by creating
many tiny objects that represent DOM nodes containing either text or other DOM nodes. This sounds
natural enough, but has negative performance implications. One of the most expensive operations in
Java is the new operator. Correspondingly, for every new operator executed
in Java, the JVM garbage collector must eventually remove the object from memory when no references
to the object remain. The DOM API tends to really thrash the JVM memory system with its many small
objects, which are typically tossed aside soon after parsing.
Another DOM issue is the fact that it loads the entire XML document into memory. For large
documents, this becomes a problem. Again, since the DOM is implemented as many tiny objects, the
memory footprint is even larger than the XML document itself because the JVM stores a few extra
bytes of information regarding all of these objects, as well as the contents of the XML document.
It is also troubling that many Java programs don't actually use DOM's generic object structure.
Instead, as soon as the DOM structure loads in memory, they copy the data into an object model
specific to a particular problem domain -- a subtle yet wasteful process.
Another subtle issue for the DOM API is that code written for it must scan the XML document twice.
The first pass creates the DOM structure in memory, the second locates all XML data the program is
interested in. Certain coding styles may traverse the DOM structure several additional times while
locating different pieces of XML data. By contrast, SAX's coding style encourages locating and
collecting XML data in a single pass.
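The two passes can be seen in a short sketch. Assuming the JDK's JAXP DocumentBuilder, the parser first builds the whole tree, and only then does our code search that tree and copy values out of it. The class name DomTwoPass and the helper nameAndLocation are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomTwoPass {

    // Returns { name, location } copied out of the generic DOM tree.
    public static String[] nameAndLocation(String xml) {
        try {
            // Pass 1: the parser reads the entire document and builds the tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Pass 2: our code searches the finished tree for the data it wants.
            // Each getElementsByTagName call is a further search of the tree.
            String name =
                doc.getElementsByTagName("name").item(0).getTextContent().trim();
            String location =
                doc.getElementsByTagName("location").item(0).getTextContent().trim();

            // The generic nodes are now garbage; only the copied values survive.
            return new String[] { name, location };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] values = nameAndLocation("<simple date=\"7/7/2000\">"
                + "<name> Bob </name><location> New York </location></simple>");
        System.out.println(values[0] + " / " + values[1]);
    }
}
```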
Some of these issues could be addressed with a better underlying data-structure design to internally
represent the DOM object model. Issues such as encouraging multiple processing passes and
translating between generic and specific object models cannot be addressed within the XML parsers.
SAX for Survival
Compared to the DOM API, the SAX API is an attractive approach. SAX doesn't have a generic object
model, so it doesn't have the memory or performance problems associated with abusing the
new operator. And with SAX, there is no generic object model to ignore if you plan to
use a specific problem-domain object model instead. Moreover, since SAX processes the XML document
in a single pass, it requires much less processing time.
SAX does have a few drawbacks, but they are mostly related to the programmer, not the runtime
performance of the API. Let's look at a few.
The first drawback is conceptual. Programmers are accustomed to navigating
to get data; to find a file on a file server, you navigate by changing directories. Similarly, to
get data from a database, you write an SQL query for the data you need. With SAX, this model is
inverted. That is, you set up code that listens while the parser lists every piece of XML data in the
document. That code activates only when interesting XML data are being listed. At first, the SAX
API seems odd, but after a while, thinking in this inverted way becomes second nature.
The second drawback is more dangerous. With SAX code, the naive "let's take a
hack at it" approach will backfire fairly quickly, because the SAX parser exhaustively navigates the
XML structure while simultaneously supplying the data stored in the XML document. Most people focus
on the data-mapping aspect and neglect the navigational aspect. If you don't directly address the
navigational aspect of SAX parsing, the code that keeps track of the location within the XML
structure during SAX parsing will become spread out and have many subtle interactions. This problem
is similar to those associated with overdependence on global variables. But if you learn to properly
structure SAX code to keep it from becoming unwieldy, it is more straightforward than using the DOM
API.
Basic SAX
There are currently two published versions of the SAX API. We'll use version 2 (see Resources) for our examples. Version 2 uses different class and method names
than version 1, but the structure of the code is the same.
SAX is an API, not a parser, so this code is generic across XML parsers. To get the examples to run,
you will need to access an XML parser that supports SAX v2. I use Apache's Xerces parser. (See Resources.) Review your parser's getting-started guide for specifics on
invoking a SAX parser.
The SAX API specification is pretty straightforward. It includes many details, but your primary task
is to create a class that implements the ContentHandler interface, a callback interface
used by XML parsers to notify your program of SAX events as they are found in the XML document.
The SAX API also conveniently supplies a DefaultHandler implementation class for the
ContentHandler interface.
Once you've implemented the ContentHandler or extended the DefaultHandler,
you need only direct the XML parser to parse a particular document.
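How you obtain the parser depends on your environment. As one possible sketch, assuming the JAXP classes (javax.xml.parsers) bundled with the JDK rather than a parser-specific class, the code below asks a SAXParserFactory for an XMLReader, attaches a handler, and parses a string. The class name CountingHandler and the helper countElements are hypothetical. The factory is made namespace-aware so that the localName argument is filled in.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CountingHandler extends DefaultHandler {

    // Number of start tags seen so far.
    public int elements = 0;

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes attr) {
        elements++;
    }

    // Parses the XML text and returns how many elements it contains.
    public static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader xr = factory.newSAXParser().getXMLReader();

            CountingHandler handler = new CountingHandler();
            xr.setContentHandler(handler);
            xr.parse(new InputSource(new StringReader(xml)));
            return handler.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(
            "<simple><name> Bob </name><location> New York </location></simple>"));
    }
}
```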
Our first example extends the DefaultHandler to print each
SAX event to the console. This will give you a feel for what SAX events will be generated and in
what order.
To get started, here's the sample XML document we will use in our first example:
<?xml version="1.0"?>
<simple date="7/7/2000" >
<name> Bob </name>
<location> New York </location>
</simple>
Next, here is the source code for the first example:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

public class Example1 extends DefaultHandler {

    // Override methods of the DefaultHandler class
    // to gain notification of SAX Events.
    //
    // See org.xml.sax.ContentHandler for all available events.
    //
    public void startDocument( ) throws SAXException {
        System.out.println( "SAX Event: START DOCUMENT" );
    }

    public void endDocument( ) throws SAXException {
        System.out.println( "SAX Event: END DOCUMENT" );
    }

    public void startElement( String namespaceURI,
                              String localName,
                              String qName,
                              Attributes attr ) throws SAXException {
        System.out.println( "SAX Event: START ELEMENT[ " +
                            localName + " ]" );
        // Also, let's print the attributes if
        // there are any...
        for ( int i = 0; i < attr.getLength(); i++ ) {
            System.out.println( " ATTRIBUTE: " +
                                attr.getLocalName(i) +
                                " VALUE: " +
                                attr.getValue(i) );
        }
    }

    public void endElement( String namespaceURI,
                            String localName,
                            String qName ) throws SAXException {
        System.out.println( "SAX Event: END ELEMENT[ " +
                            localName + " ]" );
    }

    public void characters( char[] ch, int start, int length )
                            throws SAXException {
        System.out.print( "SAX Event: CHARACTERS[ " );
        try {
            OutputStreamWriter outw = new OutputStreamWriter(System.out);
            outw.write( ch, start, length );
            outw.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println( " ]" );
    }

    public static void main( String[] argv ) {
        System.out.println( "Example1 SAX Events:" );
        try {
            // Create SAX 2 parser...
            XMLReader xr = XMLReaderFactory.createXMLReader();
            // Set the ContentHandler...
            xr.setContentHandler( new Example1() );
            // Parse the file...
            xr.parse( new InputSource(
                new FileReader( "Example1.xml" )) );
        } catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}
Finally, here is the output generated by running the first example with our sample XML document:
Example1 SAX Events:
SAX Event: START DOCUMENT
SAX Event: START ELEMENT[ simple ]
ATTRIBUTE: date VALUE: 7/7/2000
SAX Event: CHARACTERS[
]
SAX Event: START ELEMENT[ name ]
SAX Event: CHARACTERS[ Bob ]
SAX Event: END ELEMENT[ name ]
SAX Event: CHARACTERS[
]
SAX Event: START ELEMENT[ location ]
SAX Event: CHARACTERS[ New York ]
SAX Event: END ELEMENT[ location ]
SAX Event: CHARACTERS[
]
SAX Event: END ELEMENT[ simple ]
SAX Event: END DOCUMENT
As you can see, the SAX parser will call the appropriate
ContentHandler method for every SAX event it discovers in the XML document.
Hello World
Now that we understand the basic pattern of SAX, we can start to do something slightly useful:
extract values from our simple XML document and demonstrate the classic hello world program.
First, for each element we are interested in mapping to Java, we will reset our collection buffer in
the startElement SAX event handler. Then, when startElement for a tag has
occurred, but endElement has not, we will collect the characters presented by the
characters SAX event. Finally, when the endElement for the tag has
occurred, we will store the collected characters in the appropriate field of a Java object.
Below you'll find the sample data for our hello world example:
<?xml version="1.0"?>
<simple date="7/7/2000" >
<name> Bob </name>
<location> New York </location>
</simple>
Here's the source listing for the XML mapping code of the hello world example:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

public class Example2 extends DefaultHandler {

    // Local variables to store data
    // found in the XML document
    public String name = "";
    public String location = "";

    // Buffer for collecting data from
    // the "characters" SAX event.
    private CharArrayWriter contents = new CharArrayWriter();

    // Override methods of the DefaultHandler class
    // to gain notification of SAX Events.
    //
    // See org.xml.sax.ContentHandler for all available events.
    //
    public void startElement( String namespaceURI,
                              String localName,
                              String qName,
                              Attributes attr ) throws SAXException {
        contents.reset();
    }

    public void endElement( String namespaceURI,
                            String localName,
                            String qName ) throws SAXException {
        if ( localName.equals( "name" ) ) {
            name = contents.toString();
        }
        if ( localName.equals( "location" ) ) {
            location = contents.toString();
        }
    }

    public void characters( char[] ch, int start, int length )
                            throws SAXException {
        contents.write( ch, start, length );
    }

    public static void main( String[] argv ) {
        System.out.println( "Example2:" );
        try {
            // Create SAX 2 parser...
            XMLReader xr = XMLReaderFactory.createXMLReader();
            // Set the ContentHandler...
            Example2 ex2 = new Example2();
            xr.setContentHandler( ex2 );
            // Parse the file...
            xr.parse( new InputSource(
                new FileReader( "Example2.xml" )) );
            // Say hello...
            System.out.println( "Hello World from " +
                                ex2.name +
                                " in " +
                                ex2.location );
        } catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}
The following is the output of our hello world example:
Example2:
Hello World from Bob in New York
This is not the simplest hello world program ever written. As such, there are several things worth
noting in the example code.
First, the code demonstrates some of the bad features of event-driven code. Things get tricky when
event-driven code needs to respond to a pattern of events instead of just a single event. In this
specific case, we are looking for a pattern of SAX events that mark the name and location of our
simple XML document.
The tagged content is presented in the characters SAX event; the tags themselves are
spread between the startElement and endElement SAX events. I got around
this in the hello world example by coordinating around the contents buffer, which the
startElement always resets. The end element assumes that the contents have been
collected and assigns them to the appropriate local variable. This is not a bad pattern, but it
assumes that no two fields of a Java object possess the same tag -- not always a valid assumption.
We will address this issue later.
Another interesting feature of the example code is the use of a contents buffer -- a
little SAX gotcha. You can create a string directly in the characters SAX event instead
of copying the characters to a buffer as in the example. But that means ignoring the fact that the
SAX specification of the characters() method indicates the XML parser may call
characters() multiple times. This will cause data loss if the data between two tags are
large, or if the buffering of the stream feeding the XML parser data breaks in between two tags
while you are collecting data. Also, reusing a buffer is much more efficient than constantly
creating new strings.
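The data loss is easy to reproduce without a real parser. The sketch below plays the role of a parser that happens to deliver the text of one element in two characters() calls, and compares a handler that builds a string directly with one that buffers. The class and method names are hypothetical, and the two-chunk split is an assumption chosen for illustration; a real parser decides for itself where to split.

```java
import java.io.CharArrayWriter;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkedCharacters {

    // Naive: keeps only the most recent chunk handed to characters().
    public static class NaiveHandler extends DefaultHandler {
        public String location = "";

        @Override
        public void characters(char[] ch, int start, int length) {
            location = new String(ch, start, length);
        }
    }

    // Buffered: accumulates every chunk until the end tag arrives.
    public static class BufferedHandler extends DefaultHandler {
        public String location = "";
        private final CharArrayWriter contents = new CharArrayWriter();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attr) {
            contents.reset();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            contents.write(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            location = contents.toString();
        }
    }

    // Simulates a parser that splits "New York" across two characters() events.
    public static void fireEvents(DefaultHandler handler) {
        char[] first = "New ".toCharArray();
        char[] second = "York".toCharArray();
        try {
            handler.startElement("", "location", "location", new AttributesImpl());
            handler.characters(first, 0, first.length);
            handler.characters(second, 0, second.length);
            handler.endElement("", "location", "location");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NaiveHandler naive = new NaiveHandler();
        BufferedHandler buffered = new BufferedHandler();
        fireEvents(naive);
        fireEvents(buffered);
        System.out.println("naive:    [" + naive.location + "]");    // [York]
        System.out.println("buffered: [" + buffered.location + "]"); // [New York]
    }
}
```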
Mapping Our First Java Object
Now that we've gotten through hello world, let's try a more useful example that maps an XML document
to a Java object. This example is similar to hello world, but maps data to a single object and has
an accessor for the object -- a useful pattern of using SAX present in the rest of the examples.
Unlike objects produced by a constructor or a factory method, objects mapped in a SAX parser are not
available until after parsing. A clean way to deal with this difference is to provide access methods
from the mapping class to the finished mapped object. That way, you create the mapping class, attach
it to an XMLReader, parse the XML, and then call the accessor to get a reference to the
mapped object. A variation of this theme is to supply a set method and then supply the object to be
mapped just before parsing.
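The set-method variation might look like the following sketch. The caller creates the target object, hands it to the mapping class, and then parses. All names here (SetterMapping, setCustomer, map, and the nested Customer stand-in) are hypothetical, the parser is obtained through JAXP as an assumption, and the values are trimmed for readability.

```java
import java.io.CharArrayWriter;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SetterMapping extends DefaultHandler {

    // Minimal stand-in for a domain class, nested to keep the sketch self-contained.
    public static class Customer {
        public String firstName = "";
        public String lastName = "";
    }

    // Object supplied by the caller before parsing starts.
    private Customer target;
    private final CharArrayWriter contents = new CharArrayWriter();

    public void setCustomer(Customer customer) {
        target = customer;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attr) {
        contents.reset();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        contents.write(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (localName.equals("FirstName")) {
            target.firstName = contents.toString().trim();
        }
        if (localName.equals("LastName")) {
            target.lastName = contents.toString().trim();
        }
    }

    // Creates the target, supplies it to the mapper, then parses.
    public static Customer map(String xml) {
        Customer customer = new Customer();
        SetterMapping mapper = new SetterMapping();
        mapper.setCustomer(customer);
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            XMLReader xr = factory.newSAXParser().getXMLReader();
            xr.setContentHandler(mapper);
            xr.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return customer;
    }

    public static void main(String[] args) {
        Customer c = map("<customer><FirstName> Bob </FirstName>"
                + "<LastName> Hustead </LastName></customer>");
        System.out.println(c.firstName + " " + c.lastName);
    }
}
```

One reason to prefer this variation is that the caller controls how the target object is created, which helps when the object comes from a pool or a subclass.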
Take a look at the sample XML document for the third example:
<?xml version="1.0"?>
<customer>
<FirstName> Bob </FirstName>
<LastName> Hustead </LastName>
<CustId> abc.123 </CustId>
</customer>
Next, we see a simple class that will be mapped with data supplied by our XML document:
package common;

import java.io.*;

// Customer is a very simple class
// that holds fields for dummy customer
// data.
// It has a simple method to print
// itself to a print stream.
public class Customer {

    // Customer member variables.
    public String firstName = "";
    public String lastName = "";
    public String custId = "";

    public void print( PrintStream out ) {
        out.println( "Customer: " );
        out.println( " First Name -> " + firstName );
        out.println( " Last Name -> " + lastName );
        out.println( " Customer Id -> " + custId );
    }
}
This is the source code that does the XML mapping for our third example:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import common.*;

public class Example3 extends DefaultHandler {

    // Local Customer object to collect
    // customer XML data.
    private Customer cust = new Customer();

    // Buffer for collecting data from
    // the "characters" SAX event.
    private CharArrayWriter contents = new CharArrayWriter();

    // Override methods of the DefaultHandler class
    // to gain notification of SAX Events.
    //
    // See org.xml.sax.ContentHandler for all available events.
    //
    public void startElement( String namespaceURI,
                              String localName,
                              String qName,
                              Attributes attr ) throws SAXException {
        contents.reset();
    }

    public void endElement( String namespaceURI,
                            String localName,
                            String qName ) throws SAXException {
        if ( localName.equals( "FirstName" ) ) {
            cust.firstName = contents.toString();
        }
        if ( localName.equals( "LastName" ) ) {
            cust.lastName = contents.toString();
        }
        if ( localName.equals( "CustId" ) ) {
            cust.custId = contents.toString();
        }
    }

    public void characters( char[] ch, int start, int length )
                            throws SAXException {
        contents.write( ch, start, length );
    }

    public Customer getCustomer() {
        return cust;
    }

    public static void main( String[] argv ) {
        System.out.println( "Example3:" );
        try {
            // Create SAX 2 parser...
            XMLReader xr = XMLReaderFactory.createXMLReader();
            // Set the ContentHandler...
            Example3 ex3 = new Example3();
            xr.setContentHandler( ex3 );
            // Parse the file...
            xr.parse( new InputSource(
                new FileReader( "Example3.xml" )) );
            // Display customer to stdout...
            Customer cust = ex3.getCustomer();
            cust.print( System.out );
        } catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}
The following is the output generated by our simple Customer object, populated with
data from our XML document:
Example3:
Customer:
First Name -> Bob
Last Name -> Hustead
Customer Id -> abc.123
A Simple List of Java Objects
For more complex XML documents, we will need to map lists of objects into Java. Mapping object lists
is like bartending: when a bartender pours several beers in a row, he usually leaves the tap running
while he quickly swaps glasses under the tap. This is exactly what we need to do to capture a list
of objects. We have no control over incoming SAX events; they flow in like beer from a tap that we
can't shut off. To solve the problem, we need to provide empty containers, allow them to fill up,
and continually replace them.
Our next example highlights this technique. Using an XML document that represents some information
about a fictional customer order, we will map the XML that represents a list of order items to a
vector of Java order-item objects. The key to implementing this concept is the current item. We'll
create a variable named currentOrderItem. Every time we get an event indicating a new
order item (startElement for the OrderItem tag), we will create a new
empty order-item object, add it to the list of order items, and assign it as the current order item.
The XML parser does the rest.
First, here is the XML document representing our fictional customer order:
<?xml version="1.0"?>
<CustomerOrder>
  <Customer>
    <FirstName> Bob </FirstName>
    <LastName> Hustead </LastName>
    <CustId> abc.123 </CustId>
  </Customer>
  <OrderItems>
    <OrderItem>
      <Quantity> 1 </Quantity>
      <ProductCode> 48.GH605A </ProductCode>
      <Description> Pet Rock </Description>
      <Price> 19.99 </Price>
    </OrderItem>
    <OrderItem>
      <Quantity> 12 </Quantity>
      <ProductCode> 47.9906Z </ProductCode>
      <Description> Bazooka Bubble Gum </Description>
      <Price> 0.33 </Price>
    </OrderItem>
    <OrderItem>
      <Quantity> 2 </Quantity>
      <ProductCode> 47.7879H </ProductCode>
      <Description> Fluorescent Orange Squirt Gun </Description>
      <Price> 2.50 </Price>
    </OrderItem>
  </OrderItems>
</CustomerOrder>
Again, here is our simple customer class:
package common;

import java.io.*;

// Customer is a very simple class
// that holds fields for dummy customer
// data.
// It has a simple method to print
// itself to a print stream.
public class Customer {

    // Customer member variables.
    public String firstName = "";
    public String lastName = "";
    public String custId = "";

    public void print( PrintStream out ) {
        out.println( "Customer: " );
        out.println( " First Name -> " + firstName );
        out.println( " Last Name -> " + lastName );
        out.println( " Customer Id -> " + custId );
    }
}
Next, a simple class to represent an order item:
package common;

import java.io.*;

// OrderItem is a very simple class
// that holds fields for dummy order
// item data.
// It has a simple method to print
// itself to a print stream.
public class OrderItem {

    // OrderItem member variables.
    public int quantity = 0;
    public String productCode = "";
    public String description = "";
    public double price = 0.0;

    public void print( PrintStream out ) {
        out.println( "OrderItem: " );
        out.println( " Quantity -> " + Integer.toString(quantity) );
        out.println( " Product Code -> " + productCode );
        out.println( " Description -> " + description );
        out.println( " price -> " + Double.toString( price ) );
    }
}
Now, we turn our attention to the SAX parser for example four, which maps customers and order items:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;
import common.*;

public class Example4 extends DefaultHandler {

    // Local Customer object to collect
    // customer XML data.
    private Customer cust = new Customer();

    // Local list of order items...
    private Vector orderItems = new Vector();

    // Local current order item reference...
    private OrderItem currentOrderItem;

    // Buffer for collecting data from
    // the "characters" SAX event.
    private CharArrayWriter contents = new CharArrayWriter();

    // Override methods of the DefaultHandler class
    // to gain notification of SAX Events.
    //
    // See org.xml.sax.ContentHandler for all available events.
    //
    public void startElement( String namespaceURI,
                              String localName,
                              String qName,
                              Attributes attr ) throws SAXException {
        contents.reset();
        // New twist...
        if ( localName.equals( "OrderItem" ) ) {
            currentOrderItem = new OrderItem();
            orderItems.addElement( currentOrderItem );
        }
    }

    public void endElement( String namespaceURI,
                            String localName,
                            String qName ) throws SAXException {
        if ( localName.equals( "FirstName" ) ) {
            cust.firstName = contents.toString();
        }
        if ( localName.equals( "LastName" ) ) {
            cust.lastName = contents.toString();
        }
        if ( localName.equals( "CustId" ) ) {
            cust.custId = contents.toString();
        }
        if ( localName.equals( "Quantity" ) ) {
            currentOrderItem.quantity =
                Integer.valueOf(contents.toString().trim()).intValue();
        }
        if ( localName.equals( "ProductCode" ) ) {
            currentOrderItem.productCode = contents.toString();
        }
        if ( localName.equals( "Description" ) ) {
            currentOrderItem.description = contents.toString();
        }
        if ( localName.equals( "Price" ) ) {
            currentOrderItem.price =
                Double.valueOf(contents.toString().trim()).doubleValue();
        }
    }

    public void characters( char[] ch, int start, int length )
                            throws SAXException {
        contents.write( ch, start, length );
    }

    public Customer getCustomer() {
        return cust;
    }

    public Vector getOrderItems() {
        return orderItems;
    }

    public static void main( String[] argv ) {
        System.out.println( "Example4:" );
        try {
            // Create SAX 2 parser...
            XMLReader xr = XMLReaderFactory.createXMLReader();
            // Set the ContentHandler...
            Example4 ex4 = new Example4();
            xr.setContentHandler( ex4 );
            // Parse the file...
            xr.parse( new InputSource(
                new FileReader( "Example4.xml" )) );
            // Display customer to stdout...
            Customer cust = ex4.getCustomer();
            cust.print( System.out );
            // Display all order items to stdout...
            OrderItem i;
            Vector items = ex4.getOrderItems();
            Enumeration e = items.elements();
            while ( e.hasMoreElements() ) {
                i = (OrderItem) e.nextElement();
                i.print( System.out );
            }
        } catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}
Here's the output generated by our Customer and OrderItems objects:
Example4:
Customer:
First Name -> Bob
Last Name -> Hustead
Customer Id -> abc.123
OrderItem:
Quantity -> 1
Product Code -> 48.GH605A
Description -> Pet Rock
price -> 19.99
OrderItem:
Quantity -> 12
Product Code -> 47.9906Z
Description -> Bazooka Bubble Gum
price -> 0.33
OrderItem:
Quantity -> 2
Product Code -> 47.7879H
Description -> Fluorescent Orange Squirt Gun
price -> 2.5
When the structure of the XML document becomes more complex, the real task
is managing the creation of empty containers to contain the flow of SAX events. For simpler things
like a single list of objects, this management is straightforward. However, we will need to develop
techniques to help manage more complicated containment hierarchies such as lists of lists and lists
of objects that contain lists.
Objects Sharing Tags
Before we get to the more advanced containment layouts, there is another
difficulty with SAX we will sometimes need to address. While it may not
always be present, occasionally data at different places in the XML document will be tagged with the
same tag, but will have to be mapped to different objects in Java. Suppose you have a customer
section and a customer representative section in your XML document. Both of these sections have
fields with FirstName and LastName as tags. Because of this ambiguity, you
can no longer be sure which object the contents buffer should be assigned to during the
endElement SAX event. You must keep some information about containing
startElement SAX events to clarify which object collects the contents during the common
endElement SAX event.
This problem can become dangerous, even with XML documents that don't initially
have this structure, if the XML document doesn't have a DTD or the DTD is
changed without updating the mapping code. Without the DTD, your clients can
legally supply you with any tag that you are mapping in the wrong place within
the XML document.
In truth, the only way to safely deal with the problem is to constantly track information about all
open start tags. As a simple example, let's say you have the following XML document:
<?xml version="1.0"?>
<CustomerInformation>
  <Customer>
    <Name>
      Some Customer Name
    </Name>
    <Company>
      <Name>
        The customer's company name
      </Name>
    </Company>
  </Customer>
</CustomerInformation>
Even though the tag name Name is ambiguous, the full path to the name is not -- it's
either CustomerInformation->Customer->Name or
CustomerInformation->Customer->Company->Name. Keeping the full path available
at all times guarantees that accidentally reusing a tag name won't fool your mapping code. It turns
out that mapping recursive XML structures requires a solution to this problem; we will cover this
issue in the next article.
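One way to keep the full path available is to push each tag name onto a stack in startElement and pop it in endElement. The sketch below is only an illustration of that idea and is not the class library developed in Part 2; the class name PathTracker, its fields, and the parse helper are hypothetical, and the parser is obtained through JAXP as an assumption.

```java
import java.io.CharArrayWriter;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class PathTracker extends DefaultHandler {

    public String customerName = "";
    public String companyName = "";

    // Names of all currently open tags, outermost first.
    private final Deque<String> openTags = new ArrayDeque<>();
    private final CharArrayWriter contents = new CharArrayWriter();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attr) {
        contents.reset();
        openTags.addLast(localName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        contents.write(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The full path disambiguates the two Name tags.
        String path = String.join("->", openTags);
        if (path.equals("CustomerInformation->Customer->Name")) {
            customerName = contents.toString().trim();
        } else if (path.equals("CustomerInformation->Customer->Company->Name")) {
            companyName = contents.toString().trim();
        }
        openTags.removeLast();
    }

    // Parses the XML text and returns the populated handler.
    public static PathTracker parse(String xml) {
        PathTracker tracker = new PathTracker();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            XMLReader xr = factory.newSAXParser().getXMLReader();
            xr.setContentHandler(tracker);
            xr.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return tracker;
    }

    public static void main(String[] args) {
        PathTracker t = parse("<CustomerInformation><Customer>"
                + "<Name> Some Customer Name </Name>"
                + "<Company><Name> The customer's company name </Name></Company>"
                + "</Customer></CustomerInformation>");
        System.out.println("Customer: " + t.customerName);
        System.out.println("Company:  " + t.companyName);
    }
}
```

Building the path string on every end tag is simple but not free; a performance-sensitive handler would probably compare stack entries directly.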
Next, we'll examine two examples for dealing with this situation. The first example is a brute force
if solution. I will set some flags during the containing element's
startElement SAX event. Then during the endElement event, I will run if
statements against the flags to determine which object the contents should be assigned
to.
Below you'll find our sample XML document demonstrating overlapping tag names:
<?xml version="1.0"?>
<Shapes>
<Triangle name="tri1" >
<x> 3 </x>
<y> 0 </y>
<height> 3 </height>
<width> 5 </width>
</Triangle>
<Triangle name="tri2" >
<x> 5 </x>
<y> 0 </y>
<height> 3 </height>
<width> 5 </width>
</Triangle>
<Square name="sq1" >
<x> 0 </x>
<y> 0 </y>
<height> 3 </height>
<width> 3 </width>
</Square>
<Circle name="circ1" >
<x> 10 </x>
<y> 10 </y>
<height> 3 </height>
<width> 3 </width>
</Circle>
</Shapes>
The following is a base class for all of our dummy shape classes:
package common;
// Dummy base class to hold values
// common to shapes.
public class Shape {
public int x = 0;
public int y = 0;
public int height = 0;
public int width = 0;
}
Here's a simple triangle class:
package common;
import java.io.*;
// Dummy triangle shape.
public class Triangle extends Shape {
// Dummy Triangle specific stuff...
public String name = "";
public void print( PrintStream out ){
out.println( "Triangle: " + name +
" x: " + x +
" y: " + y +
" width: " + width +
" height: " + height );
}
}
Next, we see a simple square class:
package common;
import java.io.*;
// Dummy square shape.
public class Square extends Shape {
// Dummy Square specific stuff...
public String name = "";
public void print( PrintStream out ){
out.println( "Square: " + name +
" x: " + x +
" y: " + y +
" width: " + width +
" height: " + height );
}
}
Here's a simple circle shape:
package common;
import java.io.*;
// Dummy circle shape.
public class Circle extends Shape {
// Dummy Circle specific stuff...
public String name = "";
public void print( PrintStream out ){
out.println( "Circle: " + name +
" x: " + x +
" y: " + y +
" width: " + width +
" height: " + height );
}
}
Next, we see the mapping code that represents the brute force method of separating identical tag names
associated with different objects:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;
import common.*;
public class Example5 extends DefaultHandler {
// Flags to help us capture the contents
// of a tagged element.
private boolean inCircle = false;
private boolean inTriangle = false;
private boolean inSquare = false;
// Local list of different shapes...
private Vector triangles = new Vector();
private Vector squares = new Vector();
private Vector circles = new Vector();
// Local current shape references...
private Triangle currentTriangle;
private Circle currentCircle;
private Square currentSquare;
// Buffer for collecting data from
// the "characters" SAX event.
private CharArrayWriter contents = new CharArrayWriter();
// Override methods of the DefaultHandler class
// to gain notification of SAX Events.
//
// See org.xml.sax.ContentHandler for all available events.
//
public void startElement( String namespaceURI,
                          String localName,
                          String qName,
                          Attributes attr ) throws SAXException {
    contents.reset();
    if ( localName.equals( "Circle" ) ) {
        inCircle = true;
        currentCircle = new Circle();
        currentCircle.name = attr.getValue( "name" );
        circles.addElement( currentCircle );
    }
    if ( localName.equals( "Square" ) ) {
        inSquare = true;
        currentSquare = new Square();
        currentSquare.name = attr.getValue( "name" );
        squares.addElement( currentSquare );
    }
    if ( localName.equals( "Triangle" ) ) {
        inTriangle = true;
        currentTriangle = new Triangle();
        currentTriangle.name = attr.getValue( "name" );
        triangles.addElement( currentTriangle );
    }
}
public void endElement( String namespaceURI,
                        String localName,
                        String qName ) throws SAXException {
    if ( localName.equals( "x" ) ) {
        if ( inCircle ) {
            currentCircle.x =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
        else if ( inSquare ) {
            currentSquare.x =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
        else {
            currentTriangle.x =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
    }
    if ( localName.equals( "y" ) ) {
        if ( inCircle ) {
            currentCircle.y =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
        else if ( inSquare ) {
            currentSquare.y =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
        else {
            currentTriangle.y =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
    }
    if ( localName.equals( "width" ) ) {
        if ( inCircle ) {
            currentCircle.width =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
        else if ( inSquare ) {
            currentSquare.width =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
        else {
            currentTriangle.width =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
    }
    if ( localName.equals( "height" ) ) {
        if ( inCircle ) {
            currentCircle.height =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
        else if ( inSquare ) {
            currentSquare.height =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
        else {
            currentTriangle.height =
                Integer.valueOf( contents.toString().trim() ).intValue();
        }
    }
    if ( localName.equals( "Circle" ) ) {
        inCircle = false;
    }
    if ( localName.equals( "Square" ) ) {
        inSquare = false;
    }
    if ( localName.equals( "Triangle" ) ) {
        inTriangle = false;
    }
}
public void characters( char[] ch, int start, int length )
        throws SAXException {
    // accumulate the contents into a buffer.
    contents.write( ch, start, length );
}
public Vector getCircles() {
return circles;
}
public Vector getSquares() {
return squares;
}
public Vector getTriangles() {
return triangles;
}
public static void main( String[] argv ){
    System.out.println( "Example5:" );
    try {
        // Create SAX 2 parser...
        XMLReader xr = XMLReaderFactory.createXMLReader();
        // Set the ContentHandler...
        Example5 ex5 = new Example5();
        xr.setContentHandler( ex5 );
        // Parse the file...
        xr.parse( new InputSource(
            new FileReader( "Example5.xml" ) ) );
        // Display all circles to stdout...
        Circle c;
        Vector items = ex5.getCircles();
        Enumeration e = items.elements();
        while( e.hasMoreElements() ){
            c = (Circle) e.nextElement();
            c.print( System.out );
        }
        // Display all squares to stdout...
        Square s;
        items = ex5.getSquares();
        e = items.elements();
        while( e.hasMoreElements() ){
            s = (Square) e.nextElement();
            s.print( System.out );
        }
        // Display all triangles to stdout...
        Triangle t;
        items = ex5.getTriangles();
        e = items.elements();
        while( e.hasMoreElements() ){
            t = (Triangle) e.nextElement();
            t.print( System.out );
        }
    } catch ( Exception e ) {
        e.printStackTrace();
    }
}
}
The following is the output we have collected into our shape classes:
Example5:
Circle: circ1 x: 10 y: 10 width: 3 height: 3
Square: sq1 x: 0 y: 0 width: 3 height: 3
Triange: tri1 x: 3 y: 0 width: 5 height: 3
Triange: tri2 x: 5 y: 0 width: 5 height: 3
The second solution takes advantage of the fact that you can replace the
SAX ContentHandler of a SAX parser while it's running. This allows us to divide our
mapping tasks into modular pieces: each handler implements mapping code only in terms of its
own fragment of the XML document.
The endElement() method of the second example does not contain a network of nested if
statements. This modularity becomes critical when processing more complex XML documents. It also
ensures that this style of mapping code does not misbehave when duplicate tag names appear in
unexpected locations within the XML document.
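To make the duplicate-tag hazard concrete, here is a minimal sketch that is not part of the article's examples. It assumes a hypothetical document in which a stray x element appears outside any shape. The names StrayTagSketch, FlagMapper, SwapMapper, and TriangleDelegate are illustrative only, and the parser is obtained through SAXParserFactory rather than XMLReaderFactory. FlagMapper reproduces the fall-through else branch of Example5 for a single field, while SwapMapper hands the Triangle fragment to a delegate and takes control back when the fragment ends.

```java
import java.io.CharArrayWriter;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class StrayTagSketch {

    // Hypothetical input: the second <x> sits outside any shape element.
    static final String XML =
        "<Shapes><Triangle name=\"tri1\"><x>3</x></Triangle><x>99</x></Shapes>";

    // Flag-based mapper in the style of Example5, cut down to the x field.
    public static class FlagMapper extends DefaultHandler {
        private final CharArrayWriter contents = new CharArrayWriter();
        private boolean inCircle = false;
        public int triangleX;

        public void startElement( String uri, String localName,
                                  String qName, Attributes attr ) {
            contents.reset();
            if ( localName.equals( "Circle" ) ) inCircle = true;
        }

        public void endElement( String uri, String localName, String qName ) {
            if ( localName.equals( "x" ) && !inCircle ) {
                // Same fall-through as Example5's final else branch: every <x>
                // not claimed by another shape is treated as a triangle's x.
                triangleX = Integer.parseInt( contents.toString().trim() );
            }
            if ( localName.equals( "Circle" ) ) inCircle = false;
        }

        public void characters( char[] ch, int start, int length ) {
            contents.write( ch, start, length );
        }
    }

    // Swap-based mapper: only reacts to <Triangle>, ignores everything else.
    public static class SwapMapper extends DefaultHandler {
        private final XMLReader parser;
        public int triangleX;

        SwapMapper( XMLReader parser ) { this.parser = parser; }

        public void startElement( String uri, String localName,
                                  String qName, Attributes attr ) {
            if ( localName.equals( "Triangle" ) ) {
                parser.setContentHandler( new TriangleDelegate( parser, this ) );
            }
        }
    }

    // Delegate that sees only the events inside one <Triangle> fragment.
    static class TriangleDelegate extends DefaultHandler {
        private final CharArrayWriter contents = new CharArrayWriter();
        private final XMLReader parser;
        private final SwapMapper parent;

        TriangleDelegate( XMLReader parser, SwapMapper parent ) {
            this.parser = parser;
            this.parent = parent;
        }

        public void startElement( String uri, String localName,
                                  String qName, Attributes attr ) {
            contents.reset();
        }

        public void endElement( String uri, String localName, String qName ) {
            if ( localName.equals( "x" ) ) {
                parent.triangleX = Integer.parseInt( contents.toString().trim() );
            }
            if ( localName.equals( "Triangle" ) ) {
                parser.setContentHandler( parent );  // swap back to the parent
            }
        }

        public void characters( char[] ch, int start, int length ) {
            contents.write( ch, start, length );
        }
    }

    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware( true );  // so localName is populated
            return factory.newSAXParser().getXMLReader();
        } catch ( Exception e ) {
            throw new RuntimeException( e );
        }
    }

    static void run( XMLReader reader, DefaultHandler handler ) {
        try {
            reader.setContentHandler( handler );
            reader.parse( new InputSource( new StringReader( XML ) ) );
        } catch ( Exception e ) {
            throw new RuntimeException( e );
        }
    }

    public static int withFlags() {
        FlagMapper mapper = new FlagMapper();
        run( newReader(), mapper );
        return mapper.triangleX;
    }

    public static int withSwap() {
        XMLReader reader = newReader();
        SwapMapper mapper = new SwapMapper( reader );
        run( reader, mapper );
        return mapper.triangleX;
    }

    public static void main( String[] args ) {
        System.out.println( "flags: " + withFlags() );
        System.out.println( "swap:  " + withSwap() );
    }
}
```

With this input, the flag-based mapper lets the stray element overwrite the triangle's value, so it ends with 99. The swapping mapper keeps 3, because the stray x arrives while the parent handler is installed and the parent never looks at it.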
Although the second method is a little bulkier due to the replication
of most of the class definition, this technique of swapping the
ContentHandler is the first step toward a more generic solution to
parsing with SAX. Swapping the ContentHandler is also another way for us to swap mugs
under the running tap of a SAX parser.
The following code demonstrates the ContentHandler swap
technique. The Example6 class and each of the type-specific
ContentHandler classes keep their own contents buffer:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;
import common.*;
public class Example6 extends DefaultHandler {
// XML Parser...
XMLReader parser;
// Mapping delegates...
Example6Circle circleMapper = new Example6Circle();
Example6Square squareMapper = new Example6Square();
Example6Triangle triangleMapper = new Example6Triangle();
// Local list of different shapes...
private Vector circles = new Vector();
private Vector triangles = new Vector();
private Vector squares = new Vector();
// Buffer for collecting data from
// the "characters" SAX event.
private CharArrayWriter contents = new CharArrayWriter();
// Constructor with XML Parser...
Example6( XMLReader parser ) {
this.parser = parser;
}
// Override methods of the DefaultHandler class
// to gain notification of SAX Events.
//
// See org.xml.sax.ContentHandler for all available events.
//
public void startElement( String namespaceURI,
                          String localName,
                          String qName,
                          Attributes attr ) throws SAXException {
    contents.reset();
    if ( localName.equals( "Circle" ) ) {
        Circle aCircle = new Circle();
        aCircle.name = attr.getValue( "name" );
        circles.addElement( aCircle );
        circleMapper.collectCircle( parser, this, aCircle );
    }
    if ( localName.equals( "Square" ) ) {
        Square aSquare = new Square();
        aSquare.name = attr.getValue( "name" );
        squares.addElement( aSquare );
        squareMapper.collectSquare( parser, this, aSquare );
    }
    if ( localName.equals( "Triangle" ) ) {
        Triangle aTriangle = new Triangle();
        aTriangle.name = attr.getValue( "name" );
        triangles.addElement( aTriangle );
        triangleMapper.collectTriangle( parser, this, aTriangle );
    }
}
public void endElement( String namespaceURI,
                        String localName,
                        String qName ) throws SAXException {
    // Nothing left for the Example 6 mapper
    // to handle in the endElement SAX event.
}
public void characters( char[] ch, int start, int length )
        throws SAXException {
    // accumulate the contents into a buffer.
    contents.write( ch, start, length );
}
public Vector getCircles() {
return circles;
}
public Vector getSquares() {
return squares;
}
public Vector getTriangles() {
return triangles;
}
public static void main( String[] argv ){
    System.out.println( "Example6:" );
    try {
        // Create SAX 2 parser...
        XMLReader xr = XMLReaderFactory.createXMLReader();
        // Set the ContentHandler...
        Example6 ex6 = new Example6( xr );
        xr.setContentHandler( ex6 );
        // Parse the file...
        xr.parse( new InputSource(
            new FileReader( "Example6.xml" ) ) );
        // Display all circles to stdout...
        Circle c;
        Vector items = ex6.getCircles();
        Enumeration e = items.elements();
        while( e.hasMoreElements() ){
            c = (Circle) e.nextElement();
            c.print( System.out );
        }
        // Display all squares to stdout...
        Square s;
        items = ex6.getSquares();
        e = items.elements();
        while( e.hasMoreElements() ){
            s = (Square) e.nextElement();
            s.print( System.out );
        }
        // Display all triangles to stdout...
        Triangle t;
        items = ex6.getTriangles();
        e = items.elements();
        while( e.hasMoreElements() ){
            t = (Triangle) e.nextElement();
            t.print( System.out );
        }
    } catch ( Exception e ) {
        e.printStackTrace();
    }
}
}
class Example6Circle extends DefaultHandler {
// Local current circle reference...
private Circle currentCircle;
// Parent...
ContentHandler parent;
// XML Parser
XMLReader parser;
// Buffer for collecting data from
// the "characters" SAX event.
private CharArrayWriter contents = new CharArrayWriter();
public void collectCircle( XMLReader parser,
                           ContentHandler parent,
                           Circle newCircle ) {
    this.parent = parent;
    this.parser = parser;
    parser.setContentHandler( this );
    currentCircle = newCircle;
}
// Override methods of the DefaultHandler class
// to gain notification of SAX Events.
//
// See org.xml.sax.ContentHandler for all available events.
//
public void startElement( String namespaceURI,
                          String localName,
                          String qName,
                          Attributes attr ) throws SAXException {
    contents.reset();
}
public void endElement( String namespaceURI,
                        String localName,
                        String qName ) throws SAXException {
    if ( localName.equals( "x" ) ) {
        currentCircle.x =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "y" ) ) {
        currentCircle.y =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "width" ) ) {
        currentCircle.width =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "height" ) ) {
        currentCircle.height =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "Circle" ) ) {
        // swap content handler back to parent
        parser.setContentHandler( parent );
    }
}
public void characters( char[] ch, int start, int length )
        throws SAXException {
    // accumulate the contents into a buffer.
    contents.write( ch, start, length );
}
}
class Example6Square extends DefaultHandler {
// Local current square reference...
private Square currentSquare;
// Parent...
ContentHandler parent;
// XML Parser
XMLReader parser;
// Buffer for collecting data from
// the "characters" SAX event.
private CharArrayWriter contents = new CharArrayWriter();
public void collectSquare( XMLReader parser,
                           ContentHandler parent,
                           Square newSquare ) {
    this.parent = parent;
    this.parser = parser;
    parser.setContentHandler( this );
    currentSquare = newSquare;
}
// Override methods of the DefaultHandler class
// to gain notification of SAX Events.
//
// See org.xml.sax.ContentHandler for all available events.
//
public void startElement( String namespaceURI,
                          String localName,
                          String qName,
                          Attributes attr ) throws SAXException {
    contents.reset();
}
public void endElement( String namespaceURI,
                        String localName,
                        String qName ) throws SAXException {
    if ( localName.equals( "x" ) ) {
        currentSquare.x =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "y" ) ) {
        currentSquare.y =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "width" ) ) {
        currentSquare.width =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "height" ) ) {
        currentSquare.height =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "Square" ) ) {
        // swap content handler back to parent
        parser.setContentHandler( parent );
    }
}
public void characters( char[] ch, int start, int length )
        throws SAXException {
    // accumulate the contents into a buffer.
    contents.write( ch, start, length );
}
}
class Example6Triangle extends DefaultHandler {
// Local current triangle reference...
private Triangle currentTriangle;
// Parent...
ContentHandler parent;
// XML Parser
XMLReader parser;
// Buffer for collecting data from
// the "characters" SAX event.
private CharArrayWriter contents = new CharArrayWriter();
public void collectTriangle( XMLReader parser,
                             ContentHandler parent,
                             Triangle newTriangle ) {
    this.parent = parent;
    this.parser = parser;
    parser.setContentHandler( this );
    currentTriangle = newTriangle;
}
// Override methods of the DefaultHandler class
// to gain notification of SAX Events.
//
// See org.xml.sax.ContentHandler for all available events.
//
public void startElement( String namespaceURI,
                          String localName,
                          String qName,
                          Attributes attr ) throws SAXException {
    // Reset the buffer at the start of each child element, as the
    // circle and square mappers do; otherwise text from earlier
    // elements would still be in the buffer when we parse a value.
    contents.reset();
}
public void endElement( String namespaceURI,
                        String localName,
                        String qName ) throws SAXException {
    if ( localName.equals( "x" ) ) {
        currentTriangle.x =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "y" ) ) {
        currentTriangle.y =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "width" ) ) {
        currentTriangle.width =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "height" ) ) {
        currentTriangle.height =
            Integer.valueOf( contents.toString().trim() ).intValue();
    }
    if ( localName.equals( "Triangle" ) ) {
        // swap content handler back to parent
        parser.setContentHandler( parent );
    }
}
public void characters( char[] ch, int start, int length )
        throws SAXException {
    // accumulate the contents into a buffer.
    contents.write( ch, start, length );
}
}
Notice that we get the same output from our shape classes:
Example6:
Circle: circ1 x: 10 y: 10 width: 3 height: 3
Square: sq1 x: 0 y: 0 width: 3 height: 3
Triange: tri1 x: 3 y: 0 width: 5 height: 3
Triange: tri2 x: 5 y: 0 width: 5 height: 3
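The three delegate classes above differ only in the type they fill in and the tag that ends their fragment. As a hedged illustration of where the swap technique can lead, the sketch below folds them into a single delegate. It is not the class library promised for Part 2. The names GenericShapeSketch and ShapeMapper, the simplified stand-in Shape class, and the Shapes root element are all assumptions made to keep the example self-contained, and the parser comes from SAXParserFactory rather than XMLReaderFactory.

```java
import java.io.CharArrayWriter;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class GenericShapeSketch extends DefaultHandler {

    // Simplified stand-in for the article's common.Shape hierarchy (assumption).
    public static class Shape {
        public String kind, name;
        public int x, y, width, height;
        public String toString() {
            return kind + ": " + name + " x: " + x + " y: " + y
                 + " width: " + width + " height: " + height;
        }
    }

    // One delegate for every shape type; closingTag tells it when to hand back.
    static class ShapeMapper extends DefaultHandler {
        private final CharArrayWriter contents = new CharArrayWriter();
        private XMLReader parser;
        private ContentHandler parent;
        private Shape current;
        private String closingTag;

        void collect( XMLReader parser, ContentHandler parent,
                      Shape shape, String closingTag ) {
            this.parser = parser;
            this.parent = parent;
            this.current = shape;
            this.closingTag = closingTag;
            parser.setContentHandler( this );
        }

        public void startElement( String uri, String localName,
                                  String qName, Attributes attr ) {
            contents.reset();
        }

        public void endElement( String uri, String localName, String qName ) {
            if ( localName.equals( closingTag ) ) {
                parser.setContentHandler( parent );  // swap back to the parent
                return;
            }
            String text = contents.toString().trim();
            if ( localName.equals( "x" ) )           current.x = Integer.parseInt( text );
            else if ( localName.equals( "y" ) )      current.y = Integer.parseInt( text );
            else if ( localName.equals( "width" ) )  current.width = Integer.parseInt( text );
            else if ( localName.equals( "height" ) ) current.height = Integer.parseInt( text );
        }

        public void characters( char[] ch, int start, int length ) {
            contents.write( ch, start, length );
        }
    }

    private final XMLReader parser;
    private final ShapeMapper mapper = new ShapeMapper();
    private final List<Shape> shapes = new ArrayList<>();

    GenericShapeSketch( XMLReader parser ) { this.parser = parser; }

    public void startElement( String uri, String localName,
                              String qName, Attributes attr ) {
        if ( localName.equals( "Circle" ) || localName.equals( "Square" )
                || localName.equals( "Triangle" ) ) {
            Shape shape = new Shape();
            shape.kind = localName;
            shape.name = attr.getValue( "name" );
            shapes.add( shape );
            mapper.collect( parser, this, shape, localName );
        }
    }

    public static List<Shape> parse( String xml ) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware( true );  // so localName is populated
            XMLReader xr = factory.newSAXParser().getXMLReader();
            GenericShapeSketch handler = new GenericShapeSketch( xr );
            xr.setContentHandler( handler );
            xr.parse( new InputSource( new StringReader( xml ) ) );
            return handler.shapes;
        } catch ( Exception e ) {
            throw new RuntimeException( e );
        }
    }

    public static void main( String[] args ) {
        String xml = "<Shapes>"
            + "<Circle name=\"circ1\"><x>10</x><y>10</y><width>3</width><height>3</height></Circle>"
            + "<Square name=\"sq1\"><x>0</x><y>0</y><width>3</width><height>3</height></Square>"
            + "</Shapes>";
        for ( Shape shape : parse( xml ) ) {
            System.out.println( shape );
        }
    }
}
```

The trade-off is that a single delegate only works while all the fragments share the same child tags. Once the shapes diverge, separate delegates like those in Example6 become the simpler choice again.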
Conclusion
We've demonstrated that SAX, when properly applied, has many advantages over the DOM API. We've
covered the basic ways of thinking about SAX that allow us to write effective XML-to-Java
mapping code for simple and moderately complex XML documents. We've also highlighted some of the
danger areas in applying the SAX API.
Finally, I hope you now understand the implications of using the DOM API in performance-sensitive
environments, where the SAX API shines.
In the next article, we will tackle recursive XML structures, the ambiguous
tag name problem, and the navigational aspects of SAX. These three threads come together in a
general purpose class library that turns even the most complicated XML mapping code into a
declarative style of coding that focuses on container management -- swapping beer glasses under the
open tap.
About the author
Bob Hustead
has been wasting way too much time on computers since the Commodore
64 was introduced. He originally focused on communication protocols
and device drivers in C, then migrated to object-oriented designs,
first in C++ and then in Java. He has since shifted his focus to
middleware architecture. Bob currently works at AIG Insurance in New
York City as an architect for enterprise application
infrastructure.
Reprinted with permission from the July 2000 edition of JavaWorld magazine. Copyright ITworld.com, Inc., an IDG
Communications company.
_______
1 As used on this web site, the terms "Java
virtual machine" or "JVM" mean a virtual machine for the Java
platform.