JAXP Compatibility Guide
for the J2SE Platform, versions 1.4 and 1.5

Introduction

The J2SE 1.4 platform included the "Crimson" reference implementation for JAXP 1.1. The J2SE 1.5 platform includes a reference implementation for JAXP 1.3 based on the Apache "Xerces" library.

Because these implementations come from entirely different codebases, and because the JAXP standard has evolved from 1.1 to 1.3, there are some subtle differences between the implementations, even though they both conform to the JAXP standard. These two factors combine to create the compatibility issues described in this guide.

Note:
The reference implementation for the JAXP 1.3 APIs will be under development for some time. The latest version can always be found at java.net.

What's New

JAXP 1.3, which is part of the J2SE 1.5 platform, provides some compelling features:

That's the good news. The bad news is that some compatibility issues have survived all attempts at eradication. The remainder of this document discusses those issues.

Using the JAXP 1.3 JAR Files With J2SE 1.4

Version 1.4 of the Java 2 platform includes JAXP 1.2, which is older than the JAXP version included in Java WSDP version 1.6. Among the many changes made between JAXP 1.2 and JAXP 1.3, some packages and classes have been renamed, which means that the API as well as the implementation has changed.

If you try to run an application that uses the J2SE 1.4 JAXP JARs on Java WSDP version 1.6, you will see an exception, because the JAXP 1.3 implementation classes are incompatible with the JAXP API included in J2SE 1.4. Therefore, to use the JAXP JARs included with Java WSDP in an application that runs on J2SE 1.4, you must override the JAXP JARs that come with J2SE 1.4 with those supplied with the Java WSDP.

You do this by setting the java.endorsed.dirs system property of your application to the paths of the JAXP API JARs included with the Java WSDP, which are located in the <JWSDP_HOME>/jaxp/lib and <JWSDP_HOME>/jaxp/lib/endorsed directories.

For more information on the java.endorsed.dirs system property, see the Endorsed Standards documentation for the 1.4 version of the Java platform.

 

Using the JAXP 1.3 JAR Files With J2SE 1.5

Both the Java WSDP 1.6 and Version 1.5 of the Java 2 platform include JAXP 1.3. However, the version of JAXP in the Java WSDP also includes some bug fixes. Therefore, the implementations of JAXP 1.3 in Java WSDP and J2SE 1.5 differ.

To use the JAXP 1.3 implementation packaged with Java WSDP instead of the one in J2SE 1.5, you need to override the JAXP 1.3 implementation JARs included in J2SE 1.5 with those included in Java WSDP. To do this, set the java.endorsed.dirs system property of your application to the path of the JAXP implementation JARs included in Java WSDP, which are located in the <JWSDP_HOME>/jaxp/lib/endorsed directory.
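One way to check which implementation an application actually picked up after such an override is to ask JAXP for a factory and print its class and origin. The following is only a sketch: the class name JaxpImplementationCheck and its helper methods are made up for illustration, and the reported location depends on the Java runtime in use.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

final class JaxpImplementationCheck {

    /** Returns the class name of the DocumentBuilderFactory that JAXP selected. */
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Returns where that class was loaded from, or a placeholder when no location is reported. */
    static String factoryLocation() {
        CodeSource source = DocumentBuilderFactory.newInstance()
                .getClass().getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no location reported; class is part of the Java platform)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(factoryClassName() + " loaded from " + factoryLocation());
    }
}
```

If the override is in effect, the printed location would be expected to point at the endorsed JAR rather than at the platform's own classes.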

 

DOM Level 3

While the reference implementation in J2SE 1.4 supported the DOM Level 2 API, the implementation in J2SE 1.5 supports the DOM Level 3 family of APIs. This section covers the impact of those changes on programs that used the JAXP 1.2 reference implementation:

For more information, see the complete list of changes in the DOM Level 3 Changes appendix.

Methods added to DOM interfaces

In DOM Level 3, additional methods were defined in the following interfaces:

The added methods only affect applications that implement the interfaces directly, and only then when the application is recompiled. Applications that use the factory methods to obtain implementation classes for these interfaces will have no problems.

Preserving the XML format

These changes affect an application that reads XML data into a DOM, makes modifications, and then writes it out in a way that preserves the original formatting.

In JAXP 1.1, extraneous whitespace was automatically removed on input, and a single property (ignoringLexicalInfo) was set to false to preserve entity nodes and CDATA nodes, for example. Including the additional nodes made the DOM somewhat more complex to process, but because they were there, adding whitespace output (indentation and newlines) produced a highly readable, formatted version of the XML data that closely approximated the input.

In JAXP 1.3, there are four APIs that the application uses to determine how much lexical (formatting) information is available to process, using the following DocumentBuilderFactory methods:

The default value for all of these properties is false, which preserves all the lexical information necessary to reconstruct the incoming document in its original form. Setting them all to true lets you construct the simplest possible DOM, so the application can focus on the data's semantic content, without having to worry about lexical syntax details.

Note:
When adding new nodes, the application must add any indentation and newline formatting that is needed for readability, since it is not provided automatically.
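As an illustration of the "simplest possible DOM" case, the sketch below sets four lexical settings to true before parsing. The method list is not reproduced in this guide, so the sketch assumes the four methods meant are the standard DocumentBuilderFactory setters setCoalescing, setExpandEntityReferences, setIgnoringComments, and setIgnoringElementContentWhitespace. The class name SimplifiedDomDemo and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class SimplifiedDomDemo {

    /** Parses the XML text with all four lexical settings set to true. */
    static Document parseSimplified(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(true);              // turn CDATA sections into ordinary text
        factory.setExpandEntityReferences(true);  // replace entity references with their content
        factory.setIgnoringComments(true);        // drop comment nodes
        // Needs a DTD content model (validating mode) to have any effect.
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseSimplified("<root>a<!-- note --><![CDATA[b]]>c</root>");
        // Only the character data is left: no comment node, no CDATA section node.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

An application that needs to write the document back out in its original form would leave these settings at false instead.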

SAX 2.0.1

In general, SAX 2.0.1 is a bug-fix release, with no API changes. There are some additions, however, which are of interest to application developers:

Note:
One point of compatibility is also worth mentioning. Namespace recognition was turned off by default in J2SE 1.4 (JAXP 1.1). For backward compatibility, that policy is continued in J2SE 1.5 (JAXP 1.3). However, namespace recognition is turned on by default in the official SAX implementation at www.saxproject.org. While not strictly a compatibility issue from the standpoint of JAXP, it is an issue that sometimes comes as a surprise.
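An application that expects namespace processing therefore has to ask for it explicitly. A minimal sketch, using the standard SAXParserFactory.setNamespaceAware method (the class name NamespaceAwareDemo and the helper newParser are made up for illustration):

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

final class NamespaceAwareDemo {

    /** Creates a SAX parser, optionally opting in to namespace processing. */
    static SAXParser newParser(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        if (namespaceAware) {
            factory.setNamespaceAware(true); // the JAXP default is false
        }
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("default:  " + newParser(false).isNamespaceAware());
        System.out.println("opted in: " + newParser(true).isNamespaceAware());
    }
}
```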

Null handlers are possible

In SAX 2.0.1, an application can set ErrorHandler, EntityResolver, ContentHandler, or DTDHandler to null. This is a relaxation of the previous restriction in SAX 2.0, which generated a NullPointerException (NPE) in such circumstances.

So the following code is legal in JAXP 1.3:

SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader reader = sp.getXMLReader();

reader.setErrorHandler(null);
reader.setContentHandler(null);
reader.setEntityResolver(null);
reader.setDTDHandler(null);

IOException added to SAX EntityResolver API

The resolveEntity() method in the EntityResolver API now throws IOException, as well as SAXException. (Before, it only threw SAXException.)

The vast majority of applications are unaffected by this change, because the DefaultHandler implementation class has been modified to declare the additional exception, and very few applications use the DefaultHandler in such a way that they will run into a problem.

The only way an application can be affected is if it overrides the resolveEntity() method and also invokes super.resolveEntity(). In that case, the application won't compile in J2SE 1.5 until the method is modified to handle the IOExceptions that super.resolveEntity() could throw.
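The affected pattern, and one way to adapt it, might look like the sketch below. The class name LocalDtdResolver and the system identifier it recognizes are hypothetical; the relevant part is that the overriding method declares IOException so that the call to super.resolveEntity() compiles.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class LocalDtdResolver extends DefaultHandler {

    // Hypothetical system identifier that this handler resolves itself.
    static final String KNOWN_SYSTEM_ID = "http://example.com/local.dtd";

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException, SAXException {
        if (KNOWN_SYSTEM_ID.equals(systemId)) {
            // Supply an empty replacement instead of fetching the entity.
            return new InputSource(new StringReader(""));
        }
        // Declaring IOException above is what lets this delegation compile.
        return super.resolveEntity(publicId, systemId);
    }
}
```

Catching the IOException inside the method and wrapping it in a SAXException would be an alternative when the method signature must stay as it was.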

New Features and Properties

The following new features are recognized:

and the following new property:

For a complete list of Xerces features and properties, see http://xml.apache.org/xerces2-j/features.html and http://xml.apache.org/xerces2-j/properties.html.

Using XSLT

Code that uses the standard JAXP APIs to create and access an XSL transformer does not need to be changed. The output will be the same, but will in general be produced much faster, since the XSLTC compiling transformer will be used by default, instead of the interpreting Xalan transformer.

Note:
There is no significant difference between Xalan and XSLTC performance for a single run on a small data set, as when you are developing and testing an XSL stylesheet. But there is a major performance benefit when using XSLTC on anything larger.
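For reference, code that stays within the standard javax.xml.transform API looks roughly like the following sketch. Nothing in it names a particular transformer implementation, which is why it does not need to change. The class name StandardXsltDemo, the stylesheet, and the input document are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StandardXsltDemo {

    /** Applies the stylesheet to the XML text and returns the result as a string. */
    static String transform(String stylesheet, String xml) throws TransformerException {
        // The factory decides which transformer implementation is used.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String stylesheet =
                "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\"><xsl:value-of select=\"/greeting\"/></xsl:template>"
                + "</xsl:stylesheet>";
        System.out.println(transform(stylesheet, "<greeting>hello</greeting>"));
    }
}
```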

Invoking XSLT from the Command Line

The command for invoking XSLT from the command line is unchanged, as well. That command is:

java com.sun.org.apache.xalan.internal.xslt.Process -IN xmlIn -XSL xslStyle {-OUT out}

However, the following Xalan-interpretive command line options are not supported:

Programmatic Access to Xalan XPath

Xalan-interpretive is not included in the reference implementation. If an application uses the Xalan XPath API to evaluate a standalone XPath expression (one that is not part of an XSLT stylesheet), you'll need to download and install the Apache libraries for Xalan, put them on the classpath, and import org.apache.xpath.* in the application.
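Where the standard javax.xml.xpath API added in JAXP 1.3 is sufficient, a standalone expression can be evaluated without the Apache libraries. This is a sketch of that alternative, not a drop-in replacement for the Xalan XPath API; the class name StandardXPathDemo and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class StandardXPathDemo {

    /** Evaluates a standalone XPath expression against the XML text and returns its string value. */
    static String evaluate(String expression, String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<order><item>pen</item><item>ink</item></order>";
        System.out.println(evaluate("/order/item[2]", xml));
    }
}
```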

Package Name Changes

This change does not affect applications that confine themselves to using the standard JAXP APIs. But applications that access implementation-specific features of the XML processors defined in previous JAXP versions will have to be modified to take into account package names that changed in JAXP 1.3.

The change has several effects on previous applications:

  1. The property-values that were used to access the internal implementations must be changed.
  2. Applications that used internal APIs from the Xalan implementation classes must change the import statements that gave them access to those APIs.
  3. Applications that used internal APIs from the Crimson implementation must be recoded -- ideally, by using newer JAXP APIs or, if necessary, by using Xerces APIs.

What Changed, and Why

In J2SE 1.4, the fact that JAXP was built into the Java platform was a mixed blessing. On the one hand, an application could rely on the fact that it was there. On the other, most applications needed features and bug fixes that were available in later versions.

But adding new libraries had no effect, because internal classes always take precedence over the classpath. The solution for that problem in 1.4 was to use the endorsed standards mechanism. However, that was a new mechanism, and one which frequently placed an additional burden on the end user, as well as the application developer.

The solution in the JAXP 1.3 reference implementation is to change the package names of the Apache libraries used in the implementation. That change lets you reference newer Apache libraries in the classpath, so application developers can use them in the same way that they would use any other additions to the Java platform.

The new names given to the Apache packages in the JAXP 1.3 reference implementation are shown below:

        JAXP 1.1 (Crimson)     JAXP 1.3 (Xerces)
JAXP    org.apache.crimson     -/-
                               com.sun.org.apache.xerces.internal
        org.apache.xml         com.sun.org.apache.xml.internal
XSLT    org.apache.xalan       -/-
        org.apache.xpath       com.sun.org.apache.xpath.internal
                               com.sun.org.apache.xalan.internal.xsltc

Using System Properties and Implementation Classes

Applications that specify system properties on the command line with -D, in the JRE's lib/jaxp.properties file, or by hard-coding them into the application, generally do so in order to access functionality that is not present in the standard APIs.

JAXP 1.3 contains many new additions. When upgrading such applications, it is advisable to look for standard APIs in the javax.xml.* packages that will do the same job, because that's the best way to keep from having to change the application in the future. If absolutely necessary (either because of functionality restrictions or lack of time to investigate the new APIs), the property values can be changed by converting old-format package names into the new format:

org.apache.somePackage --> com.sun.org.apache.somePackage.internal

Similarly, internal implementation classes all use the new package names. If your application is using implementation classes (it shouldn't!) those package names will have to change, as well.
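As a sketch of what a converted property value can look like, the code below points the standard javax.xml.parsers.DocumentBuilderFactory system property at a class in the renamed Xerces package. The implementation class name used here is an assumption about what the bundled reference implementation provides, not something stated in this guide, and the class name FactoryPropertyDemo is made up for illustration. Preferring a standard API where one exists remains the safer choice.

```java
import javax.xml.parsers.DocumentBuilderFactory;

final class FactoryPropertyDemo {

    static final String FACTORY_PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    // Assumed new-format class name (old format: org.apache.xerces...).
    static final String NEW_FORMAT_CLASS =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    /** Selects a factory through the system property and reports the class JAXP instantiated. */
    static String selectFactory(String implementationClass) {
        String previous = System.getProperty(FACTORY_PROPERTY);
        System.setProperty(FACTORY_PROPERTY, implementationClass);
        try {
            return DocumentBuilderFactory.newInstance().getClass().getName();
        } finally {
            // Restore the earlier setting so the rest of the application is unaffected.
            if (previous == null) {
                System.clearProperty(FACTORY_PROPERTY);
            } else {
                System.setProperty(FACTORY_PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(selectFactory(NEW_FORMAT_CLASS));
    }
}
```

If the named class is not present, newInstance() fails with a FactoryConfigurationError, which is one reason to treat such property values as a last resort.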

Security Issue Posed by Nested Entity Definitions

While XML does not allow recursive entity definitions, it does permit nested entity definitions, which produces the potential for Denial of Service attacks on a server which accepts XML data from external sources. For example, a SOAP document like the following that has very deeply nested entity definitions can consume 100% of CPU time and large amounts of memory in entity expansions:

<?xml version="1.0" encoding ="UTF-8"?>
<!DOCTYPE foobar[
<!ENTITY x100 "foobar">
<!ENTITY x99 "&x100;&x100;">
<!ENTITY x98 "&x99;&x99;">
...
<!ENTITY x2 "&x3;&x3;">
<!ENTITY x1 "&x2;&x2;">
]>
<SOAP-ENV:Envelope xmlns:SOAP-ENV=...>
<SOAP-ENV:Body>
<ns1:aaa xmlns:ns1="urn:aaa" SOAP-ENV:encodingStyle="...">
<foobar xsi:type="xsd:string">&x1;</foobar>
</ns1:aaa>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>

A system that doesn't take in external XML data need not be concerned with the issue, but one that does can utilize one of the following safeguards to prevent the problem:

New system property to limit entity expansion
The entityExpansionLimit system property lets existing applications constrain the total number of entity expansions without recompiling the code. The parser throws a fatal error once it has reached the entity expansion limit. (By default, no limit is set.)

To set the entity expansion limit using the system property, use an option like the following on the java command line: -DentityExpansionLimit=100000
 
New parser property to limit entity expansion
The http://apache.org/xml/properties/entity-expansion-limit parser property lets an application set a limit on total entity expansions without having to use the command line. It accepts a value of java.lang.Integer type. The parser throws a fatal error once it has reached the entity expansion limit.

To set the entity expansion limit with this property, the application can use code like the following:

DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
dfactory.setAttribute(
  "http://apache.org/xml/properties/entity-expansion-limit",
  new Integer("100000")
);

New parser property to disallow DTDs
The application can also set the http://apache.org/xml/features/disallow-doctype-decl parser property to true. A fatal error is then thrown if the incoming XML document contains a DOCTYPE declaration. (The default value for this property is false.) This property is typically useful for SOAP based applications where a SOAP message must not contain a Document Type Declaration.
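Because the identifier above lives under /features/, the sketch below assumes it is set through the standard DocumentBuilderFactory.setFeature method rather than setAttribute. The class name DisallowDoctypeDemo, the helper accepts, and the sample documents are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DisallowDoctypeDemo {

    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /** Returns true if the document parses, false if the parser reports a fatal error. */
    static boolean accepts(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(DISALLOW_DOCTYPE, true); // reject any DOCTYPE declaration
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Fatal error: a DOCTYPE was present, or the document was not well formed.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepts("<msg>ok</msg>"));
        System.out.println(accepts("<!DOCTYPE msg [<!ENTITY x \"y\">]><msg>&x;</msg>"));
    }
}
```

With DTDs rejected outright, nested entity definitions like those in the example earlier in this section never reach the expansion stage.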