Long-Term Persistence for JavaBeans

By Philip Milne & Kathy Walrath    

(Note: After you've read this article, please see the update.)

At JavaOne '99 we gave a preliminary talk on a new persistence model for Swing that would allow Swing user interfaces to be "serialized" as XML documents. As we found out from the BOF sessions after the talks, there was a great deal of interest in this topic, much of it on the more general problem of archiving graphs of JavaBeans as XML documents.


Since then we've been working with the IDE vendors to generalize and refine these techniques to deal with the practical issues that arise in saving real designs as constructed by commercial tools. The recent surge of interest in Swing IDEs has brought the question of interoperability to the fore. At the heart of this issue is the question of persistence and how a design can be saved in a format that is not tied to the tool that created it.

Now it's time to solicit feedback from developers working outside of the IDE space.

Please send any comments, criticisms, or ideas you have on this work to java-beans@java.sun.com.
 

Design and Implementation of the Persistence Model

The proposed persistence model is implemented in a downloadable package named archiver. The archiver package works only with 1.3 versions of the Java 2 SDK.

This section describes the design and implementation of the persistence model.

Goals

The new persistence model is designed to handle the process of converting a graph of JavaBeans to and from a persistent form. We had the following goals in creating a long-term persistence scheme for user interfaces made of JavaBeans:
  • Resilience to changes in the versions of both VMs and class libraries.
  • Fault tolerance, allowing an archive to load even when part of it is damaged.
  • Archives exclusively written in terms of public APIs, not private implementations.
  • Textual (ASCII) output that can be edited with standard tools (.xml and .java output options).
  • Comprehensive use of defaults (redundancy elimination) to minimize file size.
  • Performance that varies linearly with the number of nodes in the graph.

A Note on Marshaling vs. Archiving

There are two common approaches to the problem of persisting graphs of objects:
  • Recording all state in an object graph, including non-public state.
  • Recording all state that can be reconstituted using the public APIs of the objects in the graph.
The simplest scheme taking the first approach requires the inclusion of all the classes that define the objects, which is too expensive. The practical alternative, which is to refrain from serializing the byte codes that define the classes themselves, is workable between identical implementations of the same libraries. The serialization framework in JDK 1.1 implements this and is therefore the method of choice for sending faithful copies of an object graph between two similarly configured VMs.

The second approach cannot produce as faithful a copy of the original objects as the first but can store the state of the graph in such a way that any API-compatible implementation of the classes involved will be sufficient to reconstitute it. Since APIs are so much more stable than their private implementations, this single step virtually solves the versioning issues for most practical purposes. As importantly, from an applet perspective, the behavior of the constructors residing on the client machine can be leveraged, often dramatically reducing the size of the files that need to be transferred.

Luckily, although the serialization APIs in JDK 1.1 provided direct support for only the first task, they used a series of interfaces to ensure that support for the second operation could easily be accommodated by the same framework. We use this framework to implement the ObjectOutput and ObjectInput interfaces and house a complementary scheme which is designed to solve the long-term persistence problem for user interfaces of JavaBeans.

The Archiver Package

The archiver package generates archives that depend only on the public APIs of the Beans in the archive, and not on the state of any private implementation. Like modern programming languages, the implementation of the archiver deals with the syntactic and semantic elements of this task separately. This architecture allows us to include support for some popular formats now while allowing users to plug in other "syntax-modules" to support other file types in the future.

This formal separation allows the majority of the internal architecture and any special-case code that changes the way the state of a particular class is archived to be written in a form that is independent of the syntax of the output format.  Given both the proliferation of new XML standards and their rapid evolution, this accommodating rather than defining role seems to be the best way to provide Beans with a long-term persistence strategy that can coexist with this evolutionary process.

To test this approach we have implemented two very different formats as examples: a declarative XML format and a procedural Java-like scripting language. We've also implemented a third, output-only format that produces compilable Java files. The formal separation of evaluation semantics has also proven useful in the internal implementation of our redundancy elimination mechanism, which requires the write-time evaluation of the statements being written to the output.

The archiver package currently supports three file formats:

  • XML files that use a declarative DTD and conform to the W3C XML specification. The corresponding reader and writer classes are XMLInputStream and XMLOutputStream.
  • Java files. The corresponding class is JavaOutputStream. Due to the complexity of this format, we don't provide a corresponding input stream. The object graph can be retrieved by compiling the Java files and loading the resulting classes.
  • "BeanScript" files, which have a format similar to Java files, but simplified so that the files can be parsed easily. The corresponding reader and writer classes are BeanScriptInputStream and BeanScriptOutputStream.
The file formats are described in detail in the section on file formats. The classes and methods defined in the archiver package are listed in the Archiver API Documentation.

How to Use the New Output Streams

The code for using the new output stream classes is almost identical to the code for using the binary serialization output stream class, ObjectOutputStream. For example, if the code for using ObjectOutputStream looks like this:
// Requires java.io.ObjectOutput, java.io.ObjectOutputStream, and javax.swing.JButton.
try {
    ObjectOutput os = new ObjectOutputStream(System.out);
    os.writeObject(new JButton("Press Me"));
    os.close();
} catch (Exception e) {
    e.printStackTrace();
}

Then the code for writing an XML document requires just one small change:

// XMLOutputStream comes from the archiver package.
try {
    ObjectOutput os = new XMLOutputStream(System.out);
    os.writeObject(new JButton("Press Me"));
    os.close();
} catch (Exception e) {
    e.printStackTrace();
}

The code for using the new input streams is likewise similar to the code for using ObjectInputStream.
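
As a point of comparison, here is a minimal sketch of the reading side using the standard binary streams. It writes a value to an in-memory buffer and reads it back through the ObjectInput interface. The class name StreamRoundTrip and the method roundTrip are illustrative only. We would expect an archiver input stream such as XMLInputStream to be substituted at the point where ObjectInputStream is constructed, but that substitution is an assumption and is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class StreamRoundTrip {
    /** Writes a serializable value to an in-memory buffer and reads it back. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutput os = new ObjectOutputStream(buffer);
            os.writeObject(value);
            os.close();

            // Reading mirrors writing: wrap the bytes in an ObjectInput and call readObject().
            ObjectInput is = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            Object copy = is.readObject();
            is.close();
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Press Me"));
    }
}
```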

The Persistence Model

Typically the new streams provided in the archiver package reduce the serialization problem for JavaBeans to the problem of providing an ordered list of properties that define the state of the JavaBean. All values of the properties of a JavaBean are assumed to be JavaBeans. To make this recursive definition work, we have to widen the notion of what is considered a JavaBean slightly so as to include all possible values that properties can take. In our implementation, Color objects are considered to be JavaBeans, as are LayoutManagers, Vectors, Hashtables, Numbers, and Boolean values. To handle the "wiring" part of the user interface it has also proven convenient to provide built-in support for some other key classes in the JDK including Method, Class, array classes, and proxy classes.

To handle all these extra classes the first requirement is that we are able to create instances of them. In all the special cases, this requires extra information that describes how a new instance should be created. For most classes this extra information simply associates the arguments of a chosen constructor with the names of the properties they represent. So, for example, the java.awt.Color class is augmented with metadata recording the fact that the three integers that appear as arguments in one of the constructors are the red, green, and blue properties of the new instance. Given this extra information the recording of a Color object is reduced to the simpler problem of recording the Integer values of those properties.
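
The idea of pairing constructor arguments with property names can be sketched with plain reflection. This is not the archiver's implementation. The class Rgb is a hypothetical stand-in for java.awt.Color, and the names ConstructorMetadataSketch, RGB_CONSTRUCTOR_PROPERTIES, record, and rebuild are invented for the illustration.

```java
import java.lang.reflect.Constructor;
import java.util.LinkedHashMap;
import java.util.Map;

class ConstructorMetadataSketch {
    /** Hypothetical stand-in for java.awt.Color: immutable, with three int properties. */
    public static final class Rgb {
        private final int red, green, blue;
        public Rgb(int red, int green, int blue) { this.red = red; this.green = green; this.blue = blue; }
        public int getRed() { return red; }
        public int getGreen() { return green; }
        public int getBlue() { return blue; }
    }

    /** The metadata: the constructor arguments, in order, are these properties. */
    static final String[] RGB_CONSTRUCTOR_PROPERTIES = {"red", "green", "blue"};

    /** Records an instance as an ordered map from property name to value. */
    static Map<String, Object> record(Object bean, String[] properties) {
        Map<String, Object> state = new LinkedHashMap<>();
        try {
            for (String name : properties) {
                // Derive the getter name from the property name, e.g. "red" -> "getRed".
                String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                state.put(name, bean.getClass().getMethod(getter).invoke(bean));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return state;
    }

    /** Rebuilds an instance by passing the recorded values to the chosen constructor. */
    static Rgb rebuild(Map<String, Object> state) {
        try {
            Constructor<Rgb> ctor = Rgb.class.getConstructor(int.class, int.class, int.class);
            return ctor.newInstance(state.values().toArray());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the recorded state is just three Integer values keyed by name, any implementation of the class that keeps the same public constructor and getters could reconstitute it.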

In other cases, such as java.lang.Method, the extra information is used to indicate to the output streams the fact that, instead of using a constructor, the static getMethod method in java.lang.Class should be used to retrieve instances of the Method class. Near the bottom of these recursive definitions are the wrappers for the primitive types of the Java virtual machine: the Boolean class and the Number derivatives. All of these classes have a useful invariant in that they may be reconstructed by calling their single-argument constructor using the value returned by the toString method.  Closing this recursive definition, then, is the java.lang.String class, for which each file format must provide built-in support, and in terms of which all other objects will be represented.
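
Both leaf cases can be illustrated with a short reflective sketch. The names LeafValueSketch, fromText, and lookUp are hypothetical. Note that the single-argument String constructors of the wrapper classes are deprecated in recent JDKs in favor of valueOf, so this sketch reflects the approach described here rather than current best practice.

```java
import java.lang.reflect.Method;

class LeafValueSketch {
    /** Rebuilds a wrapper (Integer, Boolean, Double, ...) from its toString() text. */
    static Object fromText(Class<?> wrapperType, String text) {
        try {
            // Uses the single-argument String constructor described in the text.
            return wrapperType.getConstructor(String.class).newInstance(text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Retrieves a Method by calling Class.getMethod instead of a constructor. */
    static Method lookUp(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            return owner.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromText(Integer.class, Integer.valueOf(42).toString()));
        System.out.println(lookUp(String.class, "length"));
    }
}
```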

Identity

As the object graph is traversed a hashtable (actually a special kind of hashtable that uses "==" instead of "equals") is used to detect when a node is revisited. When it is, the archiver gives this instance a name so that it can be referred to multiple times in the archive. That way the identity of objects in the graph is preserved by the archival process. 
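
A minimal sketch of this bookkeeping follows, assuming java.util.IdentityHashMap (available in later JDK releases) as the "==" based table. The Node class, the "id0"-style names, and the method nameSharedNodes are hypothetical and chosen only for illustration. Node deliberately overrides equals so that the difference between identity and equality is visible.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class IdentitySketch {
    /** Hypothetical graph node; "children" stands in for a bean's property values. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        // Nodes with the same label are equal, but they are still distinct instances.
        @Override public boolean equals(Object o) { return o instanceof Node && ((Node) o).label.equals(label); }
        @Override public int hashCode() { return label.hashCode(); }
    }

    /** Returns a name for every node that is reached more than once, keyed by identity. */
    static Map<Node, String> nameSharedNodes(Node root) {
        Map<Node, Integer> visits = new IdentityHashMap<>();
        Map<Node, String> names = new IdentityHashMap<>();
        visit(root, visits, names);
        return names;
    }

    private static void visit(Node node, Map<Node, Integer> visits, Map<Node, String> names) {
        int count = visits.merge(node, 1, Integer::sum);
        if (count > 1) {
            // Revisited: give the instance a name so later references can point to it.
            names.putIfAbsent(node, "id" + names.size());
            return; // do not descend again; this also terminates on cycles
        }
        for (Node child : node.children) {
            visit(child, visits, names);
        }
    }
}
```

With an equals-based table, two distinct nodes carrying the same label would be merged into one, and the reconstituted graph would no longer have the same shape as the original.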

Size

Even though an XML encoding, term for term, takes significantly more space than a binary encoding, the archives produced by the new streams are typically between 10x and 100x smaller than their serialized counterparts. This is due to a comprehensive system for excluding default information from the archives. For details, read Redundancy Elimination.
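
One plausible way to exclude default information is to compare each property of a bean with the same property on a freshly constructed instance and to keep only the differences. The sketch below works under that assumption. It is not taken from the archiver, and the names DefaultsSketch, Label, and nonDefaultState are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class DefaultsSketch {
    /** Hypothetical bean with a no-argument constructor and two properties. */
    public static class Label {
        private String text = "";
        private int width = 100;
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
        public int getWidth() { return width; }
        public void setWidth(int width) { this.width = width; }
    }

    /** Returns only the properties whose values differ from a fresh default instance. */
    static Map<String, Object> nonDefaultState(Object bean) {
        Map<String, Object> state = new TreeMap<>();
        try {
            Object defaults = bean.getClass().getConstructor().newInstance();
            for (Method getter : bean.getClass().getMethods()) {
                String name = getter.getName();
                boolean isGetter = name.startsWith("get") && name.length() > 3
                        && getter.getParameterCount() == 0 && !name.equals("getClass");
                if (!isGetter) continue;
                Object actual = getter.invoke(bean);
                if (!Objects.equals(actual, getter.invoke(defaults))) {
                    // "getText" -> "text"
                    String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                    state.put(property, actual);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return state;
    }
}
```

A bean left entirely at its defaults contributes no properties at all under this scheme, which suggests why archives of typical user interfaces could be much smaller than a full dump of their state.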

Listeners

A crucial part of a user interface, beyond the way it will appear, is the way it will be connected to the logic of an application. In the past this has only been possible by generating Java source files that implement event listener interfaces and compiling them at design time. With the introduction of the java.lang.reflect.Proxy APIs in SDK 1.3 it is now possible to consider the "wiring" of the user interface as part of the state of the design and to save it as an integral part of the archive that represents it.

Even with the much improved footprint of inner classes in 1.2, the generation of classes is still a potentially costly solution if an inner class is generated for each action in a user interface. Our example builder, Bean Builder, demonstrates how the java.lang.reflect.Proxy API can be used to create "trampoline" objects that can be installed as listeners to arbitrary events and used to call a given method on a target object when the event takes place. The proxy APIs are used to synthesize listeners of arbitrary types at runtime instead of having to compile code. The key thing about this technique is that we generate one "trampoline" class per event type, which is a considerable saving over techniques that generate one class per event. The result is that the incremental cost of wiring, for example, a button to a method in a target object is the footprint of an instance of the "trampoline" class rather than the footprint of the class itself. This will typically save between one and two orders of magnitude in the overall footprint of the event-handling code in an application.
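
The trampoline technique can be sketched with java.lang.reflect.Proxy and an InvocationHandler. This is an illustration rather than the Bean Builder's code. ClickListener, Counter, Trampoline, and wire are hypothetical names, and a real listener interface such as an AWT or Swing listener would be passed to wire in the same way.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class TrampolineSketch {
    /** Hypothetical listener interface; any listener interface would work the same way. */
    public interface ClickListener {
        void clicked(String source);
    }

    /** Hypothetical application object that the user interface is wired to. */
    public static class Counter {
        private int count;
        public void increment() { count++; }
        public int getCount() { return count; }
    }

    /** Forwards every listener call to one no-argument method on a target object. */
    static final class Trampoline implements InvocationHandler {
        private final Object target;
        private final String methodName;

        Trampoline(Object target, String methodName) {
            this.target = target;
            this.methodName = methodName;
        }

        // Exposed as read-only properties so that the wiring can be treated as state.
        public Object getTarget() { return target; }
        public String getMethodName() { return methodName; }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (method.getDeclaringClass() == Object.class) {
                // equals, hashCode, and toString must not be forwarded to the target method.
                switch (method.getName()) {
                    case "equals": return proxy == args[0];
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return "Trampoline -> " + methodName;
                }
            }
            return target.getClass().getMethod(methodName).invoke(target);
        }
    }

    /** Synthesizes a listener of the requested type at run time, with no generated source. */
    static <T> T wire(Class<T> listenerType, Object target, String methodName) {
        Object proxy = Proxy.newProxyInstance(
                listenerType.getClassLoader(),
                new Class<?>[] {listenerType},
                new Trampoline(target, methodName));
        return listenerType.cast(proxy);
    }
}
```

Each additional connection made through wire costs one Trampoline instance. The proxy class for a given listener interface is generated once and reused for every connection of that type.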

Most importantly, the "trampoline" class that we have implemented exposes all of its state using the Beans conventions. It can therefore be archived in the same way that any other bean is archived – as a textual representation of its public properties.
 

Summary

The new input and output streams complement the binary serialization support that was introduced in JDK 1.1 with support for some new formats. The new streams have been implemented to a set of design goals that make them more suitable than binary serialization as a persistence mechanism for user interfaces. Committing the archives to the public APIs of the classes to which they refer makes the archives inherently more robust than those that contain private state. The redundancy elimination system used in the new streams makes the new formats attractive in that they are both human-readable and, in most cases, one or two orders of magnitude smaller than their binary equivalents.

Downloading the Archiver Package

Note: This version of the archiver works only with the 1.3 version of the Java 2 SDK.
To try out the new streams, download archiver.zip (~100 KB), unzip it, and follow the instructions in README.txt. Information on using the new streams is in The Archiver Package and How to Use the New Output Streams.
 

Downloading the Bean Builder

Note: This version of the Bean Builder works only with the 1.3 version of the Java 2 SDK.

To show how the new streams can be used in an IDE environment to save the designs that a user creates, we have built a simple BeanBox-style PropertyEditor/GUI builder and included support for persisting designs, including event handling, as XML documents.

Like the original BeanBox, the builder is not a commercial product and is intended to serve only as an example of how these techniques would be used in a real IDE. This builder takes the original BeanBox concept forward a little by showing not just how the properties of a single Bean can be manipulated but how a group of Beans can be "wired up" to make the user interface part of an application.

To try out the builder, download beanbuilder.zip (~580 KB), unzip it, and follow the instructions in README.txt.
