
File Formats

 


As explained in the earlier sections, the archiving package deals with the syntax of the archives as a separate and replaceable last stage of the writing process and as a separate and replaceable first stage of the reading process. This way, the bulk of the archiving infrastructure is syntax-independent and we insulate the work that goes into providing special support for the archival of new JavaBeans from the ongoing process of supporting new file formats, such as XML and its many emerging schemas. This section deals with the syntaxes that are provided in this release.

In the current implementation we encapsulate the "syntax modules" that support the different formats as concrete implementations of the ObjectInput and ObjectOutput interfaces that already exist in the java.io package. In JDK 1.1, these interfaces had only one concrete implementation each: the ObjectInputStream and ObjectOutputStream classes, which support reading and writing streams in a binary format. We will not discuss the details of ObjectOutputStream and ObjectInputStream here; for more information on these implementations, please refer to the online documentation on the serialization support that was introduced in JDK 1.1.
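
As a minimal sketch of what coding against these interfaces looks like, the example below writes and reads an object purely through ObjectOutput and ObjectInput, using the binary streams from java.io as the only concrete format. The class and method names (FormatIndependentArchive, save, load, roundTrip) are hypothetical and exist only for illustration; the streams added by this package are not used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class FormatIndependentArchive {
    // Writes one object through the ObjectOutput interface.
    // Nothing in this method depends on the concrete file format.
    static void save(ObjectOutput out, Object graph) throws IOException {
        out.writeObject(graph);
        out.flush();
    }

    // Reads one object back through the ObjectInput interface.
    static Object load(ObjectInput in) throws IOException, ClassNotFoundException {
        return in.readObject();
    }

    // Round-trips a value, choosing the binary java.io streams as the format.
    static Object roundTrip(Object graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutput out = new ObjectOutputStream(bytes)) {
                save(out, graph);
            }
            try (ObjectInput in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return load(in);
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("cut"));
    }
}
```

Only roundTrip names a concrete stream class, so swapping in a different format would, under this assumption, touch that one method and leave save and load unchanged.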

The new streams included in this package augment ObjectOutputStream and ObjectInputStream with support for XML, Java, and BeanScript. Here are the names of the streams that are used for each format.
 

Format      Output stream            Input stream
XML         XMLOutputStream          XMLInputStream
Java        JavaOutputStream         (none)
BeanScript  BeanScriptOutputStream   BeanScriptInputStream

The XML format can be used to write object graphs as XML documents. Although the XML archives are slightly more verbose than the other formats, they conform to the W3C XML specification and so can be viewed and manipulated by generic XML tools. The JavaOutputStream produces standard Java code that conforms to the Java Language Specification. There is no JavaInputStream, so .java archives have to be compiled and loaded into a VM as classes. The BeanScript format is much like Java but is simplified so that it can be parsed using very simple techniques. Because it is concise, it can be used to create a useful command-line interpreter for experimenting with beans and their properties. Each format is described in more detail below.

The XML Format

Here's an example of an archive that defines a simple Swing GUI. It represents a JPanel which contains a number of components. One of the components is a JButton which has a custom object as its action listener.
<JAVA-OBJECT-ARCHIVE VERSION="0.1"> 
  <CLASS ID="JPanel" NAME="javax.swing.JPanel"/> 
  <CLASS ID="JButton" NAME="javax.swing.JButton"/> 
  <CLASS ID="Test" NAME="Test"/> 
  <CLASS ID="Rectangle" NAME="java.awt.Rectangle"/> 
  <CLASS ID="Integer" NAME="java.lang.Integer"/> 
  <CLASS ID="JTextField" NAME="javax.swing.JTextField"/> 
  <OBJECT ID="JPanel0" CLASS="JPanel"> 
    <OBJECT METHOD="add"> 
      <OBJECT ID="JButton0" CLASS="JButton"> 
        <OBJECT METHOD="addActionListener"> 
          <OBJECT CLASS="Test"/> 
        </OBJECT> 
        <OBJECT PROPERTY="bounds" CLASS="Rectangle"> 
          <OBJECT CLASS="Integer" VALUE="10"/> 
          <OBJECT CLASS="Integer" VALUE="20"/> 
          <OBJECT CLASS="Integer" VALUE="100"/> 
          <OBJECT CLASS="Integer" VALUE="20"/> 
        </OBJECT> 
        <OBJECT PROPERTY="text" VALUE="cut"/> 
      </OBJECT> 
    </OBJECT> 
    <OBJECT METHOD="add"> 
      <OBJECT ID="JTextField0" CLASS="JTextField"> 
        <OBJECT PROPERTY="nextFocusableComponent" IDREF="JButton0"/> 
        <OBJECT PROPERTY="bounds" CLASS="Rectangle"> 
          <OBJECT CLASS="Integer" VALUE="30"/> 
          <OBJECT CLASS="Integer" VALUE="50"/> 
          <OBJECT CLASS="Integer" VALUE="200"/> 
          <OBJECT CLASS="Integer" VALUE="20"/> 
        </OBJECT> 
      </OBJECT> 
    </OBJECT> 
    <OBJECT PROPERTY="layout"/> 
    <OBJECT PROPERTY="bounds" CLASS="Rectangle"> 
      <OBJECT CLASS="Integer" VALUE="0"/> 
      <OBJECT CLASS="Integer" VALUE="0"/> 
      <OBJECT CLASS="Integer" VALUE="539"/> 
      <OBJECT CLASS="Integer" VALUE="366"/> 
    </OBJECT> 
  </OBJECT> 
</JAVA-OBJECT-ARCHIVE> 

This is just one possible XML format and is an example of a declarative style encoding where the structure of the document reflects the containment relations amongst the objects in the graph. We use the term "containment" to describe the relationship between any object that holds references to other (child) objects. All property values of a JavaBean are deemed "contained" by the bean that owns them as is all other state that can be directly managed by the API of the bean. In this sense, a Vector contains its elements, a Container contains its Components, a JButton contains its listeners, etc.

The format we have chosen is closely tied to the Java language in that, as well as being able to define properties on objects, it is also possible to call methods on them. The advantage of this is that the persistent state which is not represented as JavaBeans properties, like the elements of a Vector or the listeners of a JButton, can be included in the XML document and interpreted by a simple evaluator that has no knowledge of Vectors or JButtons. So in the example above, the line:

     <OBJECT PROPERTY="text" VALUE="cut"/>

declares the "text" property of the enclosing object, the JButton, to have the (String) value "cut". In Java this could be written:

     JButton0.setText("cut");
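
To illustrate how a reader with no knowledge of JButton could still perform this assignment, here is a minimal sketch that sets a property by name using the standard java.beans.Introspector. The Label class is a hypothetical stand-in for a bean with a "text" property, and setProperty is an illustrative helper, not the evaluator shipped with this package.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;

class PropertySetter {
    // Hypothetical bean standing in for a component such as JButton.
    public static class Label {
        private String text;
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
    }

    // Sets the named property by looking up its write method via the Introspector.
    static void setProperty(Object bean, String name, Object value) {
        try {
            PropertyDescriptor[] properties =
                    Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors();
            for (PropertyDescriptor pd : properties) {
                if (pd.getName().equals(name) && pd.getWriteMethod() != null) {
                    pd.getWriteMethod().invoke(bean, value);
                    return;
                }
            }
        } catch (IntrospectionException | IllegalAccessException
                | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("No writable property: " + name);
    }

    public static void main(String[] args) {
        Label label = new Label();
        setProperty(label, "text", "cut"); // same effect as label.setText("cut")
        System.out.println(label.getText());
    }
}
```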

The outermost object, the JPanel, contains the JButton and this relationship is represented by the fragment:

    <OBJECT METHOD="add">
      <OBJECT ID="JButton0" CLASS="JButton">
         ...
      </OBJECT>
    </OBJECT>

This fragment declares the JButton as part of the state of the JPanel, not as a property value but as an argument to the add method, whose side effect changes the state of the JPanel when the file is read. In other words, the expression that creates the JButton is an argument to the enclosing method, rather than the value of a property. In Java, this fragment could be written:

    JButton JButton0 = new JButton();
    JPanel0.add(JButton0);

Information on how these expressions can be generated automatically by the XMLOutputStream is contained in the BeanInfo of, in this case, the JPanel class (actually the Container class from which the JPanel inherits). It is possible to provide any class with "meta data" so that state which does not follow the simple beans conventions can be written to an archive.

Many other XML persistence schemes (including all of our first implementations) solved the meta data problem by using "synthetic properties" or "registries" of meta data to handle the special cases. Some of these schemes can produce smaller XML documents because the information about how these pieces of data should be added to the objects when they are read is not part of the archive. The disadvantage of these schemes however is that the XML document cannot be read unless the registries exist in the (run time) environment which reads them. Moving to a more explicit format allows all the special cases to be dealt with entirely in the design time environment; with all class-specific "meta data" being included as part of the class's BeanInfo.

The Structure of the XML Encoding

All XML archives begin with a single XML Element:

<JAVA-OBJECT-ARCHIVE VERSION="0.1">

and end with:

</JAVA-OBJECT-ARCHIVE>

This seemingly superfluous element serves only to denote the version of the archive and place the (potentially many) object graphs that were written to the archive in a single block (a requirement for a well formed XML document).

In the main body of the archive, there are just two tags: OBJECT and CLASS.

The CLASS tag has two attributes: ID and NAME. The CLASS tag is used, before the actual object graph, somewhat like an import statement in a Java program to create a reference to a class with a given, fully qualified, name. Like import statements, these statements do not directly contribute to the object graph but will define all the classes that will be required to create it.

After the part of the archive that declares the classes comes the actual object graph which is represented as a set of nested elements using a single tag: the OBJECT tag. OBJECTS typically contain other OBJECTS and form a tree of nodes which are "contained" within each other. Circularities in the object graph are closed with XML's built-in ID and IDREF directives. The OBJECT tag has five other attributes: PROPERTY, PROPERTYREF, CLASS, METHOD and VALUE. Properties are set using the PROPERTY attribute and retrieved with the PROPERTYREF attribute. Methods are called using the METHOD attribute and applied to classes, rather than instances, when the CLASS attribute is defined. The default method name is new and this is used to instantiate a new instance of a class when the CLASS attribute is defined. The VALUE attribute is used to denote a literal String. Entries that have no PROPERTY or METHOD attribute are assumed to be arguments to an enclosing method.
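
The rules above can be sketched as a very small reader. This is an illustration under stated assumptions, not the XMLInputStream implementation: it handles only CLASS declarations, instantiation through the CLASS attribute, VALUE literals, METHOD calls on the enclosing object, and ID/IDREF. PROPERTY, PROPERTYREF, static method calls and primitive arguments are left out. The name MiniArchiveReader, its helpers and the sample archive are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class MiniArchiveReader {
    // Sample archive: a list that holds the same StringBuilder twice.
    static final String SAMPLE = "<JAVA-OBJECT-ARCHIVE VERSION=\"0.1\">"
            + "<CLASS ID=\"List\" NAME=\"java.util.ArrayList\"/>"
            + "<CLASS ID=\"Text\" NAME=\"java.lang.StringBuilder\"/>"
            + "<OBJECT ID=\"List0\" CLASS=\"List\">"
            + "<OBJECT METHOD=\"add\"><OBJECT ID=\"Text0\" CLASS=\"Text\" VALUE=\"cut\"/></OBJECT>"
            + "<OBJECT METHOD=\"add\"><OBJECT IDREF=\"Text0\"/></OBJECT>"
            + "</OBJECT></JAVA-OBJECT-ARCHIVE>";

    private final Map<String, Class<?>> classes = new HashMap<>(); // CLASS ID -> class
    private final Map<String, Object> ids = new HashMap<>();       // OBJECT ID -> instance

    // Parses an archive and returns the last top-level object it defines.
    static Object read(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            MiniArchiveReader reader = new MiniArchiveReader();
            Object result = null;
            for (Element e : childElements(root)) {
                if (e.getTagName().equals("CLASS")) {
                    reader.classes.put(e.getAttribute("ID"), Class.forName(e.getAttribute("NAME")));
                } else {
                    result = reader.eval(e, null);
                }
            }
            return result;
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    private Object eval(Element e, Object target) throws Exception {
        if (e.hasAttribute("IDREF")) return ids.get(e.getAttribute("IDREF"));
        // Arguments: the VALUE literal, then every child that is not a METHOD call.
        List<Object> args = new ArrayList<>();
        if (e.hasAttribute("VALUE")) args.add(e.getAttribute("VALUE"));
        for (Element c : childElements(e)) {
            if (!c.hasAttribute("METHOD")) args.add(eval(c, null));
        }
        Object result;
        if (e.hasAttribute("METHOD")) {          // call on the enclosing object
            result = invoke(target, e.getAttribute("METHOD"), args.toArray());
        } else if (e.hasAttribute("CLASS")) {    // default method: new
            result = construct(classes.get(e.getAttribute("CLASS")), args.toArray());
        } else {                                 // bare literal, or null when empty
            result = args.isEmpty() ? null : args.get(0);
        }
        if (e.hasAttribute("ID")) ids.put(e.getAttribute("ID"), result);
        // Nested METHOD children are calls on the object just produced.
        for (Element c : childElements(e)) {
            if (c.hasAttribute("METHOD")) eval(c, result);
        }
        return result;
    }

    // True when every non-null argument is an instance of its parameter type (no unboxing).
    private static boolean accepts(Class<?>[] params, Object[] args) {
        if (params.length != args.length) return false;
        for (int i = 0; i < args.length; i++) {
            if (args[i] != null && !params[i].isInstance(args[i])) return false;
        }
        return true;
    }
    private static Object invoke(Object target, String name, Object[] args) throws Exception {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(name) && accepts(m.getParameterTypes(), args)) return m.invoke(target, args);
        }
        throw new NoSuchMethodException(name);
    }
    private static Object construct(Class<?> type, Object[] args) throws Exception {
        for (Constructor<?> c : type.getConstructors()) {
            if (accepts(c.getParameterTypes(), args)) return c.newInstance(args);
        }
        throw new NoSuchMethodException(type.getName() + ".new");
    }
    private static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) out.add((Element) n);
        }
        return out;
    }
}
```

Calling MiniArchiveReader.read(MiniArchiveReader.SAMPLE) yields a two-element list whose entries are the same StringBuilder instance, which is the shared reference that ID and IDREF are meant to preserve. The sketch registers an ID as soon as the object is constructed and before the nested method calls run, so a child can refer back to its parent.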
 

The Java Format

The JavaOutputStream creates .java files in accordance with the Java Language Specification. There is no JavaInputStream; instead, Java files must be compiled and loaded into the VM as .class files. No information is available from the .java file other than what is defined in the Specification and so no information is lost when the file is compiled and loaded as a class. All parts of the file may be edited and all edits are valid provided they result in a .java file which compiles and creates the desired object graph when the resulting .class file is loaded. If the JavaOutputStream is used to write out an object graph from a modified .java program, the new .java file will, almost invariably, be structured differently from the original. Even so, the new .java file will, within the limitations of the meta data defining the objects it contains, produce the same object graph as the original.

The JavaOutputStream is the least mature of the implementations in this release and does not yet fully support all of Java's syntactic constructs correctly. It is included in this release to show the strong link between the meta data that is used to generate archives and the data which is required by a traditional code generator.
 

The BeanScript Format

Here is an archive representing the same object graph as in the XML archive above, in BeanScript:
 
{ 
    let JPanel0 = Class.forName("javax.swing.JPanel").new(); 
    let JButton0 = Class.forName("javax.swing.JButton").new(); 
    JButton0.addActionListener(Class.forName("Test").new()); 
    let Rectangle = Class.forName("java.awt.Rectangle"); 
    JButton0.bounds := Rectangle.new(10, 20, 100, 20); 
    JButton0.text := "cut"; 
    JPanel0.add(JButton0); 
    let JTextField0 = Class.forName("javax.swing.JTextField").new(); 
    JTextField0.nextFocusableComponent := JButton0; 
    JTextField0.bounds := Rectangle.new(30, 50, 200, 20); 
    JPanel0.add(JTextField0); 
    JPanel0.layout := null; 
    JPanel0.bounds := Rectangle.new(0, 0, 539, 366); 
    JPanel0 
}; 

BeanScript is a very simple file format which fulfills all of the fundamental requirements of an archive of a graph of JavaBeans. BeanScript files are typically a little smaller than the corresponding .java and .class files that contain the same information and are almost always significantly smaller than their XML counterparts. After compression we think this format is close to optimal though indented XML benefits hugely from the compression step.

Unlike the XML format, BeanScript is procedural and each syntactic construct corresponds to an action to be taken when the file is read. It defines syntactic forms for the three essential operations of the unarchiving process: instantiating objects, giving them a name and setting properties on them. For familiarity's sake, the syntax is Java-like except for the omission of the static type information which is not required by the evaluator.

The syntax of BeanScript is defined by a simple grammar involving the following tokens: ".", "=", ":", ";", ",", "()", "let" and "{}". Parentheses are used for Java-style method invocations and curly braces denote a block. A block construct is required to delimit multiple writeObject calls to a single file. The last statement in a block is the result and will typically be the value returned by readObject. These syntactic constructs have the same precedences as they do in Java and may be mixed freely. End-of-line characters are treated as white space.

The syntax for calling methods is the same as it is in Java and statements are terminated with the ";" character. To call a method "m" defined by an object "o" on arguments "a1", "a2", etc. we write:

    o.m(a1, a2, ...);

Local variables may be declared using the Java conventions, except that the static type of the variable is replaced with the keyword "let". For example:

  let a = 1;

Assignment to a variable or property is written:

    a := c;

In fact, the ":" and "let" tokens are currently thrown away by the parser, so the plain "a = b" syntax can be used in place of the longer forms. The reason these extraneous tokens are included in the archives generated by the BeanScriptOutputStream is so that the precise meaning of the assignment and declaration operators can be changed whilst preserving backward compatibility of existing archives.
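
A minimal sketch of this behaviour, assuming a simple regular-expression tokenizer (the real BeanScript parser may be organised quite differently): the hypothetical tokenize method below discards "let" and ":" so that the long and short forms produce the same token sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BeanScriptTokens {
    // Assumed token shapes: string literals, identifiers, integers,
    // and the single-character punctuation listed in the text.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|[A-Za-z_][A-Za-z_0-9]*|\\d+|[.=:;,(){}]");

    // Splits the source into tokens, dropping "let" and ":".
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String token = m.group();
            if (token.equals("let") || token.equals(":")) {
                continue; // extraneous tokens, ignored by this sketch
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("let a := 1;"));
        System.out.println(tokenize("a = 1;"));
    }
}
```

Both calls in main produce the tokens a, =, 1 and ;, so a parser built on this tokenizer would treat the two statements identically.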

Blocks are the equivalent of LISP's PROGN: they evaluate a list of expressions and return the value of the last one. For example, the following block evaluates to the number 2:

{1; 2};

As in LISP and Scheme, there are no restrictions on which constructs can appear inside others; it is possible to use an assignment or a block as an argument to a method. For example:

o.m({let x = 1; x});

BeanScript is sufficiently concise that an interactive interpreter, which can be used to instantiate and introspect user interfaces on the fly, can be created simply by using the BeanScriptInputStream as a wrapper around the VM's default input stream: System.in. The TestShell example program in this release does exactly this when no file is passed to it on the command line.

To run the interpreter type:

java TestShell

You can then type expressions in BeanScript, which will be parsed and evaluated like the statements in a block.

> F = Class.forName("javax.swing.JFrame");
class javax.swing.JFrame
> f = F.new();
javax.swing.JFrame[frame0,0,0,0x0, ...]
> f.visible;
false
> f.visible = true;
true

When the visible property of the JFrame is set to true, you should see a window appear on the desktop.
 

Evaluation Semantics

All of the readers use the same evaluation model and pass an expression tree to the evaluator as the last part of their readObject implementation. Object arrays are currently used to represent the parse tree. The evaluator is a minimal evaluation engine defining the six operations required in the reconstruction of an object graph; it uses the simple conventions of the Scheme programming language.
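
The article does not spell out the layout of these Object arrays, so the following is only a sketch under an assumed encoding: each array holds an operation name followed by its operands, and the tags "block", "let" and "var" are hypothetical. It covers three operations (blocks, declarations and variable references), not the full set of six.

```java
import java.util.HashMap;
import java.util.Map;

class TreeEvaluator {
    private final Map<String, Object> variables = new HashMap<>();

    // Evaluates a node. Arrays are expressions whose first element names the
    // operation (an assumed layout); any other value evaluates to itself.
    Object eval(Object node) {
        if (!(node instanceof Object[])) {
            return node;
        }
        Object[] expr = (Object[]) node;
        String op = (String) expr[0];
        switch (op) {
            case "block": {   // {e1; e2; ...} yields the value of the last expression
                Object last = null;
                for (int i = 1; i < expr.length; i++) {
                    last = eval(expr[i]);
                }
                return last;
            }
            case "let": {     // let name = value
                Object value = eval(expr[2]);
                variables.put((String) expr[1], value);
                return value;
            }
            case "var":       // variable reference
                return variables.get((String) expr[1]);
            default:
                throw new IllegalArgumentException("Unknown operation: " + op);
        }
    }

    public static void main(String[] args) {
        TreeEvaluator evaluator = new TreeEvaluator();
        // {1; 2}
        System.out.println(evaluator.eval(new Object[] {"block", 1, 2}));
        // {let x = 1; x}
        System.out.println(evaluator.eval(new Object[] {
                "block", new Object[] {"let", "x", 1}, new Object[] {"var", "x"}}));
    }
}
```

Because every construct is just a node, a block or a declaration can appear anywhere an expression can, which matches the "no restrictions" rule borrowed from Scheme.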

The interpreter has four variables bound on startup:

  • null
  • true
  • false
  • Class
Numbers evaluate to themselves. All other objects, including arrays, are considered non-primitive and must be instantiated explicitly using new. The properties of an object are the read-write properties defined by the introspector. Arrays are treated as objects with properties that are Integers instead of Strings.
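
One way to picture this uniform treatment of beans and arrays is a single property lookup that dispatches on the target. The sketch below is an assumption about how such a lookup could be written, using the standard java.beans.Introspector and java.lang.reflect.Array; the PropertyAccess and Size names are hypothetical.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Array;

class PropertyAccess {
    // Hypothetical bean with one read-write property, "width".
    public static class Size {
        private int width;
        public int getWidth() { return width; }
        public void setWidth(int width) { this.width = width; }
    }

    // Reads a property: an Integer key indexes an array,
    // a String key names a read-write bean property.
    static Object get(Object target, Object key) {
        try {
            if (target.getClass().isArray()) {
                return Array.get(target, (Integer) key);
            }
            PropertyDescriptor[] properties =
                    Introspector.getBeanInfo(target.getClass()).getPropertyDescriptors();
            for (PropertyDescriptor pd : properties) {
                boolean readWrite = pd.getReadMethod() != null && pd.getWriteMethod() != null;
                if (readWrite && pd.getName().equals(key)) {
                    return pd.getReadMethod().invoke(target);
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("No read-write property: " + key);
    }

    public static void main(String[] args) {
        Size size = new Size();
        size.setWidth(539);
        System.out.println(get(size, "width"));
        System.out.println(get(new int[] {10, 20, 100, 20}, 1));
    }
}
```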

Class objects can be created using the static forName method in Class.class. Once they are instantiated, classes can be used to:

  • Get/set the static members of a class.
  • Invoke the static methods of a class.
  • Invoke the newInstance methods of a class's constructors.
  • Invoke the instance methods "inherited" from Class.class.
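
In plain Java, these four uses correspond to ordinary reflection calls. The sketch below shows one of each using java.lang.Integer and java.lang.StringBuilder; the helper names (getStatic, callStatic, construct) are hypothetical, and each helper is deliberately limited to a single String argument to keep the example short.

```java
class ClassObjectUses {
    // Reads a static member, e.g. Integer.MAX_VALUE.
    static Object getStatic(Class<?> type, String field) {
        try {
            return type.getField(field).get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes a static method taking one String, e.g. Integer.parseInt("42").
    static Object callStatic(Class<?> type, String method, String arg) {
        try {
            return type.getMethod(method, String.class).invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes newInstance on the constructor taking one String,
    // e.g. new StringBuilder("cut").
    static Object construct(Class<?> type, String arg) {
        try {
            return type.getConstructor(String.class).newInstance(arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> integer = Class.forName("java.lang.Integer");
        System.out.println(getStatic(integer, "MAX_VALUE"));
        System.out.println(callStatic(integer, "parseInt", "42"));
        System.out.println(construct(Class.forName("java.lang.StringBuilder"), "cut"));
        System.out.println(integer.getName()); // instance method of Class itself
    }
}
```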
