We'll describe the algorithm used here by refining a naive reflective
"walk" of the object graph.
An Algorithm for Persisting Beans as Text
[For illustrative purposes we will output text a in Java-like language
though the same techniques apply to other formats such as XML].
The simplest way to do this is as follows (we omit the termination
conditions which include a hash table containing the visited nodes):
writeObject(Object node) {
nodeName = <generate a symbol>
write(nodeName + " = new " + node.class +
"()\n");
for each property p in node.class.properties
do {
write(nodeName +
"." + p.name + " = " + writeObject(p.value(node)) + "\n");
}
return name;
}
There are a number of special cases which need to be handled to
make this work on all classes involved in the faithful storage of
a Swing user interface. These include:
- Classes that do not have a nullary constructors, e.g. Color,
Rectangle, Dimension, Integer, String, etc...
- Classes whose public constructor has parameters that refer
to private state, e.g. EmptyBorder, JScrollPane$ScrollBar, etc...
- Classes that have no public constructors: Method, Class, array.
- Classes that do not expose all of their useful state as standard
properties: Vector, Hashtable, Container, array.
The ability to be able fix these issues without being able to change
the implementations of the classes involved is an essential part of
any practical scheme that relies purely on public APIs. We have produced
an API to allow users of this form of persistence to manipulate a
property schema defined by a class.
Here are our solutions:
- Meta data specifies the parameters of a constructor as named
properties (e.g. Color("red", "green", "blue")).
- A comprehensive defaults mechanism that obviates the need for
these instantiations. (see: Elimination of Redundancy).
- Meta data describes the method by which these instances should
be created.
- Meta data describes the method by which these instances should
be modified.
We will not go into this further here other than to report that these
anomalies exist and that we have produced some APIs to allow the manipulation
of the table of meta data which is required to resolve them. To simplify
the rest of this document we will describe the rest of this algorithm
as if each object in the graph is a regular bean with a nullary constructor
and an ordered list of public properties.
The biggest problem with this technique is the size of the files
that are created. The algorithm walks deeply into the internal structures
of most Swing components and generates large archives which are
mostly copies of the default values of the objects they contain.
The rest of this document explains the techniques we use to eliminate
this default information and make the archives much smaller than
their .ser equivalents.
Elimination of Redundancy
The algorithm outlined above has the problem that all properties will
be recorded for a Bean even if its constructor initializes some of
them by default. It is possible to ensure that this additional information,
defining the default value of each property, is available but only
by adding (a significant amount of) extra meta data for each class.
This is fundamentally error prone since it leaves the default information
two places: the default value from the meta data and the default value
provided by the constructor. Nevertheless, this is a workable solution
and has been used in a number of other UI kits and IDEs. Inprise notably,
minimized this inconsistency by extending their language to support
the notion of defaults.
Typically (Delphi, VB, ...) this technique is only implemented
for primitive types, leaving default non-primitives types in the
position that they must always be instantiated. If the constructor
had already instantiated a value for a non-primitive property and
this value was of the same class as the desired value, it would
be much better to use that value as that would avoid the creation
of a new object. Also, the constructor may have already configured
this child object after it created it. There is little point in
writing into each archive, property setting code which effectively
duplicates the efforts of such a constructor.
[Alternatively, one could try to handle defaults for non-primitives
in the same way that they are handled for primitives. To do this
we could define a default value for all non-primitive properties
along with a means for comparing them - perhaps by recursively examining
their properties. This definition of `equals' - effectively
as `property equals' - implies an algorithm which passes over the
graph of objects more than once; this is undesirable for a number
of reasons.]
Again, we'll not go into the details other than to say that we
believe the following algorithm for handling defaults achieves all
of these goals and can do so in a single pass over the object graph.
Recursive differencing
The idea here is essentially to run the unarchiving process at the
same time as the archiving process, thereby creating a second copy
of the object graph as it it being written to the stream. This second,
in memory, copy of the object graph can then be used to determine
whether each property assignment that we are contemplating will have
any effect.
Example
Suppose we are currently examining the "state" property of an object
called "button0". Let us call the environment (hash table) in which
the original object and its children will be placed: "env1", and the
environment in which the clones are placed: "env2".
If env1.button0.state is true we will be considering writing the
following assignment to the output:
write("button0.state = true;\n");
We know that the destination environment will have a property env2.button0
as well. If the value of env2.button0.state is also true we can
skip writing out this statement since we know that when the file
is finally read, this statement will have no effect. This handles
defaults for primitive values nicely. The surprising part is that,
because we only ever compare objects from the same environment,
this exact technique is all that we need to do to handle non-primitive
values as well.
Preservation of Identity
Throughout our implementation we use "==" to test for equality instead
of equals. Internally the hashtable that is used to detect circularities
in the object graph is a special hashtable which uses "==" instead
of "equals" and "System.identityHashcode" instead of "hashCode".
The result is just what we need, that all objects have the right
properties assigned to them and objects that were shared
in the source environment end up being shared in the destination
environment.
Example
Consider the following pseudo-Java program fragment:
INPUT:
l1 = new JList();
l2 = new JList();
writeObject(new Object[]{l1, l2});
This will produce an archive equivalent to the following:
OUTPUT:
l1 = new JList();
l2 = new JList();
result = new Object[]{l1, l2};
So, two lists with models that have exactly the same properties
are instantiated by their constructors producing two different lists
containing two different models.
Example
INPUT:
model = new DefaultListModel();
l1 = new JList(model);
l2 = new JList(model);
writeObject(new Object[]{l1, l2});
OUTPUT:
l1 = new JList();
l2 = new JList();
l2.model = l1.model;
result = new Object[]{l1, l2};
Here the two lists have the same model, and the file reflects
this. Moreover, the algorithm has realized that since the first
list already has a model of the right type there is no point in
throwing away the model inside the list, l1, only to replace it
with another fresh instance of the same model. If there are
any public properties, they can be set on the model which was created
by the constructor. If not (as here), the default model contained
in the Jlist is indistinguishable from one that we would instantiate
afresh.
Efficiency
At first sight this technique seems expensive. In fact, its performance
characteristics are extraordinarily well suited to the problem domain.
First of all, extra time is spent at the writing stage so as to minimize
the size of the file. For Applet use, downloading time is critical
and minimizing the size of resources that will be vended by a server
is paramount. Secondly, any successful Application will be read hundreds
of times more than it is written. It is therefore a good trade to
spend more time in writing the archive if this can reduce the time
taken to read it.
It is also worth highlighting that the cost of performing incremental
cloning during the write operation is, in both time and heap allocation,
equal to the cost of reading it. If this operation is unduly costly
- for any reason - the resulting file will be too costly to load
anyway and is probably of little value.
Error Handling
To prevent archives form becoming useless to future versions of a
Java package, comprehensive error recovery is required. Some
Beans may contain properties that cannot be set to certain values
when the Bean is in a certain state. During the writing phase, we
will be setting these values in our cloning environment and can therefore
catch any exceptions and skip the emission of the offending assignment.
This guarantees the correctness of an archive to the point that the
reading process is sure to proceed in another (identically configured)
VM without error.
Of course, a VM containing a different version of the library defining
the classes may throw errors in different places, so the input stream
must also catch errors encountered in the evaluation of its statements.
When an error is encountered during the reading process the statement
is discarded and evaluation continues with the next statement. If
the statement that threw the error was an instantiation that was
intended to bind a name to a value in the evaluating environment,
this variable is left unbound. Any subsequent reference to that
variable will also throw an error and skip the statement that referenced
it.
Special Cases
Objects could, in theory, have undesirable side effects precluding
the use of this technique - forcing the default values of a class
to be stated explicitly. If an object only calls instance methods
it cannot change state anywhere outside the `cloning' environment
in which it is created. This is the normal case and exceptions to
it seem to be very rare. Static methods uniquely provide a means for
the clones to change state in areas outside their environment.
So far, we have only found one such example of this sort of side
effect, and that is in the awt's Window class.
INPUT:
window = new Window();
writeObject(window);
OUTPUT:
window = new Window();
window.name = "window0";
result = window;
Here, a static variable is being incremented as each window is
created, the defaults system notices this and sets the property
so that it is identical to the original.
Note also that the Window has one other undesirable side effect:
if the "visible" property is true, for example, a second window
would appear on the screen every time a window object was archived!
For this reason the default values of a Window are handled as a
special case.
Other special cases are possible, perhaps a GUI might log into
a database on startup. If the database only accepted one entry at
a time, such a GUI could not be written out without manually removing
the offending properties. Again, the APIs that we have defined for
the definition of properties allow these potential problems to be
circumvented at development time.
Advantages
We have mentioned how this technique reduces the size of the archives
produced. Smaller files, with less redundant information, are not
just good because they take less time and space to use, they are also
less contingent on the stability of an API.
Interestingly, the reason we invested so much time in this defaults
system is that the list of properties provided by the Introspector
caused our serialization code to walk very deeply into the internal
structures of the objects that were being serialized. Taking each
of Swing's top level components we found that roughly half of them
contained at least one object that could not be instantiated naively.
Invariably, the problem classes did not have nullary constructors
and did not expose enough of their state to produce the values required
in the constructors they did define.
More than 99% of the "problem cases" in swing were fixed by using
the defaults scheme above.
Side Effects
In Swing's top level components there are many 'cover methods' that
expose, for example, the properties of the models they contain. Since
these models also expose their properties using the Beans conventions
the properties will be detected by the introspector. This brings up
the possibility of setting properties twice, unless the list of properties
is manually configured to remove the duplicates.
Example
Suppose we have an instance of Swing's JList class: "list0" with properties
"model" and "count" - where the list's model also has the property
"count". The two possible assignments are:
list0.count = 10;
list0.model.count = 10;
An interesting and desirable consequence of using the incremental
cloning technique is that this does not matter which order the "model"
and "count" properties are defined in the JList class. The reason
for this is that the side effect of setting the "count" property
on the list is precisely as we would expect - it sets the "count"
property of the model it contains. Since this happens to the cloned
version of the input, by the time we come to compare the values
in the second assignment they will be "equal" and the assignment
will be skipped.
Ordering of Properties
In all the Swing classes the accommodation of side effects reduces
the task of providing a good serialization strategy for each Swing
component to that of providing a list of properties to record and
an order in which they should be recorded. In general, it is most
efficient to vend the properties in the order where the properties
with the most side effects appear first. That way, the majority of
the state in the properties which follow can be skipped and the file
will be shorter.
Conclusion
We have spent the last twelve months or so refining a technique for
persisting user interfaces that use the Swing toolkit by restricting
all our archives strictly to the use of public APIs. In practice this
step appears to make the archives "version resilient" in the same
way that old Java programs will compile and run with backward compatible
APIs of newer libraries.
With such a large library, general techniques turned out to be
far more effective than Swing specific ones. So much so that our
final code base is entirely independent of Swing (the class defining
the meta data is the only class which imports the Swing classes).
We have built a simple BeanBox style builder to demonstrate the
construction and serialization of interfaces and, so far, all Swing
components can be written in a compact form and read without error.
According to our tests, time spent parsing the files and making
the reflective calls is insignificant compared to the raw cost of
performing the operations specified in them. We think the resulting
file format (at least in structural terms) is as small as it is
practical to make it - normally being significantly smaller than
the normal .java, .class and .ser representations that contain the
same information. If the files are minimally small (which has only
been demonstrated empirically) this implies this is the fastest
of all techniques that read archives written in textual (human readable)
formats.
Back to Persistence Article
|
|