
CHAPTER 7 - Object Mutability:
Strings and Other Things

mu·ta·ble (myoo'tə bəl), adj. 1. liable or subject to change or alteration
2. given to changing; inconstant

Random House Webster's Dictionary

While most objects are mutable, some are not. For example, any bean that provides a setXXX method is mutable. Immutable objects can be used to define values or attributes that you don't want to be changed. For example, the class in Listing 7-1 could be used to define mathematical concepts such as pi or the speed of light in a vacuum. A simulation might set up these values in a method called bigBang; once they are set, the immutability of the MathematicalConstant class prevents them from being modified.

public class MathematicalConstant {
	private double value;
	public MathematicalConstant(double value) {
		this.value = value;
	}
	public double getValue() {
		return value;
	}
}
Listing 7-1 Immutable objects

Even though this example is somewhat academic, there are many cases where immutable objects are used in everyday programming. (The primary example of a class with immutable instances is String, which is discussed in Section 7.2.)

The choices you make when handling objects must take into account their mutability. With both mutable and immutable objects, it's possible to create numerous, useless, intermediate objects with seemingly benign usage. The allocation, initialization, and collection of these short-lived useless objects can cause major inefficiencies in your software, even when running on an advanced runtime such as the HotSpot VM.

7.1 Lots of Little Objects

The creation and destruction of objects can be a performance bottleneck in most object-oriented languages. Many Smalltalk and C++ programmers have learned to be wary of allocating too many small objects, and developers using the Java programming language should share this concern: creating many short-lived objects is a common source of poor performance for software on the Java platform.

When you allocate a Java object with the keyword new, you are causing many things to happen. First, space is allocated on the heap for the object. Then, the class's constructor is called, and the class's fields are initialized. The object's status is then tracked so the garbage collector can determine if it should remove the object from the heap. (For a more detailed explanation of the lifecycle of an object, see Appendix A, The Truth About Garbage Collection.)

While there are obviously costs associated with creating objects, the situation is improving. Modern JVMs, such as the HotSpot VM, provide much faster object allocation and improved collection mechanisms. However, there will always be costs associated with object allocation.

It is important to note that while creating objects can be an issue, it isn't always a problem. Objects are a key part of the Java programming language. You can't write a program without creating objects. You just want to be cautious when the number of objects you're allocating becomes very high, for example when allocating objects inside loops. As with other optimization decisions, you should let your profiler be your guide. If your profiling tools show that a large amount of time is being spent allocating a particular type of object, then you can use the techniques discussed in this chapter to reduce the number of objects used.

See Section 7.6 for more information about object allocation and collection in the HotSpot VM. For information about the technical details of HotSpot's GC system, see Section B.2.1 in Appendix B.

7.2 Handling String Objects

Text processing of one type or another is central to many types of software; Java servlets, for example, often perform a lot of String processing. The String class is typically used to represent text and offers many convenient methods that help with basic text processing tasks. For heavy-duty text processing, however, some uses of the String class can become major performance bottlenecks.

Most of the problems with using String stem from the fact that String objects are immutable. Once they've been created, they cannot be changed. Operations that might appear to modify String objects actually generate completely new ones.

This is one of the reasons that the java.lang.StringBuffer class exists. The String and StringBuffer classes are meant to be used together. This relationship even extends to the implementation of Java language compilers such as javac. For example, when javac encounters the code snippet

String xyz = "x" + y + "z";
it automatically transforms the code to

 String xyz = new StringBuffer().append("x")
                                .append(y)
                                .append("z")
                                .toString();
This gives you an idea how String concatenation actually works. Note that two objects are created to perform the transformation: A new StringBuffer is created explicitly and a new String is returned from toString. Knowing this, the problem with concatenating a number of String objects as shown in Listing 7-2 becomes obvious.

String result = "";
for (int i=0; i < 20; i++) {
    result += getNextString();
}
Listing 7-2 Concatenating String objects

The javac compiler would automatically transform this to

 String result = "";
 for (int i=0; i < 20; i++) {
 	result = new StringBuffer().append(result)
                                .append(getNextString())
                                .toString();
 }
 

This code creates two objects every time through the loop: one StringBuffer and one String (via the call to toString). That's OK if you're only going to iterate over this loop a few times, but if you're going to be executing this code often you might want to handle the String objects differently. The code in Listing 7-3 produces the same result, but creates a single StringBuffer before the loop and a single String after it, instead of a new pair on every iteration. This approach is much more efficient.

String result = "";
StringBuffer buffer = new StringBuffer();
for (int i=0; i < 20; i++) {
    buffer.append(getNextString());
}
result = buffer.toString();
Listing 7-3 Concatenating String objects more efficiently
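
To make the comparison concrete, the following is a minimal, self-contained sketch of both approaches. The class name ConcatDemo and the parts array are illustrative assumptions that stand in for the calls to getNextString; they are not part of the original listings.

```java
class ConcatDemo {
    // Repeated +=: every pass produces a brand-new String
    // (plus whatever temporary the compiler uses to build it).
    static String concatWithPlus(String[] parts) {
        String result = "";
        for (int i = 0; i < parts.length; i++) {
            result += parts[i];
        }
        return result;
    }

    // One StringBuffer is reused for the whole loop;
    // a single String is created at the end.
    static String concatWithBuffer(String[] parts) {
        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            buffer.append(parts[i]);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "a", "b", "c" };
        System.out.println(concatWithPlus(parts));   // abc
        System.out.println(concatWithBuffer(parts)); // abc
    }
}
```

Both methods return the same text; they differ only in how many intermediate objects are created along the way, which is the kind of difference a profiler can confirm for your own workload.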

Another important fact to note is that the creation of extra String instances is not limited to occasions where the overloaded + and += operators are used. There are several methods in the String class that generate new instances, including concat, replace, substring, toLowerCase, toUpperCase, and trim.

Anytime you find yourself using one of these methods in a compute-intensive part of your code, you might want to consider using a StringBuffer.
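
As a small illustration of this point, the sketch below contrasts a String method that hands back a new object with a StringBuffer method that changes the existing one. The class and method names (StringImmutabilityDemo, shout, appendBang) are hypothetical and chosen only for this example.

```java
class StringImmutabilityDemo {
    // Returns an upper-cased String; the argument itself is never modified.
    static String shout(String text) {
        return text.toUpperCase();
    }

    // Appends in place: the same StringBuffer object holds the new contents.
    static StringBuffer appendBang(StringBuffer buffer) {
        return buffer.append('!');
    }

    public static void main(String[] args) {
        String original = "hello";
        String upper = shout(original);
        System.out.println(original);          // hello
        System.out.println(upper);             // HELLO
        System.out.println(original == upper); // false

        StringBuffer buffer = new StringBuffer("hello");
        System.out.println(appendBang(buffer) == buffer); // true
        System.out.println(buffer);                       // hello!
    }
}
```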

7.3 Mutable Objects in AWT and Swing

The java.awt package defines several classes that encapsulate geometric information. These geometry classes are shown in Table 7-1.

Table 7-1 AWT Geometry Classes

Class        Description
Point        (x,y) location in space
Dimension    Component width and height
Insets       Representation of the borders of a container
Rectangle    Area in a coordinate space

The java.awt.Component and java.awt.Container classes define methods to access certain geometric information. These methods are shown in Listing 7-4.

 public Point getLocation();
 public void setLocation( Point loc);
 public Dimension getSize();
 public void setSize(Dimension size);
 public Insets getInsets();  // Container only; AWT defines no setInsets method
 public Rectangle getBounds();
 public void setBounds(Rectangle bounds);
Listing 7-4 Methods for accessing geometric information

This functionality illustrates the importance of decisions about mutability. What happens when the following code is executed?

 Rectangle bounds = button.getBounds();
 bounds.x += 10;
 
Is the component moved? The answer has to be no. AWT needs to prevent this type of operation to avoid inconsistencies. For example, when the setBounds method is called, AWT makes sure that the Component is marked invalid. This ensures that layout is performed properly. Similarly, many other types of actions and notifications occur when the geometry-related set methods are called. If you could directly modify a Component object's internal data structures, you could easily put it into an inconsistent state.

How does AWT prevent modification of the internal state of a Component? It returns a newly created Rectangle object every time getBounds is called. The actual internal representation of data in the Component remains private and is never passed outside the Component itself. For example, the code in Listing 7-5 actually creates four separate Rectangle objects:

int x = button.getBounds().x;
int y = button.getBounds().y;
int h = button.getBounds().height;
int w = button.getBounds().width;
Listing 7-5 Component mutability

Although several of these objects can be created without having a detectable effect on performance, creating large numbers of temporary objects can negatively impact performance. Profiling tools can help you determine whether or not temporary allocations are affecting your application's performance.
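
The copying behavior described above can be observed directly. The following is a minimal sketch that assumes a bare anonymous subclass of java.awt.Component (so no window or native peer is needed); the class name BoundsCopyDemo and the helper newComponent are hypothetical.

```java
import java.awt.Component;
import java.awt.Rectangle;

class BoundsCopyDemo {
    // A bare Component subclass; it is never added to a window.
    static Component newComponent() {
        Component c = new Component() { };
        c.setBounds(0, 0, 100, 50);
        return c;
    }

    public static void main(String[] args) {
        Component c = newComponent();

        Rectangle bounds = c.getBounds();
        bounds.x += 10; // modifies only the caller's copy

        System.out.println(c.getX());                        // 0
        System.out.println(c.getBounds() == c.getBounds());  // false
    }
}
```

Each call to getBounds hands back a distinct Rectangle, so changing one has no effect on the component, and two consecutive calls never return the same object.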

Small Objects in Swing

When the Swing team began performance tuning version 1.0 of Swing, profiling tools revealed that a large number of small objects were being created in performance-sensitive areas. For example, 12 temporary objects were allocated every time a cell in a JTable was painted. Similar problems were uncovered in many areas of the system. Eliminating a large percentage of these temporary allocations made many operations in Swing nearly twice as fast.

7.3.1 Eliminating Temporary Objects

So how do you eliminate temporary allocations while still maintaining solid data encapsulation? There are several possible solutions; which one is best depends on the particular circumstances.

Swing added methods to provide access to the information in the geometry objects directly, which eliminated the need to copy the objects. For example, the following methods were added to the JComponent class:

 public int getX();
 public int getY();
 public int getHeight();
 public int getWidth();
Because these methods return primitive types instead of objects, there is no need to worry about encapsulation being violated. With these methods, rather than writing

int width = comp.getSize().width; // allocates temp object
you can write

int width = comp.getWidth(); // no allocation
The primary drawback to this approach is that it complicates the public API of the class.

Another problem with this solution is that it moves the responsibility for controlling mutability out of the object and into any object that wants to use it. In the previous Swing example, the Rectangle object's mutability is being controlled by JComponent. Any other class that wants to use Rectangle in a similar manner has to duplicate methods that already exist in JComponent.

For Swing, this was really the only solution: the Rectangle class has existed since JDK 1.0, and there were many reasons to reuse the existing class instead of creating one from scratch. However, if you don't have to deal with legacy classes, there are other solutions that can be very effective. Some of these are discussed in the next section.

7.4 Other Mutable Object Tactics

When you're designing new solutions rather than working with legacy code, you have more flexibility in how you choose to minimize temporary allocations. One tactic uses a concept similar to the const keyword defined in the C++ language.

7.4.1 Simulating const

If you've programmed in C++, you're already familiar with the concept of const objects. In C++, the const keyword allows you to specify that a particular object is to be treated as immutable. Any attempt to change a const object's state triggers a compiler error. Although the Java programming language doesn't provide a direct analog to const, it is fairly easy to structure your classes so that you can simulate it.

To demonstrate how simulating the behavior of const in a Java program can help minimize temporary allocations, we'll use two versions of a highly simplified physics simulation framework. The first implements the framework using traditional techniques similar to those used in AWT; the second uses the const technique. Both versions provide encapsulation of an object's internal data representation.

This simple physics simulation framework consists of two classes: Body and Location. A Body, as shown in Listing 7-6, has a mass and a location in space. A Location is a three-dimensional point that represents a body's position.

public class Body {
   private int mass = 10;
   private Location loc = new Location();

   public int getMass() {
      return mass;
   }
   public void setMass(int mass) {
      this.mass = mass;
   }
   public Location getLocation() {
      return new Location(loc.x, loc.y, loc.z);
   }
   public void move() {
      // we're just moving at random here
      // in a real sim we'd have forces and such
      loc.x += 1;
      loc.y += 2;
      loc.z += 3;
   }
}
Listing 7-6 Body

Listing 7-7 shows the Location class. Note that the getLocation method in the Body class returns a copy of the internally stored Location object, not a reference to the original. This is done to preserve encapsulation and prevent the Location fields from being modified by external code.

To analyze the performance of this small framework we can use a Simulation class. This class, shown in Listing 7-8, creates a large number of Body objects and performs various operations on them. This example simulation doesn't actually do any useful work, but it approximates the kind of work that might be performed in a real simulation. (A real simulation might simulate the effects of gravity or some other force.)

const vs. final

The Java keyword final is often compared to the C++ const keyword, but they are in fact very different. Both const and final can be used to describe local variables, as well as object fields. When used in this context the two keywords are fairly similar. However, both const and final have other uses. The const keyword becomes very interesting when paired with the C++ reference mechanism. Together, they allow you to create const references to objects. For example, consider a C++ member function with the following prototype:

const Rectangle& getBounds();
This declares that the getBounds member function returns a const Rectangle reference. Although the Rectangle class can be constructed in such a way that objects of that type are normally mutable, anyone that calls this getBounds method will be unable to modify the state of the Rectangle that is returned. The Java language's final keyword has no similar functionality.
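
The point of the const vs. final sidebar can be shown in a few lines of Java. This is only a sketch; the Counter class and the method name mutateThroughFinal are hypothetical and exist purely for illustration.

```java
class FinalReferenceDemo {
    // A trivially mutable class used only for this illustration.
    static class Counter {
        int value;
    }

    static int mutateThroughFinal() {
        final Counter counter = new Counter();
        counter.value = 42;          // allowed: final does not protect the object's state
        // counter = new Counter();  // would not compile: a final variable cannot be reassigned
        return counter.value;
    }

    public static void main(String[] args) {
        System.out.println(mutateThroughFinal()); // 42
    }
}
```

In other words, final fixes which object a variable refers to, not what that object contains, which is why it cannot play the role of a C++ const reference.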

public class Location {
   public int x;
   public int y;
   public int z;
   public Location() { }
   public Location(int x, int y, int z) {
      this.x = x;
      this.y = y;
      this.z = z;
   }
}
Listing 7-7 Location

import java.util.ArrayList;
import java.util.Iterator;

public class Simulation {

   static ArrayList bodies = new ArrayList();
   static final int NUM_BODIES = 200;
   static final int TIME_STEPS = 100000;
   
   public static void main(String[] args) {
      for (int i = 0; i < NUM_BODIES; i++) {
         bodies.add(new Body());
      }
      // Stopwatch is a timing helper that is not part of the JDK; assumed to be defined elsewhere
      Stopwatch timer = new Stopwatch().start();
      for (int i = 0; i < TIME_STEPS ; i++) { 
         doTimeStep(i);
      }
      timer.stop();
      System.out.println(timer.getElapsedTime());
   }

   public static void doTimeStep(int timeStep) {
      Iterator iter = bodies.iterator();
      while (iter.hasNext()) {
         Body body = (Body)iter.next();
         body.move();
         Location loc = body.getLocation();
         log(body, loc, timeStep); 
      }
   }

   public static void log (Body body, Location loc, int time) {
      // log this info to somewhere
   }
}
Listing 7-8 A simple simulation

Table 7-2 Simulation Profiling Results

Method                                Time
Body.getLocation                      29.0%
Simulation.doTimeStep                 20.7%
Location.<init>                       11.7%
Body.move                              7.9%
java.util.AbstractList$Itr.hasNext     6.3%

Running this simulation on our test configuration takes about 16 seconds. Using a profiling tool to analyze the simulation gives us a better understanding of where the time is spent.

The profiling results in Table 7-2 show that more than 40 percent of the time it takes to run the simulation is spent in two methods: Body.getLocation and the constructor for the Location class. Almost all of this overhead is related to copying the returned Location objects.

In a real simulation, more work would likely be done in Body.move or elsewhere in the Simulation class, so the percentages might be quite different. However, the overhead of copying the Location objects is still likely to be significant.

Since the profiling results indicate that a significant amount of time is being spent copying the Location objects, this is a good candidate for optimization. There are a number of ways you can improve performance in this situation without sacrificing encapsulation. One solution would be to do what Swing did for its geometry objects and add accessor methods to Body:

 public int getX();
 public int getY();
 public int getZ();

This would improve performance, but there are drawbacks. For example, if the simulation framework were more full-featured there might be many internal objects. This could cause an explosion in the number of these accessor methods. For example, the interface of your Body class might have to change to include

 public int getLocationX();
 public int getLocationY();
 public int getLocationZ();
 public int getVelocityX();
 public int getVelocityY();
 public int getVelocityZ();
 // and even more
 

Adding many methods like this to your public API can needlessly complicate your code. A better alternative would be to move the concept of mutability into the Location object. To do this, you can split the single Location class into two classes: one that is immutable and one that is mutable. Listing 7-9 shows the modified Location class.

Note that two things have changed from the original version in Listing 7-7. First, the fields of the class have been changed from public to protected. This means that these fields can only be accessed by subclasses of Location, or by other classes in the same package. Any client code outside the package that contains this class will be denied access to the fields. Second, because the fields can no longer be accessed directly, get methods have been added for read-only access.

public class Location {
   protected int x;
   protected int y;
   protected int z;

   public Location() { }
   public Location(int x, int y, int z) {
      this.x = x;
      this.y = y;
      this.z = z;
   }

   public final int getX() { return x; }
   public final int getY() { return y; }
   public final int getZ() { return z; }
}
Listing 7-9 The new Location class

There are times when you need a mutable version of the Location class. The MutableLocation class, shown in Listing 7-10, is a subclass of Location. The main purpose of this subclass is to enable modification of the object's internal fields. This is done by adding set methods for each field.

public class MutableLocation extends Location{
   public MutableLocation() { }
   public MutableLocation(int x, int y, int z) {
      super(x,y,z);
   }
   public final void setX(int x) { this.x=x; }
   public final void setY(int y) { this.y=y; }   
   public final void setZ(int z) { this.z=z; }
}
Listing 7-10 MutableLocation

Once you have the separate Location and MutableLocation classes, you can easily create an approximation of the C++ const facility. Internally, you store a MutableLocation object, but return it typed as a simple Location when you want to allow only read-only access. This is similar to returning a const reference in C++.

Listing 7-11 shows the changes that need to be made to the Body class to implement this behavior. In this version, the internally stored Location becomes a MutableLocation, and the getLocation method is changed to return a direct reference of the loc field, instead of a copy. Note that getLocation still returns a Location.

private MutableLocation loc = new MutableLocation();
public Location getLocation() {
      return loc;
}
Listing 7-11 Modifications to the Body class

If the following code is written in a package separate from the Location class, it will now cause compile-time errors:

Location loc = body.getLocation();
 loc.x = 5; // field x is not accessible
 loc.setX(5); // method setX not found in class Location

Be aware, however, that it is possible to cast the returned Location object to a MutableLocation. The following code will work and is quite dangerous.

 Location loc = body.getLocation();
 MutableLocation mLoc = (MutableLocation)loc;
 mLoc.setX(5);
 
This is perfectly legal from the compiler's perspective and gives code in any package access to the internals of the Location object. By writing this code, however, you're explicitly asking to do dangerous things. Note that the C++ const keyword is subject to the same limitation. You can "cast away" const-ness, but do so at your own risk.
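
The pieces of this technique can be put together in one small, runnable sketch. It is a simplified assumption, not the framework itself: the classes are reduced to a single x coordinate and nested inside a hypothetical ConstSketch class so that the example fits in one file. Because everything lives in one file here, the protected field would be reachable directly; the access restrictions described above apply when client code is in a different package.

```java
class ConstSketch {
    // Read-only view: exposes only a getter.
    static class Location {
        protected int x;
        public final int getX() { return x; }
    }

    // Mutable subclass: adds the setter.
    static class MutableLocation extends Location {
        public final void setX(int x) { this.x = x; }
    }

    static class Body {
        private final MutableLocation loc = new MutableLocation();

        // Returns the internal object typed as Location; no copy is made.
        public Location getLocation() { return loc; }

        public void move() { loc.setX(loc.getX() + 1); }
    }

    public static void main(String[] args) {
        Body body = new Body();
        Location view = body.getLocation();
        body.move();
        System.out.println(view.getX()); // 1: the view is the live object, not a snapshot

        // The cast loophole: legal, but it defeats the read-only intent.
        ((MutableLocation) view).setX(99);
        System.out.println(body.getLocation().getX()); // 99
    }
}
```

One design consequence worth noting is that the returned Location is a live view rather than a snapshot: a caller that holds on to it will see later moves of the Body.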

So, how does this new version of the Location code perform? Running the same simulation as before, the code executes in about 8 seconds, almost twice as fast as the previous version. These results are consistent with the profiling data we collected: the profiler indicated that more than 40 percent of the execution time was spent copying the Location objects.

7.5 Mutable Object Case Study

As part of the tuning efforts for J2SE v. 1.3, the java.math package was rewritten. The java.math package includes the classes BigDecimal and BigInteger. In older versions of J2SE, these classes were implemented mostly as C code. For version 1.3, they were ported entirely to use the Java language. (This project is discussed further in Section 9.3.2.)

One of the goals of this project was to improve performance of these classes. BigInteger, much like String, is an immutable object. One of the key performance enhancements in the rewrite was to create a mutable version of BigInteger. A private class called MutableBigInteger was added to the package java.math, and although it isn't exposed as public API, it is used internally to speed up many operations. Mike McCloskey, the engineer at Sun who did most of the work on this project, had the following to say about it:

The original BigInteger is well designed and easy to use, but it has a major performance drawback in its immutability. When you perform multistep operations such as gcd, modInverse, and modPow, you have to create a new immutable number every step you take. Some of these operations take hundreds or thousands of steps, so it was absolutely necessary to make a mutable multiprecision number so the calculations could be done in place. Then you save copying the bits around, initializations of new numbers, allocating memory for new numbers, garbage collection of temporary numbers, etc. That's why we use the MutableBigInteger class behind the scenes. [1]
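
The cost described in this quotation is visible from the public API alone. The sketch below uses only the public BigInteger class (MutableBigInteger is not public API and cannot be used from application code); the class name BigIntegerSumDemo and the method sumTo are hypothetical.

```java
import java.math.BigInteger;

class BigIntegerSumDemo {
    // Sums 1..n. Each add() returns a new BigInteger rather than
    // updating total in place, so the loop creates a fresh result per step.
    static BigInteger sumTo(int n) {
        BigInteger total = BigInteger.ZERO;
        for (int i = 1; i <= n; i++) {
            total = total.add(BigInteger.valueOf(i));
        }
        return total;
    }

    public static void main(String[] args) {
        BigInteger ten = BigInteger.valueOf(10);
        BigInteger eleven = ten.add(BigInteger.ONE);
        System.out.println(ten);        // 10 (unchanged by add)
        System.out.println(eleven);     // 11
        System.out.println(sumTo(100)); // 5050
    }
}
```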

7.6 Small Objects and the Modern JVM

One of the advertised advantages of the new generation of JVMs, such as the HotSpot VM, is that they radically improve the performance of allocation and collection of small objects. Benchmarks of the HotSpot VM show that this isn't just marketing; it really does deliver major improvements in small object handling. However, initialization and allocation costs do still exist and can be significant in many cases.

Table 7-3 shows results for the simple physics simulation benchmark from Section 7.4.1. The Classic VM column shows the execution times under the classic virtual machine implementation with the Symantec JIT. The HotSpot VM column shows the execution times under the HotSpot Client VM.

Table 7-3 Small objects under different JVM implementations

Test                        Classic VM    HotSpot VM
Copy return result          30,370 ms     16,260 ms
Don't copy return result     7,520 ms      8,510 ms

Interestingly, the penalty for creating a lot of small objects is much greater with the classic VM implementation. Under the classic implementation, the version of the benchmark that creates all the small objects is over four times slower than the version that does not.

Under the HotSpot Client VM, the penalty for creating all of the small objects is significantly reduced. However, there is still an obvious penalty. This means that creating large numbers of small objects can still be an issue, although not as critical an issue as it once was. (For more information about how the HotSpot VM implements garbage collection, see Memory Allocation and Garbage Collection.)

7.6.1 Object Pooling

The small object penalty is well known, and has led programmers using older JVMs to live in fear of small objects. Many articles have been published on the topic of object caching or object pooling. The Object Pool pattern uses some type of collection (such as a Vector, Hashtable, or raw array) to store free lists of objects. Generally, when the program starts, a number of objects are put into the pool. Then when the program needs a new instance of the object, it simply gets it from the free list. When the immediate use of this object is over, it is returned to the free list.

In the past, object pooling was often used successfully. However, with the new generation of JVM implementations that include advanced memory management systems, object pooling small objects is often counterproductive. The overhead of managing the object pool is often greater than the small object penalty. Pooling can also increase a program's memory footprint. The need for many of these small objects can often be avoided altogether by having control over object mutability. Pooling small objects is not a recommended tactic when you're working with the new generation of JVMs.

Although pooling small objects isn't recommended with newer JVM implementations, pooling large objects or objects that work with native resources can be useful. For example, large bitmaps or arrays are often good candidates for reuse, in part because of the overhead of clearing all of the elements during initialization. Classes like Thread or Graphics that require native resources are also often excellent candidates for caching.
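
As an illustration of the free-list idea applied to large objects, here is a minimal sketch of a pool of fixed-size byte arrays. The class name BufferPool and its methods (acquire, release, available) are hypothetical; this is one possible shape for such a pool, not a recommended library design.

```java
import java.util.ArrayList;

class BufferPool {
    private final ArrayList<byte[]> free = new ArrayList<byte[]>();
    private final int bufferSize;

    // Pre-populates the free list when the pool is created.
    BufferPool(int bufferSize, int initialCount) {
        this.bufferSize = bufferSize;
        for (int i = 0; i < initialCount; i++) {
            free.add(new byte[bufferSize]);
        }
    }

    // Hands out a pooled buffer, or allocates one if the free list is empty.
    // Reused buffers are NOT cleared; callers must not assume zeroed contents.
    synchronized byte[] acquire() {
        if (free.isEmpty()) {
            return new byte[bufferSize];
        }
        return free.remove(free.size() - 1);
    }

    // Returns a buffer to the free list; buffers of the wrong size are dropped.
    synchronized void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize) {
            free.add(buffer);
        }
    }

    synchronized int available() {
        return free.size();
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(1 << 20, 2);
        byte[] buffer = pool.acquire();
        System.out.println(pool.available()); // 1
        pool.release(buffer);
        System.out.println(pool.available()); // 2
    }
}
```

Skipping the clearing step on reuse is where the savings come from, but it also means stale data from a previous user may still be in the buffer, so this only suits code that overwrites what it reads.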

In short, when making decisions about caching or reusing objects, let your profiler be your guide. If you find that you're spending a lot of time creating a particular type of object, and you can't control the creation by manipulating its mutability, then you might want to consider pooling. Be aware, however, that pooling might actually hurt performance when used with small objects on new JVMs; be sure to benchmark so you can compare the different solutions.

7.7 Array Mutability

Just as with object mutability, you need to be aware of array mutability. In fact, mutability is often more important with arrays because they can be much larger than a typical object. The Java 2 collections classes introduce a new interface called Iterator. It is possible to use the Iterator interface in a fashion that provides you with immutable arrays. The following example is designed to show why such a construct is needed.

Listing 7-12 shows a fragment of a class that might be part of a program designed to help ship packages.

public class ShippingInfo {
   public static final String[] states = {
      "AK", "AZ", "CA", "DE", "NV", "NY"};
   // more stuff down here
}
Listing 7-12 Simple shipping class

The following code fragment could be used to iterate through the list of states and print them out.

 for (int i = 0; i < ShippingInfo.states.length; i++) {
 	System.out.println(ShippingInfo.states[i]);
 }
This is easy enough, but using the final keyword with arrays can be tricky. As you might expect, the following code does not compile.

ShippingInfo.states = new String[50];
It fails because you cannot assign values to final variables. The following code, however, is perfectly legal:

ShippingInfo.states[5] = "Java City";
This code replaces the entry for New York ("NY") with "Java City." Obviously, final arrays are not immutable, and passing them around can violate encapsulation. No syntax in the Java programming language provides a truly immutable array. This can lead to all kinds of inconsistencies. To avoid these problems, one solution is to make the following changes:

  1. Make the array private.
  2. Add a getStates method.
  3. Return a copy of the states array from the getStates method.
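
The three steps above can be sketched as follows. The class name ShippingInfoCopy is hypothetical, used here only to keep this variant distinct from the listings in this section.

```java
class ShippingInfoCopy {
    // Step 1: the array is private.
    private static final String[] states = {
        "AK", "AZ", "CA", "DE", "NV", "NY"};

    // Steps 2 and 3: the accessor returns a fresh copy on every call,
    // so callers can never reach the private array.
    public static String[] getStates() {
        return states.clone();
    }

    public static void main(String[] args) {
        String[] copy = getStates();
        copy[5] = "Java City";              // changes only the caller's copy
        System.out.println(getStates()[5]); // NY
    }
}
```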

This preserves encapsulation, but is likely to cause performance issues, especially if the array is large. The array in the shipping example represents the 50 states and contains a relatively small number of elements, but in another situation the array might contain thousands of elements. For example, the array might represent the part numbers for all of the parts in a new car. One way to avoid copying the array and still maintain encapsulation is to create an Iterator.

Listing 7-13 shows a custom class that implements the Iterator interface. This listing creates the Iterator as a static nested class, which gives it access to the private states array. Note that the remove method, which is required by the Iterator interface, throws an UnsupportedOperationException. This is how the Collections Framework enables you to create a read-only Iterator.

import java.util.Iterator;
import java.util.NoSuchElementException;

public class ShippingInfo {
   private static final String[] states = {
      "AK", "AZ", "CA", "DE", "NV", "NY"};

   public static Iterator getStates() {
      return new StateIterator();
   }

   public static class StateIterator implements Iterator {

      private int current = 0;
      /* from Iterator */
      public boolean hasNext() {
         return current < states.length;
      }

      /* from Iterator */
      public Object next() {
         return nextState();
      }

      /* from Iterator */         
      public void remove() {
         throw new UnsupportedOperationException();
      }
      /* custom typesafe next */         
      public String nextState() {
         if (current < states.length) {
            String state = states[current];
            current++;
            return state;
         } else {
            throw new NoSuchElementException();
         }
      }
   }
}
Listing 7-13 An Iterator as a read-only array

The following code snippet can be used to iterate through the array safely, without concern that it could be accidentally damaged.

 Iterator iter = ShippingInfo.getStates();
 while (iter.hasNext()) {
     System.out.println(iter.next());
 }
Note that the Java 2 Collections Framework provides a great deal of infrastructure for creating read-only collections. For more information on this feature, see Section 8.4.10, Immutable Collections.
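
As one example of that infrastructure, the sketch below wraps the array in a read-only List by combining Arrays.asList with Collections.unmodifiableList. The class name ShippingInfoList is hypothetical; the two library calls are standard java.util methods.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ShippingInfoList {
    private static final String[] states = {
        "AK", "AZ", "CA", "DE", "NV", "NY"};

    // A read-only view backed by the private array; the array is not copied.
    private static final List<String> READ_ONLY =
        Collections.unmodifiableList(Arrays.asList(states));

    public static List<String> getStates() {
        return READ_ONLY;
    }

    public static void main(String[] args) {
        for (String state : getStates()) {
            System.out.println(state);
        }
        try {
            getStates().set(5, "Java City");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only"); // modification attempts are rejected
        }
    }
}
```

Unlike the custom Iterator, this view also supports indexed access and size queries, while still rejecting every attempt to modify it.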

The tactic of adding a wrapper object to hide the mutability of an underlying structure isn't unique to arrays. In fact, you can use this general approach to hide the mutability of many types of structures. Doug Lea discusses this idea in Section 2.4.3 of his book Concurrent Programming in Java. [2]

Key Points

  • Encapsulation is important, and an object's mutability has major implications for how an object's internal data representation should be protected.
  • Creating many objects has an impact on performance, even with advanced runtime systems such as the HotSpot VM.
  • String objects are immutable, thus many string operations create new String objects.
  • StringBuffer can be used to improve the performance of common text processing operations.
  • AWT, Swing, and other libraries often return a new copy of an object each time certain accessor methods are called. This can lead to the creation of many small objects.
  • There are tactics you can use to avoid copying mutable objects while still preserving encapsulation.
  • While there is no syntax in the Java programming language for creating immutable arrays, you can use the Iterator interface to simulate them.




[1] From an email exchange with Mike McCloskey.

[2] Doug Lea, Concurrent Programming in Java: Design Principles and Patterns, Second Edition, pp. 132-135. Addison-Wesley, 1999. Chapter 2 provides a good introduction to some of the problems associated with encapsulation in a multithreaded environment and is well worth reading.

Copyright © 2001, Sun Microsystems, Inc. All rights reserved.