
Appendix B

The Java HotSpot Virtual Machine

The Java Virtual Machine Specification[1] describes the various behaviors that a JVM implementation must perform. However, there are many ways to implement these behaviors. Sun's original version of the JVM, which evolved from version 1.0 through version 1.2.2, was based on technology that had been in use for many years in other systems. Then, in 1999, Sun released the first version of the Java HotSpot virtual machine. The HotSpot VM uses cutting-edge techniques in the areas of memory management, thread synchronization, and dynamic compilation.

While most of the material in this book is relevant no matter which JVM you use, the HotSpot VM does represent an important part of the Java Platform's performance landscape. Starting with the J2SE v. 1.3 SDK, all of Sun's JVM implementations will be based on HotSpot technology. Also, several other vendors have licensed the HotSpot code base for inclusion in their own implementation of the JRE.

This appendix provides an overview of the HotSpot architecture and discusses how the HotSpot VM achieves improved performance. Sections B.4 and B.5 summarize the various option settings that can be used to control the behavior of the HotSpot VM at runtime.

B.1 HotSpot Architecture

There are two main parts to the HotSpot system: the runtime and the compiler (Figure B-1). The runtime portion includes a bytecode interpreter, memory management and garbage collection functionality, and machinery for handling thread synchronization and other low-level tasks. The compiler's job is simply to translate bytecodes into native machine instructions, thus improving execution speed.
Figure B-1 HotSpot architecture
Note that it is possible to use the HotSpot VM as a fully compliant JVM without the compiler. The only difference will be a decrease in performance.

B.1.1 Two Versions of HotSpot

When the first version of the HotSpot VM was shipped in April 1999, it was dubbed the Java HotSpot Performance Engine. The Java HotSpot Performance Engine made major improvements in the performance of many server-side applications. However, it wasn't ideal for many client-side programs. The requirements for client and server can be quite different. For example, client programs often favor lower RAM footprint and faster start-up time over maximum computational performance. Due to these different requirements, the HotSpot technology was split into two lines: one for the client and one for the server. Figure B-2 shows how this breaks down.

HotSpot Compiler != javac

The terminology used here can be confusing. The javac tool included with the J2SDK is a source-code to bytecode compiler. The "compiler" included with the HotSpot VM is a bytecode to native machine-code compiler. Though they're both generically referred to as compilers, they perform very different tasks. This terminology sometimes leads to the mistaken impression that you have to compile your source code with a different compiler to use the HotSpot VM. This is not the case. Any class files that execute on older JVMs should run under the HotSpot VM without modification.

Figure B-2 HotSpot product lines

As of version 1.3 of the J2SE SDK, all of Sun's implementations include a version of the HotSpot Client VM. The HotSpot Server VM is an optional add-on. The Client VM and the Server VM are very similar, and actually share a lot of code. The only part of the system that is different is the compiler. The Server VM contains a highly advanced adaptive compiler that supports many of the same types of optimizations performed by optimizing C++ compilers (as well as a few optimizations C++ compilers only wish they could do). The Client VM is much simpler. It doesn't try to perform many of the more complex optimizations performed by the compiler in the Server VM, but in exchange the Client VM requires less time to analyze and compile a particular piece of code. This means that the Client VM can start up faster, and requires less warm-up time to reach peak performance. Figure B-3 shows the two systems; only the shaded areas are significantly different.

Figure B-3 HotSpot Client and HotSpot Server

B.2 Runtime Features

Both the HotSpot Client VM and the HotSpot Server VM share the same runtime code. The runtime is primarily responsible for three types of operations: bytecode interpretation, memory allocation and garbage collection, and thread synchronization.

Simple JIT compilers compile all methods before they are executed. This turns out to be wasteful, as many methods are only executed once (or a very few times). In such cases, the time to compile the method can dwarf the time required to execute it. All this compilation also increases memory usage because the compiled code must be stored. As a result, the HotSpot runtime executes many methods in a purely interpreted mode. To ensure maximum performance for these methods, the HotSpot runtime provides a highly optimized bytecode interpreter.

In addition to the bytecode interpreter, the runtime is responsible for memory management and thread synchronization. The HotSpot runtime provides several important optimizations in these areas.

B.2.1 Memory Allocation and Garbage Collection

As previously discussed, how you handle memory is of critical importance to the performance of your software. While there are many optimizations that can reduce memory requirements, you will always need to allocate and collect objects. One of the HotSpot VM's most important features is its superior memory allocator and garbage collector. The HotSpot runtime provides an exact, generational, incremental, state-of-the-art garbage collector. (Chapter 7, Object Mutability: Strings and Other Things, also contains information on this subject.)

Accuracy

The HotSpot garbage collector is a fully accurate collector. In contrast, many other garbage collectors are conservative or partially accurate. While conservative garbage collection can be attractive because it is easy to implement, it has certain drawbacks.

A conservative collector does not know for sure where all object references are located. As a result, it must assume that memory words that appear to refer to an object are in fact object references. This means that it can make certain kinds of mistakes, such as confusing an integer for an object pointer. This has several negative impacts.

First, when such mistakes are made (which in practice is not very often), memory leaks can occur unpredictably in ways that are virtually impossible for application programmers to reproduce or debug (although crashes caused by dangling object references are still prevented, and the program still executes correctly if there is enough spare memory).

Second, since it might have made a mistake, a conservative collector must either use handles to refer indirectly to objects (decreasing performance), or avoid relocating objects. Relocating handleless objects requires that all of the references to the object be updated, which cannot be done if the collector does not know for sure that an apparent reference is a real reference. The inability to relocate objects causes object memory fragmentation and, more importantly, prevents use of the advanced generational copying collection algorithms described below.
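The difference between a conservative and an accurate root scan can be illustrated with a toy model. The sketch below is only an illustration, not HotSpot code: the heap range, the stack of plain `long` words, and the per-slot reference map are all hypothetical stand-ins, and it assumes a current JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: addresses and stack words are plain longs, not real VM data.
class ConservativeScanSketch {
    static final long HEAP_START = 0x1000;
    static final long HEAP_END = 0x2000; // exclusive

    // A conservative collector treats every word that falls inside the heap
    // range as a possible reference, because it cannot tell the two apart.
    static List<Long> conservativeRoots(long[] stackWords) {
        List<Long> roots = new ArrayList<>();
        for (long word : stackWords) {
            if (word >= HEAP_START && word < HEAP_END) {
                roots.add(word);
            }
        }
        return roots;
    }

    // An accurate collector is told which stack slots hold references
    // (modeled here by a hypothetical per-slot reference map).
    static List<Long> exactRoots(long[] stackWords, boolean[] isReference) {
        List<Long> roots = new ArrayList<>();
        for (int i = 0; i < stackWords.length; i++) {
            if (isReference[i]) {
                roots.add(stackWords[i]);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        // Slot 0 is a real reference, slot 1 is an integer that happens to
        // look like a heap address, slot 2 is an ordinary small integer.
        long[] stack = {0x1010, 0x1800, 42};
        boolean[] refMap = {true, false, false};
        System.out.println("conservative roots: " + conservativeRoots(stack));
        System.out.println("exact roots:        " + exactRoots(stack, refMap));
    }
}
```

In this model the conservative scan keeps the object at the integer's "address" alive and cannot safely move it, which is the leak and relocation problem described above.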

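Because the HotSpot collector is fully accurate, it can make strong design guarantees that a conservative collector cannot make: all inaccessible object memory can be reclaimed reliably, and all objects can be relocated, which allows the heap to be compacted and eliminates object memory fragmentation.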

Generational Copying Collection

The HotSpot runtime employs a state-of-the-art generational copying collector[2] that provides two major benefits:

A generational collector takes advantage of the fact that, in most programs, the vast majority of objects (often greater than 95 percent) are very short-lived. (In other words, they're used as temporary data structures.) By allocating objects from a dedicated object "nursery," a generational collector can accomplish several things. First, because new objects are allocated contiguously in stacklike fashion in the object nursery, allocation becomes extremely fast. This is because it involves merely updating a single pointer and performing a single check for nursery overflow. Second, by the time the nursery overflows, most of the objects in the nursery are already "dead," allowing the garbage collector to simply move the few surviving objects elsewhere. This way, it avoids doing any reclamation work for dead objects in the nursery.
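The "single pointer update plus a single overflow check" allocation path can be sketched as follows. This is a minimal model under stated assumptions: the nursery is represented only by integer offsets, and the class and method names are hypothetical, not part of any VM.

```java
// Toy bump-pointer nursery: objects are represented only by their offsets.
class NurserySketch {
    private final int capacity; // nursery size in bytes
    private int top = 0;        // offset of the next free byte

    NurserySketch(int capacity) {
        this.capacity = capacity;
    }

    // Allocation is one overflow check and one pointer bump.
    // Returns the offset of the new object, or -1 when the nursery is full
    // (the point at which a real collector would scavenge the nursery).
    int allocate(int size) {
        if (size > capacity - top) {
            return -1;
        }
        int address = top;
        top += size;
        return address;
    }

    // After survivors have been moved elsewhere, the whole nursery is reused
    // by resetting the pointer; dead objects cost nothing to reclaim.
    void reset() {
        top = 0;
    }

    public static void main(String[] args) {
        NurserySketch nursery = new NurserySketch(64);
        System.out.println(nursery.allocate(16));
        System.out.println(nursery.allocate(24));
        System.out.println(nursery.allocate(100)); // does not fit
        nursery.reset();
        System.out.println(nursery.allocate(8));
    }
}
```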

Mark-Compact "Old Object" Collector

Although the generational copying collector collects most dead objects efficiently, longer-lived objects still accumulate in the "old object" memory area (old objects are objects that have existed for a while in machine terms). Occasionally, based on low-memory conditions or programmatic requests, an old-object garbage collection must be performed. The HotSpot runtime can use a standard mark-compact[3] collection algorithm, which traverses the entire graph of live objects from its "roots," and then sweeps through memory, compacting away the gaps left by dead objects. Compacting gaps in the heap, rather than collecting them into a free list, eliminates memory fragmentation and streamlines old-object allocation by removing the need to search a free list.
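The mark and compact phases described above can be sketched on a toy heap. This is a simplified model, not the HotSpot implementation: the heap is an array of slots, each object is an `int[]` holding the slot indices it refers to, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy mark-compact: heap[i] lists the slot indices that object i refers to.
class MarkCompactSketch {

    // Marks everything reachable from roots, slides live objects to the front
    // of the heap, rewrites references, and returns the number of live objects.
    static int collect(int[][] heap, int[] roots) {
        // Mark phase: traverse the object graph from the roots.
        boolean[] marked = new boolean[heap.length];
        Deque<Integer> pending = new ArrayDeque<>();
        for (int root : roots) {
            pending.push(root);
        }
        while (!pending.isEmpty()) {
            int obj = pending.pop();
            if (marked[obj]) {
                continue;
            }
            marked[obj] = true;
            for (int ref : heap[obj]) {
                pending.push(ref);
            }
        }

        // Compute each live object's new slot (its forwarding address).
        int[] forward = new int[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                forward[i] = next++;
            }
        }

        // Compact phase: update references and slide objects toward slot 0.
        for (int i = 0; i < heap.length; i++) {
            if (!marked[i]) {
                continue;
            }
            int[] fields = heap[i];
            for (int f = 0; f < fields.length; f++) {
                fields[f] = forward[fields[f]];
            }
            heap[forward[i]] = fields;
        }
        Arrays.fill(heap, next, heap.length, null); // one contiguous free area
        for (int r = 0; r < roots.length; r++) {
            roots[r] = forward[roots[r]];
        }
        return next;
    }

    public static void main(String[] args) {
        int[][] heap = { {1}, {}, {0}, {1}, {} };
        int[] roots = {3};
        int live = collect(heap, roots);
        System.out.println("live objects: " + live + ", root now at slot " + roots[0]);
    }
}
```

Because the free space ends up as one contiguous region after the live objects, later allocation needs no free-list search, which is the benefit the text attributes to compaction.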

Incremental "Pauseless" Garbage Collector

The mark-compact collector does not eliminate all user-perceivable pauses. User-perceived GC pauses occur when old objects need to be garbage collected, and these pauses are proportional to the amount of live object data that exists. This means that the pauses can become arbitrarily large as more data is manipulated, which is a very undesirable property for server applications, animations, and other soft real-time applications.

The HotSpot runtime provides an alternative old-space garbage collector to solve this problem. This collector is fully incremental[4], eliminating most user-detectable garbage collection pauses. It scales smoothly, providing relatively constant pause times even when extremely large object datasets are being manipulated. This provides excellent behavior for the server applications, animations, and other soft real-time applications mentioned above.

The pauseless collector works by using an incremental old-space collection scheme referred to academically as the "train" algorithm. This algorithm breaks up old-space collection pauses into many tiny pauses (typically less than 10 milliseconds) that can be spread out over time so that the program virtually never appears to pause to a user. Since the train algorithm is not a hard real-time algorithm, it cannot guarantee an upper limit on pause times; however, in practice much larger pauses are extremely rare, and are not caused directly by large datasets.

The pauseless collector also has the highly desirable side benefit of producing improved memory locality. This happens because the algorithm works by attempting to relocate groups of tightly coupled objects into regions of adjacent memory, which provides excellent paging and cache locality properties for those objects. This can also benefit highly multithreaded applications that operate on distinct sets of object data.

B.2.2 Thread Synchronization

Another big attraction of the Java programming language is the provision of language-level thread synchronization, which makes it easy to write multithreaded programs with fine-grained locking. Unfortunately, older JVMs' synchronization implementations are highly inefficient relative to other micro-operations in the Java programming language, making use of fine-grained synchronization a major performance bottleneck.

HotSpot incorporates a unique synchronization implementation that boosts performance substantially. The mechanism achieves this by providing ultra-fast, constant-time performance for all uncontended synchronizations, which at runtime make up the great majority of synchronizations.

The Java HotSpot synchronization implementation is fully suitable for multi-processing, and exhibits excellent multiprocessor performance characteristics.
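As a reminder of what language-level synchronization looks like in source code, here is a minimal sketch of a counter guarded by `synchronized` methods. The class and method names are hypothetical, and the example assumes a current JDK (it uses a lambda). With one thread every lock acquisition is uncontended, which is the case the text says HotSpot makes cheap; the code shows only the programming model and does not measure the cost.

```java
// A counter protected by the monitor of its own instance.
class SyncCounterSketch {
    private long count = 0;

    // Each call acquires and releases the monitor of `this`.
    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }

    // Runs `threads` workers that each increment the counter `perThread` times.
    static long runThreads(int threads, int perThread) {
        SyncCounterSketch counter = new SyncCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("one thread (uncontended): " + runThreads(1, 100_000));
        System.out.println("four threads (contended): " + runThreads(4, 100_000));
    }
}
```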

B.3 HotSpot Server Compiler

While the HotSpot Client VM uses fairly traditional compilation technology, the HotSpot Server VM uses many advanced techniques to achieve maximum computational performance. A few of these optimizations are described in the next three sections.

B.3.1 Aggressive Inlining

Method inlining is an important compiler optimization. However, static compilers are restricted in the amount of inlining they can do, for a couple of reasons. First, a static compiler can inline a method only if the compiler can determine that method is not overridden in a subclass. A static compiler can inline static, final, and private methods because it knows those methods can't be overridden. However, public and protected methods can be overridden in a subclass, and static compilers therefore cannot inline those methods.

Second, even if it were possible to determine through static analysis which methods are overridden and which are not, a static compiler still could not inline public and protected methods. The Java language allows classes to be loaded during runtime, and such dynamically loaded classes can change the structure of a program significantly. In particular, such dynamic loading can render invalid any inlining that was done based on pre-runtime, static analyses.

The HotSpot dynamic compiler uses runtime analysis to perform inlining aggressively, yet safely. Once the HotSpot profiler has collected runtime information about program hot spots, it not only compiles the hot spot into native code, but also performs extensive method inlining on that code. The HotSpot compiler can afford to be aggressive in the way it inlines because it can always back out an inlining optimization if it determines that the method inheritance structure has changed during runtime due to dynamic class loading.

The HotSpot VM can revert to using the interpreter whenever compiler deoptimizations are called for because of dynamic class loading. When a class is loaded dynamically, the HotSpot VM checks to ensure that the interclass dependencies of inlined methods have not been altered. If a dynamically loaded class affects any dependencies, the HotSpot VM can back out affected inlined code, revert to interpreting for a while, and reoptimize later based on the new class dependencies.

On the other hand, when running statically compiled code, a JVM does not have access to the original bytecodes, and cannot fall back on an interpreter when optimizations in the statically compiled code become unsafe. Therefore, static compilers cannot be as aggressive in their optimizations as dynamic compilers, which results in slower performance.

The extensive inlining enabled by the dynamic compiler gives it a huge advantage over static compilers. Inlining reduces the number of method invocations and their associated performance overhead. This is a significant bonus with the Java programming language, in which methods are virtual by default and method invocations are frequent.

Method inlining is also synergistic with other optimizations. Inlining produces large blocks of code that make additional optimizations easier for the compiler to perform. The HotSpot Server compiler's ability to perform aggressive inlining is a key factor in making it faster than current JIT and static compilers.
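The kind of call site this section is concerned with can be sketched in a few lines. The classes below are hypothetical. The example shows only the source-level situation: a public, non-final method called in a loop, and a subclass that overrides it. Whether and when a particular VM inlines or deoptimizes the call is not observable from this code.

```java
class InliningSketch {
    static class Shape {
        // Public and non-final: a static compiler must assume that some
        // subclass, possibly loaded later, overrides this method.
        public double area() {
            return 1.0;
        }
    }

    static class Square extends Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        @Override
        public double area() {
            return side * side;
        }
    }

    // The call s.area() is a virtual call site. While no overriding class
    // has been loaded, a dynamic compiler could treat it as a direct call
    // and inline it; loading Square invalidates that assumption.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] onlyBase = {new Shape(), new Shape()};
        System.out.println(totalArea(onlyBase));

        // From here on an overriding implementation exists at runtime.
        Shape[] mixed = {new Shape(), new Square(3)};
        System.out.println(totalArea(mixed));
    }
}
```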

B.3.2 Other Optimizations

The optimizer performs all of the classic optimizations such as dead code elimination, loop invariant hoisting, common subexpression elimination, and constant propagation. It also features optimizations more specific to Java technology, such as null-check elimination. The register allocator is a global graph coloring allocator and makes full use of large register sets.
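To make the classic optimizations concrete, the sketch below shows a method as a programmer might write it and a hand-transformed equivalent with the loop invariant hoisted, the common subexpression computed once, and the dead branch removed. It illustrates what the transformations mean at source level; it is an assumption-laden illustration, not a record of what the HotSpot compiler emits.

```java
class ClassicOptimizationsSketch {
    static final boolean DEBUG = false;

    // As written: recomputes a * b twice per iteration and carries a branch
    // that can never be taken.
    static int naive(int[] data, int a, int b) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            int scale = a * b + 7;            // loop-invariant computation
            sum += data[i] * scale + (a * b); // a * b is a common subexpression
            if (DEBUG) {                      // dead code: DEBUG is constant false
                sum = -1;
            }
        }
        return sum;
    }

    // The same computation after the transformations are applied by hand.
    static int handOptimized(int[] data, int a, int b) {
        int ab = a * b;     // common subexpression computed once
        int scale = ab + 7; // invariant hoisted out of the loop
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] * scale + ab;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(naive(data, 2, 3) == handOptimized(data, 2, 3));
    }
}
```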

B.3.3 Array Bounds Checks

None of the current generation of HotSpot compilers eliminates unnecessary array bounds checks. While it is theoretically possible to automatically remove many array bounds-related computations from certain types of loop structures, the HotSpot compiler doesn't yet do this. The HotSpot engineering team has run tests that show that only a small improvement in performance on the SpecJVM benchmark is to be expected when this feature is implemented. Specific applications might see much larger increases, however, depending on the amount of array access that they perform.
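The distinction between a bounds check that could be proven unnecessary and one that cannot is easiest to see side by side. The following sketch uses hypothetical method names and makes no claim about what any particular compiler does with these loops.

```java
class BoundsCheckSketch {
    // The loop condition guarantees 0 <= i < values.length on every access,
    // so the implicit bounds check on values[i] is provably redundant.
    static long sum(int[] values) {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Here the index comes from data, so the check on values[...] is
    // genuinely needed: a bad index must raise an exception.
    static long sumIndirect(int[] values, int[] indices) {
        long total = 0;
        for (int i = 0; i < indices.length; i++) {
            total += values[indices[i]];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        System.out.println(sum(values));
        System.out.println(sumIndirect(values, new int[] {2, 0}));
    }
}
```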

B.4 -X Flags

Both the HotSpot Client and Server VMs enable some control over the performance of the virtual machine. In some special circumstances, these options can be important. Keep in mind, however, that they are all nonstandard and subject to change without notice. Table B-1 shows all of the special HotSpot options.

Table B-1 HotSpot Options

Option                 Description
-Xmixed                Mixed mode execution (default)
-Xint                  Interpreted mode execution only
-Xbootclasspath:<directories and zip/jar files separated by ;>
                       Set search path for bootstrap classes and resources
-Xnoclassgc            Disable class garbage collection
-Xincgc                Enable incremental garbage collection
-Xbatch                Disable background compilation
-Xms<size>             Set initial Java heap size
-Xmx<size>             Set maximum Java heap size
-Xprof                 Output CPU profiling data

B.4.1 -Xnoclassgc

This flag turns off class unloading. Under JDK 1.1.x, this was important in some circumstances. With Java 2, however, the semantics of class loading are such that you really don't ever need to use this flag.

B.4.2 -Xincgc

This option enables the incremental garbage collector and reduces the average length of garbage collection pauses. Even without the incremental collector, pauses are usually not user-detectable. However, some applications have stringent requirements about how often certain operations need to happen. The incremental collector isn't designed for hard real-time applications, but can be useful in many soft real-time situations where concrete guarantees about CPU time are not required.

B.4.3 -Xbatch

This flag disables background compilation. By default, the HotSpot VM can compile methods in the background while they are executing. This smooths operation by eliminating the pauses that can occur when waiting for a method to be compiled. However, compiling methods in the background does have a slight performance impact. Using the -Xbatch flag on server processes that don't directly interact with the user might result in higher peak performance on single-processor servers. In general, using this flag hurts performance on multiprocessor machines because compilation could otherwise be off-loaded to a different processor.

B.4.4 -Xms

This flag sets the initial size of the object heap. The current default is 2MB. Increasing this size can help improve startup time for applications that need large heaps by eliminating the extra garbage collection that occurs before the heap is automatically expanded.

B.4.5 -Xmx

This flag sets the maximum size of the Java heap. It currently defaults to 64MB. If the program needs to allocate more memory than is allowed by the current setting of the max heap size, then the system will throw an OutOfMemoryError. Programs that work on very large amounts of data might need to increase this value.
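One way to see the effect of the heap-size flags is to ask the running VM how much memory it has and how much it is willing to use. The sketch below relies on the standard `Runtime` methods `totalMemory()`, `maxMemory()`, and `freeMemory()`; note that `maxMemory()` was added in a later J2SE release than the one this appendix describes, and the values reported only approximate the -Xms and -Xmx settings. The class name is hypothetical.

```java
class HeapSettingsSketch {
    // Returns {current heap size, maximum heap size, free bytes in the current heap}.
    static long[] heapSnapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] {rt.totalMemory(), rt.maxMemory(), rt.freeMemory()};
    }

    public static void main(String[] args) {
        long[] snapshot = heapSnapshot();
        long mb = 1024L * 1024L;
        System.out.println("current heap (MB): " + snapshot[0] / mb);
        System.out.println("maximum heap (MB): " + snapshot[1] / mb);
        System.out.println("free in heap (MB): " + snapshot[2] / mb);
    }
}
```

Running the program with different settings, for example `java -Xms16m -Xmx128m HeapSettingsSketch`, should change the first two figures accordingly.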

B.4.6 -Xprof

The HotSpot VM includes a fairly simple CPU profiling tool. While it isn't a replacement for a full-featured commercial tool, it is quick and easy to use. It also gives some useful information about HotSpot internals that can be interesting.

To use this profiler, include the -Xprof option on the command line when you start your program. It profiles CPU time only and does not provide any memory profiling. When a thread terminates, a report such as the one shown in Figure B-4 is printed to the console.

Figure B-4 HotSpot profiler results

This profiler shows the methods that used the most time during that thread's lifetime. The methods are divided into three categories: Interpreted, Compiled, and Stub.

Interpreted methods are executed by the bytecode interpreter. As shown in Figure B-4, almost 10 percent of the program's time was spent in methods run by the interpreter. However, this is deceptive. You'll notice that there are two columns. The Native column shows the amount of time spent in native C methods called by interpreted methods. Thus, when you break it down that way, this program spends very little time in the interpreter, only a few ticks.

Compiled methods are those that are translated from bytecode to machine code by the compiler. In this case, that is where the program is spending most of its time. Note that it is possible for compiled methods to call out to native C functions, although that doesn't show up in this profile.

The Stub category shows methods called through JNI. The Stub column shows the amount of time it took to set up the call, while the Native column shows the amount of time spent in the native function.

The Global Summary at the end of the profile provides useful information such as how much time the thread spent blocked, and how much time was spent loading classes.

B.5 -XX Flags

While the flags in Section B.4 are subject to change at any time, the flags in this section are even less reliable. The flags described in this section are for experimentation purposes only. They aren't documented as part of the HotSpot release, and are not supported in any way. Use them at your own risk!

B.5.1 Kinds of -XX Flags

There are really two kinds of -XX flags. The first is a Boolean flag. The second is an Integer flag. Boolean flags are used in the following manner:

-XX:<+/-><flagname>

For example, passing the following string as an option to the java command would activate the GoFaster option if one existed.

-XX:+GoFaster

To turn this option off (if it was on by default) you would pass:

-XX:-GoFaster

Integer flags are a little different. For example, the following string would set the NumCylinders option to eight.

-XX:NumCylinders=8

The next few paragraphs describe some of the more interesting flags.
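The two flag forms introduced in this section can be summarized by a tiny parser. This is purely an illustration of the syntax: the class and method are hypothetical, and the code is not how the VM itself processes its options.

```java
class XXFlagSketch {
    // Splits an -XX option into {name, value}.
    // Boolean flags yield "true" for + and "false" for -.
    static String[] parse(String option) {
        String prefix = "-XX:";
        if (!option.startsWith(prefix)) {
            throw new IllegalArgumentException("not an -XX option: " + option);
        }
        String body = option.substring(prefix.length());
        if (body.startsWith("+")) {
            return new String[] {body.substring(1), "true"};
        }
        if (body.startsWith("-")) {
            return new String[] {body.substring(1), "false"};
        }
        int eq = body.indexOf('=');
        if (eq < 1) {
            throw new IllegalArgumentException("malformed -XX option: " + option);
        }
        return new String[] {body.substring(0, eq), body.substring(eq + 1)};
    }

    public static void main(String[] args) {
        for (String option : new String[] {"-XX:+GoFaster", "-XX:-GoFaster", "-XX:NumCylinders=8"}) {
            String[] parsed = parse(option);
            System.out.println(parsed[0] + " = " + parsed[1]);
        }
    }
}
```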

B.5.2 PrintBytecodeHistogram

Default Value: false

Example Usage: java -XX:+PrintBytecodeHistogram <yourclass>

This option prints out statistics that show what bytecodes were executed while your program was running. Some sample output from this option is shown in Figure B-5. This type of information isn't commonly used when performance tuning typical programs, but might be of interest to researchers.
Figure B-5 Bytecode histogram

B.5.3 CompileThreshold

Default Value: 1500

Example Usage: java -XX:CompileThreshold=1000000 <yourclass>

The current implementation of HotSpot usually waits for a method to be executed a certain number of times before it is compiled. Not compiling every method helps startup time and reduces RAM footprint. This option allows you to control that threshold. By increasing the number, you gain a slight reduction in RAM footprint in exchange for a longer period of time before your program reaches peak performance.
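A rough way to observe warm-up is to time successive batches of calls to the same small method. The sketch below is a hedged experiment, not a benchmark: the names are hypothetical, `System.nanoTime()` comes from a later J2SE release than the one described here, and the timings vary widely by machine and VM. On many VMs the early batches run slower than later ones, and changing `-XX:CompileThreshold` may shift where that change occurs.

```java
class WarmUpSketch {
    // Written to so the loop result is used and cannot be discarded.
    static volatile int sink;

    // A small method that becomes "hot" when called often enough.
    static int work(int x) {
        return x * 31 + 7;
    }

    // Calls work() `calls` times and returns the elapsed time in nanoseconds.
    static long timeBatch(int calls) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < calls; i++) {
            acc = work(acc);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        for (int batch = 1; batch <= 5; batch++) {
            System.out.println("batch " + batch + ": " + timeBatch(1_000_000) + " ns");
        }
    }
}
```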

B.5.4 NewSize

Default Value: 655360

Example Usage: java -XX:NewSize=196608 <yourclass>

This option allows you to control the default size of the New generation (also known as the nursery) of the HotSpot VM's generational garbage collector. Increasing the amount of new space means that fewer objects will have to be copied to old space. However, a small new space can be scavenged more quickly, and works better with processor caches.
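On a current JDK, the generations of the heap can be inspected from inside a program through the `java.lang.management` API, which was added to the platform after this appendix was written. The sketch below lists the heap memory pools and their current usage. The class name is hypothetical, and the pool names printed depend on the VM and collector in use, so none are assumed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class HeapPoolsSketch {
    // Returns the names of the memory pools that belong to the Java heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                System.out.println(pool.getName() + ": " + pool.getUsage());
            }
        }
    }
}
```

Comparing the output of runs with different `-XX:NewSize` values is one way to check whether the setting took effect on a given VM.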




[1] Tim Lindholm and Frank Yellin, The Java Virtual Machine Specification, Second Edition, Section 3.5.3. Addison-Wesley, 1999.

[2] For more information about generational copying collectors, see Richard Jones and Rafael Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management, pp. 143-180. John Wiley & Sons, 1996.

[3] For more information about the mark-compact collection algorithm, see Jones and Lins, pp. 97-114.

[4] For more information about incremental collectors, see Jones and Lins, pp. 183-223.

Copyright © 2001, Sun Microsystems, Inc. All rights reserved.