Appendix B
The Java HotSpot Virtual Machine
The Java Virtual Machine Specification[1] describes the various behaviors that a
JVM implementation must perform. However, there are many ways to implement
these behaviors. Sun's original version of the JVM, which evolved from version
1.0 through version 1.2.2, was based on technology that had been in use for many
years in other systems. Then, in 1999, Sun released the first version of the Java
HotSpot virtual machine. The HotSpot VM uses cutting-edge techniques in the
areas of memory management, thread synchronization, and dynamic compilation.
While most of the material in this book is relevant no matter which JVM you use, the HotSpot VM does represent an important part of the Java Platform's performance landscape. Starting with the J2SE v. 1.3 SDK, all of Sun's JVM implementations will be based on HotSpot technology. Also, several other vendors have licensed the HotSpot code base for inclusion in their own implementation of the JRE.
This appendix provides an overview of the HotSpot architecture and discusses how the HotSpot VM achieves improved performance. Sections B.4 and B.5 summarize the various option settings that can be used to control the behavior of the HotSpot VM at runtime.
B.1 HotSpot Architecture
There are two main parts to the HotSpot system: the runtime and the compiler
(Figure B-1). The runtime portion includes a bytecode interpreter, memory management
and garbage collection functionality, and machinery for handling thread
synchronization and other low-level tasks. The compiler's job is simply to translate
bytecodes into native machine instructions, thus improving execution speed.
Figure B-1 HotSpot architecture
Note that it is possible to use the HotSpot VM as a fully compliant JVM without
the compiler. The only difference will be a decrease in performance.
B.1.1 Two Versions of HotSpot
When the first version of the HotSpot VM was shipped in April 1999, it was
dubbed the Java HotSpot Performance Engine. The Java HotSpot Performance
Engine made major improvements in the performance of many server-side applications.
However, it wasn't ideal for many client-side programs. The requirements
for client and server can be quite different. For example, client programs often favor
lower RAM footprint and faster start-up time over maximum computational
performance. Due to these different requirements, the HotSpot technology was
split into two lines: one for the client and one for the server. Figure B-2 shows
how this breaks down.
HotSpot Compiler != javac
The terminology used here can be confusing. The javac tool included
with the J2SDK is a source-code-to-bytecode compiler. The "compiler"
included with the HotSpot VM is a bytecode-to-native-machine-code
compiler. Though they're both generically referred to as compilers, they
perform very different tasks. This terminology sometimes leads to the
mistaken impression that you have to compile your source code with a
different compiler to use the HotSpot VM. This is not the case. Any class
files that execute on older JVMs should run under the HotSpot VM without
modification.
Figure B-2 HotSpot product lines
As of version 1.3 of the J2SE SDK, all of Sun's implementations include a version of the HotSpot Client VM. The HotSpot Server VM is an optional add-on. The Client VM and the Server VM are very similar, and actually share a lot of code. The only part of the system that is different is the compiler. The Server VM contains a highly advanced adaptive compiler that supports many of the same types of optimizations performed by optimizing C++ compilers (as well as a few optimizations C++ compilers only wish they could do). The Client VM is much simpler. It doesn't try to perform many of the more complex optimizations performed by the compiler in the Server VM, but in exchange the Client VM requires less time to analyze and compile a particular piece of code. This means that the Client VM can start up faster, and requires less warm-up time to reach peak performance. Figure B-3 shows the two systems; only the shaded areas are significantly different.
Figure B-3 HotSpot Client and HotSpot Server
B.2 Runtime Features
Both the HotSpot Client VM and the HotSpot Server VM share the same runtime
code. The runtime is primarily responsible for the following types of operations:
- Interpretation of bytecodes
- Memory allocation and garbage collection
- Thread synchronization
Simple JIT compilers compile all methods before they are executed. This turns out to be wasteful, as many methods are only executed once (or a very few times). In such cases, the time to compile the method can dwarf the time required to execute it. All this compilation also increases memory usage because the compiled code must be stored. As a result, the HotSpot runtime executes many methods in a purely interpreted mode. To ensure maximum performance for these methods, the HotSpot runtime provides a highly optimized bytecode interpreter.
In addition to the bytecode interpreter, the runtime is responsible for memory management and thread synchronization. The HotSpot runtime provides several important optimizations in these areas.
B.2.1 Memory Allocation and Garbage Collection
As previously discussed, how you handle memory is of critical importance to the
performance of your software. While there are many optimizations that can reduce
memory requirements, you will always need to allocate and collect objects. One
of the HotSpot VM's most important features is its superior memory allocator and
garbage collector. The HotSpot runtime provides an exact, generational, incremental,
state-of-the-art garbage collector. (Chapter 7, Object Mutability: Strings
and Other Things, also contains information on this subject.)
Accuracy
The HotSpot garbage collector is a fully accurate collector. In contrast, many
other garbage collectors are conservative or partially accurate. While conservative
garbage collection can be attractive because it is easy to implement, it has certain
drawbacks.
A conservative collector does not know for sure where all object references are located. As a result, it must assume that memory words that appear to refer to an object are in fact object references. This means that it can make certain kinds of mistakes, such as confusing an integer for an object pointer. This has several negative impacts.
First, when such mistakes are made (which in practice is not very often), memory leaks can occur unpredictably in ways that are virtually impossible for application programmers to reproduce or debug (although crashes caused by dangling object references are still prevented, and the program still executes correctly if there is enough spare memory).
Second, since it might have made a mistake, a conservative collector must either use handles to refer indirectly to objects (decreasing performance), or avoid relocating objects. Relocating handleless objects requires that all of the references to the object be updated, which cannot be done if the collector does not know for sure that an apparent reference is a real reference. The inability to relocate objects causes object memory fragmentation and, more importantly, prevents use of the advanced generational copying collection algorithms described below.
Because the HotSpot collector is fully accurate, it can make several strong design guarantees that a conservative collector cannot make:
- All logically inaccessible object memory can be reclaimed reliably.
- All objects can be relocated, allowing object memory compaction, which
eliminates object memory fragmentation and increases memory locality.
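The difference between conservative and exact root scanning can be shown with a toy simulation. This is an illustrative sketch, not HotSpot code: heap "addresses" are small integers, and the class name, the root words, and the IS_REFERENCE type map are all hypothetical. The conservative scan keeps any object whose address matches a root word; the exact scan consults the type map first.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: "addresses" are ints, objects have no fields,
// and every name here is hypothetical.
class RootScanDemo {
    // Addresses of the objects currently in the toy heap.
    static final int[] OBJECT_ADDRESSES = {100, 200, 300};

    // Root words: a real reference to 100, and a plain int that happens to equal 200.
    static final int[] ROOT_WORDS = {100, 200};
    // Exact type map: true means the corresponding root word really is a reference.
    static final boolean[] IS_REFERENCE = {true, false};

    static boolean isObjectAddress(int word) {
        for (int address : OBJECT_ADDRESSES) {
            if (address == word) {
                return true;
            }
        }
        return false;
    }

    // Conservative: any word that looks like an object address keeps that object alive.
    static Set<Integer> conservativeLive() {
        Set<Integer> live = new HashSet<>();
        for (int word : ROOT_WORDS) {
            if (isObjectAddress(word)) {
                live.add(word);
            }
        }
        return live;
    }

    // Exact: only words the type map marks as references keep objects alive.
    static Set<Integer> exactLive() {
        Set<Integer> live = new HashSet<>();
        for (int i = 0; i < ROOT_WORDS.length; i++) {
            if (IS_REFERENCE[i] && isObjectAddress(ROOT_WORDS[i])) {
                live.add(ROOT_WORDS[i]);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        System.out.println("conservative scan keeps " + conservativeLive());
        System.out.println("exact scan keeps " + exactLive());
    }
}
```

Here the integer 200 keeps object 200 alive under the conservative scan, which is the kind of leak described above. A conservative collector also could not move object 200, because "updating" that word would silently change the program's integer data.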
Generational Copying Collection
The HotSpot runtime employs a state-of-the-art generational copying collector[2]
that provides two major benefits:
- Major increases in both allocation speed and overall garbage collection
efficiency (often by more than a factor of 5) for most programs, compared to
the original (pre-HotSpot) JVM in the Java 2 SDK
- A corresponding decrease in the frequency of user-perceivable garbage
collection pauses
A generational collector takes advantage of the fact that, in most programs, the
vast majority of objects (often greater than 95 percent) are very short-lived. (In
other words, they're used as temporary data structures.) By allocating objects
from a dedicated object "nursery," a generational collector can accomplish several
things. First, because new objects are allocated contiguously in stacklike fashion
in the object nursery, allocation becomes extremely fast. This is because it involves
merely updating a single pointer and performing a single check for nursery
overflow. Second, by the time the nursery overflows, most of the objects in the
nursery are already "dead," allowing the garbage collector to simply move the few
surviving objects elsewhere. This way, it avoids doing any reclamation work for
dead objects in the nursery.
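The allocation fast path described above can be sketched as a toy model, assuming a nursery of fixed capacity measured in abstract units. The Nursery class and its methods are hypothetical; the point is that allocating costs one overflow check and one pointer update, and that emptying the nursery after the survivors have been copied out costs a single reset.

```java
// Toy model of nursery allocation. Sizes are abstract units and every name
// here is hypothetical; this is not HotSpot's actual allocator.
class Nursery {
    private final int limit;  // capacity of the nursery
    private int top = 0;      // next free offset: the single "bump" pointer

    Nursery(int capacity) {
        this.limit = capacity;
    }

    // Fast path: one overflow check plus one pointer update.
    // Returns the new object's offset, or -1 if the nursery is full,
    // which is the point at which a real collector would scavenge.
    int allocate(int size) {
        if (top + size > limit) {
            return -1;
        }
        int offset = top;
        top += size;
        return offset;
    }

    // Once the few survivors have been copied elsewhere, the whole nursery is
    // reclaimed by resetting one pointer; dead objects cost no work at all.
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }
}
```

Objects are laid out back to back in allocation order, which is the "stacklike" contiguity the text refers to.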
Mark-Compact "Old Object" Collector
Although the generational copying collector collects most dead objects efficiently,
longer-lived objects still accumulate in the "old object" memory area (old objects
are objects that have existed for a while in machine terms). Occasionally, based on
low-memory conditions or programmatic requests, an old-object garbage collection
must be performed. The HotSpot runtime can use a standard mark-compact[3]
collection algorithm, which traverses the entire graph of live objects from its
"roots," and then sweeps through memory, compacting away the gaps left by dead
objects. Because the collector compacts gaps in the heap rather than collecting
them into a free list, memory fragmentation is eliminated, and old-object
allocation is streamlined because no free-list search is needed.
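The phases of a sliding mark-compact collection (mark, compute new addresses, update references, slide objects down) can be sketched over a toy heap, assuming each heap slot holds either an object or null and that an object's fields are the slot indices of the objects it references. MarkCompactDemo and its Obj class are hypothetical, and this is a textbook-style illustration rather than HotSpot's implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy sliding mark-compact collector over an array "heap".
// Each slot holds an object or null; an object's refs are slot indices.
class MarkCompactDemo {
    static final class Obj {
        final String name;
        int[] refs;  // indices of the objects this object references

        Obj(String name, int... refs) {
            this.name = name;
            this.refs = refs;
        }
    }

    // Collects the heap in place and rewrites the roots to the new indices.
    // Returns the number of live objects, which end up in slots 0..n-1.
    static int collect(Obj[] heap, int[] roots) {
        // 1. Mark: traverse the object graph starting from the roots.
        boolean[] marked = new boolean[heap.length];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) {
            stack.push(r);
        }
        while (!stack.isEmpty()) {
            int i = stack.pop();
            if (marked[i]) {
                continue;
            }
            marked[i] = true;
            for (int j : heap[i].refs) {
                stack.push(j);
            }
        }
        // 2. Compute forwarding addresses: live objects slide toward slot 0.
        int[] forward = new int[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                forward[i] = next++;
            }
        }
        // 3. Update every reference held by live objects and by the roots.
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                int[] refs = heap[i].refs;
                for (int k = 0; k < refs.length; k++) {
                    refs[k] = forward[refs[k]];
                }
            }
        }
        for (int k = 0; k < roots.length; k++) {
            roots[k] = forward[roots[k]];
        }
        // 4. Slide live objects down (forward[i] <= i, so nothing live is
        //    overwritten); the rest of the heap becomes one free block.
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                heap[forward[i]] = heap[i];
            }
        }
        Arrays.fill(heap, next, heap.length, null);
        return next;
    }
}
```

After collect returns, the live objects occupy one contiguous run at the start of the array and all free space is a single block after them, so a later allocation never has to search a free list.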
Incremental "Pauseless" Garbage Collector
The mark-compact collector does not eliminate all user-perceivable pauses. User-perceived GC pauses occur when old objects need to be garbage collected, and
these pauses are proportional to the amount of live object data that exists. This
means that the pauses can become arbitrarily large as more data is manipulated,
which is a very undesirable property for server applications, animations, and other
soft real-time applications.
The HotSpot runtime provides an alternative old-space garbage collector to solve this problem. This collector is fully incremental[4], eliminating most user-detectable garbage collection pauses. This incremental collector scales smoothly, providing relatively constant pause times even when extremely large object datasets are being manipulated. This provides excellent behavior for:
- Server applications, especially high-availability applications
- Applications that manipulate very large live object data sets
- Applications where all user-noticeable pauses are undesirable, such as games,
animations, and other highly interactive applications
The pauseless collector works by using an incremental old-space collection scheme referred to academically as the "train" algorithm. This algorithm breaks up old-space collection pauses into many tiny pauses (typically less than 10 milliseconds) that can be spread out over time so that the program virtually never appears to pause to a user. Since the train algorithm is not a hard real-time algorithm, it cannot guarantee an upper limit on pause times; however, in practice much larger pauses are extremely rare, and are not caused directly by large datasets.
The pauseless collector also has the highly desirable side benefit of producing improved memory locality. This happens because the algorithm works by attempting to relocate groups of tightly coupled objects into regions of adjacent memory, which provides excellent paging and cache locality properties for those objects. This can also benefit highly multithreaded applications that operate on distinct sets of object data.
B.2.2 Thread Synchronization
Another big attraction of the Java programming language is the provision of
language-level thread synchronization, which makes it easy to write multithreaded
programs with fine-grained locking. Unfortunately, the synchronization
implementations in older JVMs were highly inefficient relative to other
micro-operations in the Java programming language, making fine-grained
synchronization a major performance bottleneck.
HotSpot incorporates a unique synchronization implementation that boosts performance substantially. The synchronization mechanism provides its performance benefits by providing ultra-fast, constant-time performance for all uncontended synchronizations, which dynamically comprise the great majority of synchronizations.
The Java HotSpot synchronization implementation is fully suitable for multi-processing, and exhibits excellent multiprocessor performance characteristics.
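As an illustration of what "uncontended" means, the hypothetical SyncCounter below guards every increment with a lock. When one thread does all the work, no acquisition ever waits for another thread, which is the common case the text describes. The two-thread run shows that the same code stays correct when contention does occur. This sketch shows the application-level pattern only, not how HotSpot implements locking.

```java
// Fine-grained locking: every increment acquires this object's monitor.
// The class and method names are hypothetical.
class SyncCounter {
    private long count;

    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }

    // All n increments run on the calling thread, so every lock is uncontended.
    static long runUncontended(int n) {
        SyncCounter c = new SyncCounter();
        for (int i = 0; i < n; i++) {
            c.increment();
        }
        return c.get();
    }

    // Two threads each run n increments; the lock is now sometimes contended,
    // but synchronization still keeps the total exact.
    static long runContended(int n) {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> {
            for (int i = 0; i < n; i++) {
                c.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.get();
    }
}
```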
B.3 HotSpot Server Compiler
While the HotSpot Client VM uses fairly traditional compilation technology, the
HotSpot Server VM uses many advanced techniques to achieve maximum computational
performance. A few of these optimizations are described in the next three
sections.
B.3.1 Aggressive Inlining
Method inlining is an important compiler optimization. However, static compilers
are restricted in the amount of inlining they can do, for a couple of reasons. First,
a static compiler can inline a method only if it can determine that the
method is not overridden in a subclass. A static compiler can inline static,
final, and private methods because it knows those methods can't be overridden.
However, non-final public and protected methods can be overridden in a subclass,
and static compilers therefore cannot inline those methods.
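A small hypothetical hierarchy shows the distinction. In InliningDemo.total, the call to the final method doubled() always reaches Shape's body, so a static compiler could inline it; the call to area() could reach either Shape.area() or Square.area(), so a static compiler must leave it as a virtual call.

```java
// Which calls can a static compiler bind, and therefore inline, ahead of time?
// All class and method names are hypothetical.
class Shape {
    private double scale() { return 2.0; }                   // private: cannot be overridden
    static double unit() { return 1.0; }                     // static: bound at compile time
    final double doubled(double x) { return x * scale(); }   // final: cannot be overridden
    public double area() { return unit(); }                  // public, non-final: overridable
}

class Square extends Shape {
    private final double side;

    Square(double side) { this.side = side; }

    @Override
    public double area() { return side * side; }             // replaces Shape.area()
}

class InliningDemo {
    // Only the declared type Shape is known here. doubled() has exactly one
    // possible body; area() has two, so the static call must stay virtual.
    static double total(Shape s) {
        return s.area() + s.doubled(1.0);
    }
}
```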
Second, even if it were possible to determine through static analysis which methods are overridden and which are not, a static compiler still could not inline public and protected methods. The Java language allows classes to be loaded during runtime, and such dynamically loaded classes can change the structure of a program significantly. In particular, such dynamic loading can render invalid any inlining that was done based on pre-runtime, static analyses.
The HotSpot dynamic compiler uses runtime analysis to perform inlining aggressively, yet safely. Once the HotSpot profiler has collected runtime information about program hot spots, it not only compiles the hot spot into native code, but also performs extensive method inlining on that code. The HotSpot compiler can afford to be aggressive in the way it inlines because it can always back out an inlining optimization if it determines that the method inheritance structure has changed during runtime due to dynamic class loading.
The HotSpot VM can revert to using the interpreter whenever compiler deoptimizations are called for because of dynamic class loading. When a class is loaded dynamically, the HotSpot VM checks to ensure that the interclass dependencies of inlined methods have not been altered. If a dynamically loaded class affects any dependencies, the HotSpot VM can back out affected inlined code, revert to interpreting for a while, and reoptimize later based on the new class dependencies.
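The following hypothetical sketch shows how the set of subclasses can grow at run time. Until LoudGreeter is requested by name, every receiver at callSite is a plain Greeter, which is the situation in which a dynamic compiler might inline Greeter.greet(); once loadLater() has loaded the subclass, that assumption no longer holds and inlined code relying on it would have to be backed out. Java code cannot observe HotSpot's deoptimization directly; the example only sets up the situation.

```java
// Illustrates why inlining decisions must be revisable: the set of
// subclasses can grow while the program runs. All names are hypothetical.
class DynamicLoadDemo {
    static class Greeter {
        String greet() { return "hello"; }
    }

    // Never referenced directly in code; the program obtains it by name at run time.
    static class LoudGreeter extends Greeter {
        @Override
        String greet() { return "HELLO"; }
    }

    // While Greeter is the only implementation in use, this call site is
    // monomorphic and a dynamic compiler could inline Greeter.greet() here.
    static String callSite(Greeter g) {
        return g.greet();
    }

    // Loads and instantiates the subclass reflectively, by name.
    static Greeter loadLater() {
        try {
            Class<?> c = Class.forName(DynamicLoadDemo.class.getName() + "$LoudGreeter");
            return (Greeter) c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```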
On the other hand, when running statically compiled code, a JVM does not have access to the original bytecodes, and cannot fall back on an interpreter when optimizations in the statically compiled code become unsafe. Therefore, static compilers cannot be as aggressive in their optimizations as dynamic compilers, which results in slower performance.
The extensive inlining enabled by the dynamic compiler gives it a huge advantage over static compilers. Inlining reduces the number of method invocations and their associated performance overhead. This is a significant bonus with the Java programming language, in which methods are virtual by default and method invocations are frequent.
Method inlining is also synergistic with other optimizations. Inlining produces large blocks of code that make additional optimizations easier for the compiler to perform. The HotSpot Server compiler's ability to perform aggressive inlining is a key factor in making it faster than current JIT and static compilers.
B.3.2 Other Optimizations
The optimizer performs all of the classic optimizations such as dead code elimination,
loop invariant hoisting, common subexpression elimination, and constant
propagation. It also features optimizations more specific to Java technology, such
as null-check elimination. The register allocator is a global graph coloring allocator
and makes full use of large register sets.
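To picture what loop-invariant hoisting, common subexpression elimination, and null-check elimination accomplish, here are two hand-written methods. HotSpot performs these transformations on its own internal representation of compiled code, not on Java source; the hypothetical method after is simply a source-level equivalent of the kind of code the optimizer can produce from before.

```java
// Source-level picture of classic optimizations; names are hypothetical.
class OptimizationDemo {
    // Before: (x * y) does not change inside the loop (loop-invariant) and is
    // computed twice per iteration (a common subexpression). a.length and
    // a[i] each imply a null check on a on every iteration.
    static int before(int[] a, int x, int y) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * (x * y) + (x * y);
        }
        return sum;
    }

    // After: the product is computed once, outside the loop.
    static int after(int[] a, int x, int y) {
        int k = x * y;
        int sum = 0;
        // A null a fails here; after this line the compiler knows a is
        // non-null, so it can omit the null checks implied by each a[i].
        int n = a.length;
        for (int i = 0; i < n; i++) {
            sum += a[i] * k + k;
        }
        return sum;
    }
}
```

Both methods compute the same result, including the same int overflow behavior and the same NullPointerException for a null array, which is what makes the transformation legal.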
B.3.3 Array Bounds Checks
None of the current generation of HotSpot compilers eliminates unnecessary array
bounds checks. While it is theoretically possible to automatically remove many
array bounds-related computations from certain types of loop structures, the
HotSpot compiler doesn't yet do this. The HotSpot engineering team has run tests
that show that only a small improvement in performance on the SpecJVM benchmark
is to be expected when this feature is implemented. Specific applications
might see much larger increases, however, depending on the amount of array access
that they perform.
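The hypothetical BoundsCheckDemo below shows both sides of the issue. In sumAll the loop condition already guarantees that every index is in range, so the check performed on each a[i] is redundant; this is the kind of check that could, in principle, be removed. readPastEndThrows shows why the check cannot simply be dropped everywhere: Java requires an out-of-range access to throw ArrayIndexOutOfBoundsException.

```java
// Every Java array access is checked against the array's bounds.
// The class and method names are hypothetical.
class BoundsCheckDemo {
    static long sumAll(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];  // check always passes: 0 <= i < a.length by the loop condition
        }
        return sum;
    }

    // Here the check matters: an index one past the end must throw.
    static boolean readPastEndThrows(int[] a) {
        try {
            int unused = a[a.length];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }
}
```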
B.4 -X Flags
Both the HotSpot Client and Server VMs enable some control over the performance
of the virtual machine. In some special circumstances, these options can be
important. Keep in mind, however, that they are all nonstandard and subject to
change without notice. Table B-1 shows all of the special HotSpot options.
Table B-1 HotSpot Options

Option                  Description
-Xmixed                 Mixed mode execution (default)
-Xint                   Interpreted mode execution only
-Xbootclasspath:<directories and zip/jar files separated by ;>
                        Set search path for bootstrap classes and resources
-Xnoclassgc             Disable class garbage collection
-Xincgc                 Enable incremental garbage collection
-Xbatch                 Disable background compilation
-Xms<size>              Set initial Java heap size
-Xmx<size>              Set maximum Java heap size
-Xprof                  Output CPU profiling data
B.4.1 -Xnoclassgc
This flag turns off class unloading. Under JDK 1.1.x, this was important in some
circumstances. With Java 2, however, the semantics of class loading are such that
you really don't ever need to use this flag.
B.4.2 -Xincgc
This option enables the incremental garbage collector and reduces the average
length of garbage collection pauses. Even without the incremental collector,
pauses are usually not user-detectable. However, some applications have stringent
requirements about how often certain operations need to happen. The incremental
collector isn't designed for hard real-time applications, but can be useful in many
soft real-time situations where concrete guarantees about CPU time are not
required.
B.4.3 -Xbatch
This flag disables background compilation. By default, the HotSpot VM can compile
methods in the background while they are executing. This smooths operation
by eliminating the pauses that can occur when waiting for a method to be compiled.
However, compiling methods in the background does have a slight performance
impact. Using the -Xbatch flag on server processes that don't directly interact
with the user might result in higher peak performance on single-processor
servers. In general, using this flag hurts performance on multiprocessor machines
because compilation could otherwise be off-loaded to a different processor.
B.4.4 -Xms
This flag sets the initial size of the object heap. The current default is 2MB.
Increasing this size can help improve startup time for applications that need large
heaps by eliminating the extra garbage collection that occurs before the heap is
automatically expanded.
B.4.5 -Xmx
This flag sets the maximum size of the Java heap. It currently defaults to 64MB.
If the program needs to allocate more memory than is allowed by the current setting
of the max heap size, then the system will throw an OutOfMemoryError. Programs
that work on very large amounts of data might need to increase this value.
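To see the effect of -Xms and -Xmx on a particular VM, a tiny program can print the heap figures the runtime reports. This sketch uses Runtime.totalMemory(), freeMemory(), and maxMemory(); note that maxMemory() was added in J2SE 1.4, so it is not available on the 1.3 VMs this appendix describes. The class name HeapInfo is arbitrary.

```java
// Prints the heap sizes the running VM reports, in megabytes.
class HeapInfo {
    static final long MB = 1024 * 1024;

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("current heap (MB): " + rt.totalMemory() / MB);
        System.out.println("free in heap (MB): " + rt.freeMemory() / MB);
        // Long.MAX_VALUE here would mean the VM reports no inherent limit.
        System.out.println("maximum heap (MB): " + rt.maxMemory() / MB);
    }
}
```

For example, run it as java -Xms32m -Xmx256m HeapInfo and compare the output with a run that uses the defaults.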
B.4.6 -Xprof
The HotSpot VM includes a fairly simple CPU profiling tool. While it isn't a replacement
for a full-featured commercial tool, it is quick and easy to use. It also
gives some useful information about HotSpot internals that can be interesting.
To use this profiler, simply include the -Xprof option on the command line when you start your program. The profiler option gives you a basic CPU profiler. It does not provide any memory profiling options. When a thread terminates, a report, such as the one shown in Figure B-4, is printed to the console.
Figure B-4 HotSpot profiler results
This profiler shows the methods that used the most time during that thread's lifetime. The methods are divided into three categories:
- Interpreted
- Compiled
- Stub
Interpreted methods are executed by the bytecode interpreter. As shown in Figure B-4, almost 10 percent of the program's time was spent in methods run by the interpreter. However, this is deceptive. You'll notice that there are two columns. The Native column shows the amount of time spent in native C methods called by interpreted methods. Thus, when you break it down that way, this program spends very little time in the interpreter, only a few ticks.
Compiled methods are those that are translated from bytecode to machine code by the compiler. In this case, that is where the program is spending most of its time. Note that it is possible for compiled methods to call out to native C functions, although that doesn't show up in this profile.
The Stub category shows methods called through JNI. The Stub column shows the amount of time it took to set up the call, while the Native column shows the amount of time spent in the native function.
The Global Summary at the end of the profile provides useful information such as how much time the thread spent blocked, and how much time was spent loading classes.
B.5 -XX Flags
While the flags in Section B.4 are subject to change at any time, the flags in this
section are even less reliable. The flags described in this section are for experimentation
purposes only. They aren't documented as part of the HotSpot release,
and are not supported in any way. Use them at your own risk!
B.5.1 Kinds of -XX Flags
There are really two kinds of -XX flags. The first is a Boolean flag. The second is
an Integer flag. Boolean flags are used in the following manner:
-XX:<+/-><flagname>
For example, passing the following string as an option to the java command
would activate the GoFaster option if one existed.
-XX:+GoFaster
To turn this option off (if it was on by default) you would pass
-XX:-GoFaster
Integer flags are a little different. For example, the following string would set the
NumCylinders option to eight.
-XX:NumCylinders=8
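The two syntaxes above are regular enough to parse mechanically. The hypothetical XXFlag class below is a sketch, assuming values are plain decimal integers as in the examples in this appendix; it is not part of any JDK API, and the VM's own parser accepts forms this sketch does not handle. GoFaster and NumCylinders are the same made-up flags used above.

```java
// Parses "-XX:+Name", "-XX:-Name", and "-XX:Name=<integer>".
// Hypothetical helper for illustration only.
class XXFlag {
    final String name;
    final Boolean boolValue;  // set for the Boolean forms, otherwise null
    final Long intValue;      // set for the Integer form, otherwise null

    private XXFlag(String name, Boolean boolValue, Long intValue) {
        this.name = name;
        this.boolValue = boolValue;
        this.intValue = intValue;
    }

    static XXFlag parse(String arg) {
        final String prefix = "-XX:";
        if (!arg.startsWith(prefix)) {
            throw new IllegalArgumentException("not an -XX flag: " + arg);
        }
        String body = arg.substring(prefix.length());
        if (body.startsWith("+") || body.startsWith("-")) {
            // Boolean form: a leading + turns the flag on, - turns it off.
            if (body.length() < 2) {
                throw new IllegalArgumentException("missing flag name: " + arg);
            }
            return new XXFlag(body.substring(1), body.charAt(0) == '+', null);
        }
        int eq = body.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("expected +Name, -Name, or Name=value: " + arg);
        }
        // Integer form: Name=value (Long.parseLong rejects non-numeric values).
        return new XXFlag(body.substring(0, eq), null, Long.parseLong(body.substring(eq + 1)));
    }
}
```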
The next few paragraphs describe some of the more interesting flags.
B.5.2 PrintBytecodeHistogram
Default Value: false
Example Usage: java -XX:+PrintBytecodeHistogram <yourclass>
This option prints out statistics that show what bytecodes were executed while
your program was running. Some sample output from this option is shown in Figure
B-5. This type of information isn't commonly used when performance tuning
typical programs, but might be of interest to researchers.
Figure B-5 Bytecode histogram
B.5.3 CompileThreshold
Default Value: 1500
Example Usage: java -XX:CompileThreshold=1000000 <yourclass>
The current implementation of HotSpot usually waits for a method to be executed
a certain number of times before it is compiled. Not compiling every method helps
startup time and reduces RAM footprint. This option allows you to control that
threshold. By increasing the number, you can gain slight reductions in RAM
footprint in exchange for a longer period of time before your program reaches
peak performance.
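The threshold mechanism can be modeled as a per-method invocation counter. In the hypothetical CountedMethod below, a method runs its "interpreted" version until it has been invoked threshold times, after which later calls use the "compiled" version. Both versions are ordinary Java functions here; nothing is actually compiled, and HotSpot's real policy is more involved than this single counter.

```java
import java.util.function.IntUnaryOperator;

// Toy model of counter-based compilation; all names are hypothetical.
class CountedMethod {
    private final int threshold;
    private final IntUnaryOperator interpreted;
    private final IntUnaryOperator compiled;
    private int invocations = 0;
    private boolean isCompiled = false;

    CountedMethod(int threshold, IntUnaryOperator interpreted, IntUnaryOperator compiled) {
        this.threshold = threshold;
        this.interpreted = interpreted;
        this.compiled = compiled;
    }

    int invoke(int arg) {
        if (isCompiled) {
            return compiled.applyAsInt(arg);  // hot path once "compiled"
        }
        invocations++;
        int result = interpreted.applyAsInt(arg);
        if (invocations >= threshold) {
            isCompiled = true;  // later calls take the compiled version
        }
        return result;
    }

    boolean isCompiled() { return isCompiled; }

    int invocations() { return invocations; }
}
```

With the -XX:CompileThreshold=1000000 setting from the example above, a method called only a few thousand times would never leave the interpreted path in this model.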
B.5.4 NewSize
Default Value: 655360
Example Usage: java -XX:NewSize=196608 <yourclass>
This option allows you to control the initial size of the New generation (also
known as the nursery) of the HotSpot VM's generational garbage collector. Increasing
the amount of new space means that fewer objects will have to be copied
to old space. However, a small new space can be scavenged more quickly, and
works better with processor caches.
[1] Tim Lindholm and Frank Yellin, The Java Virtual Machine Specification, Second Edition, Section 3.5.3. Addison-Wesley, 1999.
[2] For more information about generational copying collectors, see Richard Jones and Rafael Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management, pp. 143-180. John Wiley & Sons, 1996.
[3] For more information about the mark-compact collection algorithm, see Jones and Lins, pp. 97-114.
[4] For more information about incremental collectors, see Jones and Lins, pp. 183-223.
Copyright © 2001, Sun Microsystems, Inc. All rights reserved.