The Java HotSpot performance engine was officially launched April 27, 1999. Far more than a mere performance tune-up, it is in reality a Java virtual machine (VM) that has been engineered for maximum performance from the ground up, often providing at least a two-fold increase in speed for server-side Java technology-based applications.
As any developer who works with the Java programming language knows, the virtual machine mediates between Java applications and the underlying hardware platform: executing application bytecodes, managing system memory, providing system security, and juggling multiple execution threads. Using the Java 2 platform's new pluggable architecture, Java HotSpot can be seamlessly dropped into the platform, replacing both the classic virtual machine and the Just-In-Time (JIT) compiler. Once this new performance engine is installed, any application or applet processed with the Java 2 runtime environment (application launcher, plug-in, or applet viewer) will by default use the Java HotSpot performance engine.
With input and commentary from David Stoutamire, manager of the Java HotSpot compilation group, this article drills down into the engine's functionality, exploring how it does what it does (where it screams, and where it only purrs), along with code samples that illustrate the engine's inner workings.
The Basics
The Java HotSpot performance engine concentrates on several key areas to achieve its state-of-the-art performance enhancement.
Such enhancements are typically most effective on server-side applications. "The way things are tuned," explains David Stoutamire, "you get more and more benefit the longer the application runs, and the more it's involved in executing Java bytecode. If an application presents a bouncing ball on the screen (which may use graphics-intensive C code or system calls, and which may only execute until someone clicks on a link), then the Java HotSpot VM doesn't have a chance to really shine." Performance of an application written in the Java programming language generally depends upon four factors:
The performance of a typical client-side application (particularly a graphics application) is most heavily impacted by native libraries, whereas the typical server-side application stresses the speed of bytecode execution, which is where the new VM shines.
But future releases of the engine, according to Stoutamire, will specifically target application performance on the client side.
Adaptive Compilation
Most applications spend the vast majority of their time executing a small minority of their code. The Java HotSpot performance engine analyzes an application as it runs, identifying the areas that are most critical to performance: those where the greatest time is being spent executing bytecode. Rather than compiling an entire program when it first starts, or compiling each method as it is called (as does the JIT compiler), the performance engine initially runs the program using an interpreter, and then analyzes it as it runs, looking for performance "hot spots." It then compiles and optimizes only those performance-critical areas of code. This monitoring process continues dynamically throughout the life of the program, with the performance engine adapting on the fly to the ongoing performance needs of the application.
The Java HotSpot adaptive compilation technology makes it far superior to a Just-In-Time compiler. With a JIT compiler, because 20% of the code may take up most of the execution time, optimizing the other 80% at runtime doesn't always help: the potential increase in speed may not pay back the cost of the optimization. The Java HotSpot method of dynamic optimization produces the following benefits:
Method Inlining
On-the-fly, dynamic compilation is just the beginning of the optimization performed by the Java HotSpot performance engine. "Suppose I have a method that does something trivial like add one to an argument, and then return it," says Stoutamire. "In that case, the compiler may as well just generate code that, instead of actually calling the method, simply adds one to the variable. In that way, it's saved the overhead of an instruction that jumps to that method, as well as the return."
But such "method inlining" becomes problematic within the world of object-oriented code design. "Often, the address you're going to jump to is hard coded in the instruction," says Stoutamire. "But that's not always the case. In some instances, with what's called dynamic dispatch, or virtual method invocation, you have a pointer at runtime that's used to call one of a set of different methods." This dynamic dispatch concept is at the heart of the Java programming language: the idea that a subclass can override an existing method, and then at runtime, the proper method ends up automatically being called.
"The problem with inlining," continues Stoutamire, "is that you can't inline across dynamic dispatch. The reason for that is that you're never really sure what method you're going to call, so you can't bring the body of the method up into the call. And the reason for that is because Java allows you to load new classes at any time. That may introduce a new class, with a new method, and if what came before that introduction had already been inlined, all of a sudden, all that code could be incorrect."
The Java HotSpot VM handily works around this problem by using dynamic deoptimization. "Deoptimization is the ability to, at any point, revert from compiled code to interpreted code," explains Stoutamire. "In essence, it's the ability to convert a compiled stack frame to an interpreted stack frame. The interpreter has its own representation of an executing method.
If you have a local variable in an executing method in compiled code, it may be on the stack somewhere, or in a register somewhere. But the interpreter isn't constrained to have that same layout. So you have to be able to take all of that stuff and reshuffle it, and make it look as if it's an interpreted frame, before the interpreter can continue on with it. This is one of the things that sets the Java HotSpot VM apart from other virtual machines."
As the name implies, dynamic deoptimization is an ongoing process within a given program. "Let's say that you have a running program," says Stoutamire, "and a given performance hot spot of the program gets compiled. During that compilation, the compiler takes advantage of the fact that there's only a single subclass, so it's able to do method inlining. But then later on, a new class is dynamically loaded, and that breaks the existing compiled code. So the Java HotSpot VM undoes that existing compilation, and restarts the code with the interpreter, or, if it continues to be a performance hot spot, recompiles it, and restarts it with the new compilation."
Object Layout
Part of the new and improved object layout in the Java HotSpot virtual machine is a two machine-word object header, rather than the three-word header found in most other Java VMs. The first header word is a reference to the object's class. The second contains other information, such as the identity hash code and garbage collection status information. Only arrays have a third header field, for the array size. Since the average Java programming language object size is small, this word economization has a significant positive impact on memory consumption (approximately an 8% savings in heap size).
The Java HotSpot VM also eliminates the concept of "handles," an indirection facility used to access objects in memory. This both reduces memory usage and speeds processing.
"A traditional representation of objects is that when you have one object that points to another," says Stoutamire, "it has a pointer to the header of that other object. But the classic virtual machine uses a different representation, where, instead of pointing directly to the object, it points into a table. And the header of the object is in that table, as well as a pointer to where the fields of the object are."
Such indirection is particularly useful in terms of relocating objects in memory, which often comes into play during garbage collection (see below). "If object A conceptually points to object B," explains Stoutamire, "what it really does is point into the handle table. Suppose that object A points to the fourth entry in the table, and the fourth entry in the table has a pointer to object B. Now, when I move object B, all I have to do for object A is update the entry in the table. That's not so useful if I only have a single object pointing to B, but if I have 1000 objects pointing to it (or I'm not sure where all the pointers are), then all I have to do is update that single entry when I move object B."
While the use of handles makes relocation easier, such indirection is considerably slower, and the table takes up more space in memory. Further, the handle table's primary reason for being, to better facilitate object relocation in memory during garbage collection, has since been shown to be less than necessary. "In practice, it isn't much more expensive to simply update all pointers during object relocation," says Stoutamire. "In order to do proper garbage collection, you have to trace through the entire heap anyway; you have to go through every pointer. So the fact that you're visiting every pointer anyway means that you have the opportunity to change it right then and there, without doing much extra work."
Finally, because the Java HotSpot VM uses direct memory references, it doesn't have to set up a memory-referencing handle when allocating memory, and it doesn't have to manage handles in addition to managing object memory. That makes the allocation of temporary data structures as fast as C's stack-based memory allocations, which is a big win.
Garbage Collection
The Java programming language is the first mainstream language to provide built-in automatic garbage collection.
Pre-Java HotSpot VM
Many Java virtual machines use "conservative" or partially accurate collectors. A conservative collector assumes that anything that looks like a valid pointer may actually be a pointer. Conservative collectors are easier to implement, but a conservative collector doesn't always know for certain where all object references are in memory. As a result, it can, in rare instances, make mistakes, such as confusing an integer for an object pointer, which can create difficult-to-debug memory leaks. Also, a conservative collector has to either use handles to refer indirectly to objects, or avoid relocating objects, since relocating handleless objects requires updating all references to the object, and the conservative collector can't even be certain that an apparent reference is in fact real. This inability to relocate objects, in turn, results in memory fragmentation, and prevents the use of more sophisticated garbage collection algorithms.
Conservative garbage collection also has a negative impact in the realm of native methods. "A conservative garbage collector has to make sure that it doesn't move anything that's pointed to from outside of the Java code," explains Stoutamire. "And that means that you have to scan areas of memory, looking for pointers into your heap, and that all takes time." This was one of the problems faced in using Java's old Native Method Interface (NMI) specification. Ironically, this difficulty was solved through the use of a different kind of handle.
"With the Java 1.1 platform," says Stoutamire, "NMI was replaced with Java Native Interface (JNI). With JNI, you never point from the outside to Java objects directly, you only point to handles, which then point to the objects. That means that the garbage collector doesn't have to worry about what's going on on the outside. But handles are only used to the extent that native code wants to point into the VM," he says. "It's no less efficient than the old way and allows for far more efficient garbage collection. So if you want to use the Java 2 platform and, hence, the Java HotSpot VM, you have to be using JNI."
Post-Java HotSpot VM
Like other aspects of the Java HotSpot performance engine, the garbage collection implementation has been redesigned from the ground up. The Java HotSpot VM garbage collector is "fully accurate," offering the guarantee that:
And there are other state-of-the-art features that are also a part of the Java HotSpot VM garbage collection. "In order to perform accurate garbage collection," says Stoutamire, "you have to trace out the entire heap, because if you don't trace out every pointer, it might point to something living that you could mistakenly collect, and that would be bad. On the other hand, if having to trace out the entire heap was strictly true, then garbage collection would get slower and slower as the heap grew."
Generational Copying Collection
One way around this eventuality is the Java HotSpot state-of-the-art generational copying collection algorithm. "Generational collection exploits the fact that most objects don't really live that long," says Stoutamire. "Here, there are basically two heaps set up: one for older objects, and one for newer objects. And the system records any references from old things to new things in a separate table."
Using generational garbage collection, the majority of objects (often, greater than 95%) can be reclaimed simply by making many small "scavenges" of the new-object space (sometimes also known as "the nursery"). Longer-lived objects are ultimately copied, or "tenured," into the old space area. "If things live for a certain while," says Stoutamire, "then they're probably going to live for a while more. You want to give them a chance to mature before tenuring them, but once they've proven themselves by not dying, you can go ahead and tenure them."
Because new objects are added contiguously in stack-like fashion in the nursery, allocation is extremely fast, since it involves simply updating a single pointer and checking for overflow. By the time the nursery becomes full, most of the objects there are already dead. The garbage collector can then simply copy the remaining live objects, thereby avoiding any further reclamation work.
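The nursery behavior described above (pointer-bump allocation with an overflow check, followed by a scavenge that copies only the survivors) can be sketched as follows. This is a minimal toy model, not HotSpot's implementation: memory is an `int` array, addresses are indices, the caller hands the collector its list of live objects instead of the collector tracing roots, and the name `NurserySketch` is hypothetical.

```java
// Hypothetical model of a nursery: memory is an int array, addresses are indices.
final class NurserySketch {
    final int[] words;    // simulated new-object space, one element per machine word
    private int top = 0;  // bump pointer: index of the next free word

    NurserySketch(int sizeInWords) {
        words = new int[sizeInWords];
    }

    // Allocation is an overflow check plus a single pointer update.
    // Returns the new object's address, or -1 when the nursery is full
    // and a scavenge is needed first.
    int allocate(int sizeInWords) {
        if (top + sizeInWords > words.length) {
            return -1;
        }
        int address = top;
        top += sizeInWords;
        return address;
    }

    // Scavenge: copy only the live objects into the survivor area, then reset
    // the bump pointer. Dead objects are never visited.
    // Each entry of liveObjects is {address, sizeInWords}; a real collector
    // would discover these by tracing from roots instead of being handed them.
    int scavenge(int[][] liveObjects, int[] survivor) {
        int copied = 0;
        for (int[] object : liveObjects) {
            System.arraycopy(words, object[0], survivor, copied, object[1]);
            copied += object[1];
        }
        top = 0;
        return copied;
    }

    public static void main(String[] args) {
        NurserySketch nursery = new NurserySketch(8);
        int a = nursery.allocate(3);
        int b = nursery.allocate(3);
        int c = nursery.allocate(3);  // does not fit: 6 + 3 > 8
        System.out.println(a + " " + b + " " + c);  // prints: 0 3 -1

        nursery.words[b] = 42;  // pretend object b holds data and is still live
        int[] survivor = new int[8];
        int copied = nursery.scavenge(new int[][] {{b, 3}}, survivor);
        System.out.println(copied + " words copied");  // prints: 3 words copied
    }
}
```

The cost of the scavenge in this sketch depends only on how much data is still live, not on how much garbage was allocated, which is why copying suits a space where most objects die young.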
Mark-Compact Collection
The generational garbage collection scheme deals with most dead objects while still in the nursery area. But longer-lived objects do ultimately accumulate in the old object area. There, based upon a low-memory condition or a programmatic request, old object garbage collection must occasionally occur. The Java HotSpot VM uses a standard mark-compact algorithm for this task, traversing the entire tree of live objects from its roots. "Mark-compact goes through memory and marks all of the objects that are reachable," says Stoutamire, "that is, all the ones that aren't garbage." From there, any gaps left by dead objects are compacted away. By compacting gaps in the heap, rather than collecting them into a free area, memory fragmentation is eliminated, old-object allocation is streamlined (by eliminating freelist searching), and cache can be used more effectively.
Incremental "Train" Collection
But a generational/mark-compact garbage collection algorithm can't eliminate all user-perceivable pauses. Such pauses occur during old object collection and are proportional to the amount of live object data being used. To address the need for "pauseless" garbage collection, this new virtual machine also offers an incremental, or "train," algorithm. The incremental collection option (which can be selected with the
"The train algorithm is a sophisticated variant of copying and generational collection," says Stoutamire. "Instead of just having a new space and an old space, it has a middle space made up of many small spaces. It tries to keep them as small as possible and it tries to group objects that point to one another within the same space." Keeping tightly "coupled" objects in adjacent areas of memory has a side benefit for highly multi-threaded applications which operate on distinct sets of object data.
The train algorithm breaks up old-space garbage collection pauses into many tiny pauses (on the order of a few milliseconds), which are spread out over time such that the pauses become virtually imperceptible to the user. "If you're playing with a graphical program," says Stoutamire, "and you're dragging something with the mouse, you don't want to see it suddenly hiccup and pause for a second. That's often been users' experiences with systems that garbage collect: every now and then they'd feel this stutter. But the train algorithm virtually eliminates that."
Together, the various garbage collection algorithms work in concert to deliver the Java HotSpot state-of-the-art garbage collection. "The first stage is the nursery, or what's sometimes called 'Eden,'" says Stoutamire. "Most objects die young, so they never have to get moved out of Eden. Copying, which is what's used in Eden, is most effective where objects are constantly dying, because then you end up copying fewer things." Assuming incremental garbage collection is turned on, objects next get moved to the area managed by the train algorithm. And from there, they are moved to the permanent area where more long-lived objects end up, which is managed by a mark-compact algorithm.
But the incremental (train) mode is not without its overhead (an approximately 10% degradation in speed). "There is some overhead to the incremental mode," confirms Stoutamire. "That's why it's not turned on by default. Because this release is positioned as a server product, we wanted to go for peak throughput rather than being concerned about pause times. If you were to use the Java HotSpot VM on the client side, however, you might want to turn on the incremental mode, because it would give you a more consistent, pauseless response."
As a side note, Stoutamire points out that, in reality, a middle garbage collection area exists even when the incremental/train algorithm is not activated. "There's still an intermediate generation," he explains, "it's just managed with a copy collection algorithm, without the same fine granularity of the train algorithm."
Thread Synchronization
Sun estimates that 40 percent of hardware resources used by a typical Java application are devoted to garbage collection and multithreading (where multiple I/O data streams are handled simultaneously). The Java HotSpot virtual machine incorporates a breakthrough in thread synchronization which boosts performance by a major factor. "The important thing for programmers to know," says Stoutamire, "is that we do what they would want us to do with native threads. In older versions of the Java VM, there were funny restrictions on how I/O worked: whether or not, if you had a Java thread, there was really a native OS thread that executed it. Things like that ultimately had performance gotchas."
But the Java HotSpot thread synchronization implementation offers "fully preemptive" threads, using the host operating system's thread model. "With the Java HotSpot VM," adds Stoutamire, "every Java thread corresponds to a native OS thread. That was not always the case with the classic VM; there were times when a single native thread would be multiplexed to multiple Java threads. But with that scenario, if a thread blocks for some reason, then all the other threads aren't going to run, either. Once you have a one-to-one correspondence between native and Java threads, however, as is the case in the Java HotSpot virtual machine, then if a Java thread blocks for some reason, it won't interfere with other Java threads. That's what preemptive means: that a thread can run and preempt another at any time. In a non-preemptive system, one thread can potentially starve another."
The Future
Often, the first thing a developer wants to do after installing the Java HotSpot performance engine is to take it out for a spin. But this performance engine was designed to be at its best when working with real-world or enterprise systems, or both.
Trying to anticipate and exercise its inner workings through a piece of microbenchmarking code can often prove a frustrating and disappointing experience. As a result, Sun has put together a Benchmarking Q&A, with code samples that explain common mistakes made in trying to exercise the engine's performance enhancements. One commonly performed benchmark sets up a simple piece of code to test the speed of execution for many iterations of a loop (which, perhaps, increments a number within the loop). The Java HotSpot VM starts out by running such a program in interpreted mode, but soon discerns (due to the many repetitions of the loop) that the area is "hot." As a result, it sends the method off to be compiled.
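A loop microbenchmark of the kind described above might look like the following sketch. It is a hypothetical illustration, not code from Sun's Benchmarking Q&A: the class and method names are invented, and the timings it prints will vary by machine and VM, so none are shown. The one deliberate design choice is that the hot loop lives in its own method that is called repeatedly, which gives the VM an opportunity to run a compiled version on a later call rather than timing a single long-running invocation.

```java
// Hypothetical loop microbenchmark; names are illustrative only.
final class LoopBenchmarkSketch {
    // The hot loop lives in its own method, so it can be called many times.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Times one call in nanoseconds. The result is stored in sink so the
    // computed value is actually used rather than thrown away.
    static long timeOneCall(int n, long[] sink) {
        long start = System.nanoTime();
        sink[0] = sumTo(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];
        int n = 1_000_000;
        // Repeated rounds: early rounds may run interpreted, later rounds
        // may run compiled code, so report each round separately.
        for (int round = 0; round < 10; round++) {
            long nanos = timeOneCall(n, sink);
            System.out.println("round " + round + ": " + nanos + " ns, sum=" + sink[0]);
        }
    }
}
```

Reporting each round separately, instead of one aggregate figure, makes any change in speed between early and late rounds visible.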
In the current release, this newly compiled version of the code won't actually
be invoked, however, until the next time the method (
"The solution to such a situation," explains Stoutamire, "is on-stack replacement, which has already been implemented in the next Java HotSpot revision, but has not yet been released." On-stack replacement is the exact opposite of dynamic deoptimization: an interpreted frame is turned into a compiled frame, but while the method is still running. "We weren't concerned about this scenario in the Java HotSpot version 1.0," says Stoutamire, "because it's not what real applications typically do, but this addresses the situation and makes such microbenchmarks do what one would expect them to do."
Conclusion
Sun is initially offering the Java HotSpot performance engine for the Solaris operating system and Microsoft Windows operating system. It is available free to end users and ISVs, but entails royalty payments for vendors who include it in operating systems. The beta test version of Java HotSpot 2.0 is scheduled for sometime this summer, and promises an additional 30% performance increase!
About the Author
Steven Meloan is a writer, journalist, and former software developer. He is a frequent contributor to the Java Developer Connection, and his work has also appeared in Wired, Rolling Stone, BUZZ, San Francisco Examiner, ZDTV's "The Site," and American Cybercast's "The Pyramid."