HotSpot Benchmarking Questions and Answers

Many early testers of the HotSpot™ Java™ virtual machine (HotSpot) have run into the same issues when trying to determine how much faster HotSpot is than other virtual machines. This document provides answers for some frequently asked questions about evaluating HotSpot.

Q: I wrote a simple loop to time a simple operation and HotSpot looks even slower than the Java 2 SDK. What am I doing wrong? Here's my program:

        public class Benchmark {
            public static void main(String[] arg) {
                long before = System.currentTimeMillis();
                int sum = 0;
                for (int index = 0; index < 10*1000*1000; index += 1) {
                    sum += index;
                }
                long after = System.currentTimeMillis();
                System.out.println("Elapsed time: " +
                                   Long.toString(after - before) +
                                   " milliseconds");
            }
        }
    

A: You are writing a microbenchmark.

Remember how HotSpot works. It starts by running your program with an interpreter. When it discovers that some method is "hot" -- that is, executed a lot, either because it is called a lot or because it contains loops that loop a lot -- it sends that method off to be compiled. The next time the method is called the compiled version is invoked, instead of the interpreted version. That heuristic works fine for most programs, which have small methods that are called often. It does not work for microbenchmarks.

What's happening is your main method starts running in the interpreter. After a while the HotSpot virtual machine notices that it's looping in that method and sends it off to be compiled. Meanwhile, execution continues in the interpreter. If you ever called main again, you would use the compiled version. But, of course, you never call main again, so your program runs entirely in the interpreter (with a little overhead for compiling the method).

This is called the "on stack replacement" issue, because to resolve it we need to transfer control from the interpreter to the compiled code while the interpreted frame is still on the stack. We know how to do this, and it will be implemented in a release shortly after FCS.

In the meantime, if you insist on using/writing microbenchmarks like this, you can work around the problem by moving the body of main to a new method and calling it once from main to give the compiler a chance to compile the code, then calling it again in the timing bracket to see how fast HotSpot is.
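The workaround described above can be sketched as follows. This is only an illustration: the class name WarmedBenchmark and the method name sumLoop are made up for the example, and a single warm-up call is assumed to be enough to trigger compilation, which may not hold on every virtual machine or release.

```java
public class WarmedBenchmark {
    // The loop from the question, moved out of main so that it can be
    // compiled as a method and re-entered through an ordinary call.
    static int sumLoop() {
        int sum = 0;
        for (int index = 0; index < 10 * 1000 * 1000; index += 1) {
            sum += index;
        }
        return sum;
    }

    public static void main(String[] arg) {
        // Untimed call: gives the compiler a chance to compile sumLoop.
        int warmup = sumLoop();

        // Timed call: intended to reach the compiled version.
        long before = System.currentTimeMillis();
        int sum = sumLoop();
        long after = System.currentTimeMillis();

        // Printing the results keeps the computation observable.
        System.out.println("Sum: " + sum + " (warm-up: " + warmup + ")");
        System.out.println("Elapsed time: " + (after - before) + " milliseconds");
    }
}
```

The sum overflows an int, which is harmless here because only the elapsed time is of interest. Returning and printing the value is a precaution so that the loop is not treated as dead code.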

Q: I'm trying to time method invocation time. I don't want there to be any extra work done, so I'm using an empty method. But when I run with HotSpot I get times that are unbelievably fast. Here's my code:

        public class EmptyMethod {
            public static void method() {
            }
            public static void runTest() {
                long before;
                long after;
                // First, figure out the time for an empty loop
                before = System.currentTimeMillis();
                for (int index = 0; index < 1*1000*1000; index += 1) {
                }
                after = System.currentTimeMillis();
                long loopTime = after - before;
                System.out.println("Loop time: " +
                                   Long.toString(loopTime) +
                                   " milliseconds");
                // Then time the method call in the loop
                before = System.currentTimeMillis();
                for (int index = 0; index < 1*1000*1000; index += 1) {
                    method();
                }
                after = System.currentTimeMillis();
                long methodTime = after - before;
                System.out.println("Method time: " +
                                   Long.toString(methodTime) +
                                   " milliseconds");
                System.out.println("Method time - Loop time: " +
                                   Long.toString(methodTime - loopTime) +
                                   " milliseconds");
            }
            public static void main(String[] arg) {
                // Warm up the virtual machine, and time it
                runTest();
                runTest();
                runTest();
            }
        }
    

A: Empty methods don't count. And you are also seeing that generated code is sensitive to alignment.

The call to the empty method is being inlined away, so there really is no call there to time. Small methods will be inlined by the compiler at their call sites. This reduces the overhead of calls to small methods. This is particularly helpful for the accessor methods used to provide data abstraction. If the method is actually empty, the inlining completely removes the call.

Code is generated into memory and executed from there. The way the code is laid out in memory makes a big difference in the way it executes. In this example on my machine, the loop that claims to call the method is better aligned and so runs faster than the loop that's trying to figure out how long it takes to run an empty loop, so I get negative numbers for methodTime-loopTime.

Q: Okay, so I'll put some random code in the body of the method so it's not empty and the inlining can't just remove it. Here's my new method (and the call site is changed to call method(17)):

            public static void method(int arg) {
                int value = arg + 25;
            }
    

A: The HotSpot compiler is smart enough not to generate code for dead variables.

In the method above, the local variable is never used, so there's no reason to compute its value. That leaves the method body empty again, and when the code gets compiled (and inlined, because we removed enough code to make it small enough for inlining) the call disappears just as it did for the empty method.

This can be surprising to people not used to dealing with optimizing compilers, which can be fairly clever about discovering and eliminating dead code. They can also occasionally be fairly stupid about it, so don't count on the compiler to do arbitrary optimizations of your code.

Dead code elimination also extends to control flow. If the compiler can see that a particular "variable" is in fact a constant at a test, it may choose not to compile code for the branch that will never be executed. This makes it tricky to make microbenchmarks "tricky enough" to actually time what you think you are timing.

Dead code elimination is quite useful in real code. Not that people intentionally write dead code; but often the compiler discovers dead code due to inlining where constants (e.g., actual parameters to methods) replace variables, making certain control flows dead.
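One common way to keep a computation from being dead is to make its result observable, and to feed it inputs that are not constants. The sketch below is hypothetical (the class name LiveResult and the runTest signature are assumptions for illustration): the method returns its value, the caller accumulates it, and the total is printed at the end.

```java
public class LiveResult {
    // Returning the value makes the computation visible to the caller.
    static int method(int arg) {
        return arg + 25;
    }

    // Accumulates every result so that none of the calls is dead.
    static long runTest(int iterations) {
        long sink = 0;
        for (int index = 0; index < iterations; index += 1) {
            // The argument depends on the loop index, so it is not a constant.
            sink += method(index);
        }
        return sink;
    }

    public static void main(String[] arg) {
        runTest(1000 * 1000); // untimed warm-up call

        long before = System.currentTimeMillis();
        long sink = runTest(1000 * 1000);
        long after = System.currentTimeMillis();

        // Printing the total means the work cannot simply be discarded.
        System.out.println("Sink: " + sink);
        System.out.println("Elapsed time: " + (after - before) + " milliseconds");
    }
}
```

Even so, this does not time a method call. The small method is still a candidate for inlining, and a sufficiently clever compiler is free to simplify the loop further, so the number measures whatever code the compiler actually generated.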

Q: I'm trying to benchmark object allocation and garbage collection. So I have a harness like the one above, but the body of the method is:

            public static void method() {
                Object o = new Object();
            }
    

A: That's the optimal case for the HotSpot storage manager. You will get numbers that are unrealistically good.

You are allocating objects that need no initialization and dropping them on the floor instantly. (No, the compiler is not smart enough to optimize away the allocation.) Real programs do allocate a fair number of short-lived temporary objects, but they also hold on to some objects for longer than this simple test program. The HotSpot storage manager does more work for the objects that are retained for longer, so beware of trying to scale up numbers from tests like this to real systems.
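A test that is somewhat less flattering to the storage manager retains a fraction of what it allocates. The sketch below is an assumption-laden illustration, not a model of any real application: the class name RetainedAllocation, the keepEvery ratio, and the size of the retained array are arbitrary choices made for the example.

```java
public class RetainedAllocation {
    // Allocates `count` objects and stores every `keepEvery`-th one in a
    // fixed-size ring, so some objects stay reachable across many allocations.
    static int allocate(int count, int keepEvery, Object[] retained) {
        int kept = 0;
        for (int index = 0; index < count; index += 1) {
            Object o = new Object();
            if (index % keepEvery == 0) {
                // Overwriting an old slot releases the object stored there.
                retained[kept % retained.length] = o;
                kept += 1;
            }
        }
        return kept;
    }

    public static void main(String[] arg) {
        Object[] retained = new Object[1000];   // hypothetical live-set size
        allocate(1000 * 1000, 100, retained);   // untimed warm-up call

        long before = System.currentTimeMillis();
        int kept = allocate(1000 * 1000, 100, retained);
        long after = System.currentTimeMillis();

        System.out.println("Retained " + kept + " objects");
        System.out.println("Elapsed time: " + (after - before) + " milliseconds");
    }
}
```

Varying the retention ratio and the size of the retained array changes how much work falls on objects that are held for longer. The results are still synthetic, so the warning about scaling them up to real systems applies here too.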

Q: I have a graphics-intensive or GUI-based program. I've tried it on HotSpot and it doesn't seem to perform much better than on the Java 2 SDK, and only slightly better than on JDK 1.1.x implementations. Why isn't HotSpot making my graphics code go faster?

A: Graphics programs spend a lot of their time in native libraries.

The overall performance of a Java application depends on four factors:

HotSpot is a replacement for the Java 2 SDK virtual machine. The virtual machine is responsible for byte code execution, storage allocation, thread synchronization, etc. Running with the virtual machine are native code libraries that handle input and output through the operating system, especially graphics operations through the window system. The HotSpot virtual machine uses the same native code libraries that the Java 2 SDK uses, so programs that spend significant portions of their time in those native code libraries will not see their performance on HotSpot improved as much as programs that spend most of their time executing byte codes.

In addition, HotSpot is a Java 2 virtual machine, and so graphics operations go through the new Java2D APIs. These APIs are significantly more featureful than the old AWT APIs, but come with an overhead not present in JDK 1.1.x systems.

This observation about native code applies to other native libraries that come with the Java 2 SDK, or any native code libraries that you happen to use with your application.

Q: What do you recommend for benchmarking HotSpot, or any virtual machine?

A: We like to use the SPEC JVM98 benchmark. We use it for tracking our own progress over time, and we use it for comparing ourselves to other virtual machines.

The SPEC JVM98 benchmark was developed by a consortium of interested vendors under the auspices of the Standard Performance Evaluation Corporation (SPEC). It is the only industry-standard benchmark for Java platforms. The benchmark collects the kernels of several types of programs, most of them based on real applications. The benchmark seems to have a good mix of operations and realistic behaviors (method invocations, storage allocation and lifetimes, input and output). We find that the benchmark is predictive of the performance we see across a number of real applications. It comes with an easy-to-use harness that ensures that it is run the same way on all platforms, so fair comparisons can be made between platforms. The SPEC JVM98 benchmark is available from http://www.spec.org/osg/jvm98/.

Other than that, we like benchmarking real applications. Those are usually harder to obtain, somewhat more difficult to run, and more difficult to compare one against the other.

Send feedback to hotspot-feedback@sun.com
15 June 1999