SDN Chat Sessions

Squeezing Performance from the Java HotSpot Virtual Machine

March 15, 2005
Guests: Peter Kessler and Ross Knippel
Moderator: Edward Ort (MDR-EdO)

This is a moderated forum.

MDR-EdO: Welcome to today's SDN chat on "Squeezing Performance from the Java HotSpot Virtual Machine." Our guests today are Peter Kessler and Ross Knippel. Peter is the technical lead for garbage collection (GC) in the HotSpot VM. He's ready to handle questions about storage allocation and garbage collection. Ross focuses on the HotSpot Server Compiler, and will handle your questions about the runtime compilers. Peter and Ross are also prepared to field questions about the structure of the virtual machine code, how things work, and all that stuff that goes on below the level of Java code.

Rob: I would like to know the best strategy for GC in the following situation with the Java 2 Platform, Standard Edition (J2SE) 5.0 Client VM: I have a Java desktop application that makes calls to native code (a DLL) to initiate a scan operation on a high-speed scanner (200 pages per minute). The native code will call back into a Java routine to get the name of the file that the scanner should save to disk. I want to do everything possible to make sure that no GC pauses take place so that the scanner always runs at its full rated speed for maximum throughput.

Peter Kessler: If you just have to get the name of the file, why don't you do that first, and pass that to the native code? Since native code continues to run even during GC (except if it tries to touch Java heap memory), you won't be interrupted.

HotSpotter: Were there any improvements made to the HotSpot compilers (client and server) for JDK 5.0?

Ross Knippel: Yes, there were performance enhancements to the handling of:

  • Trig/Transcendental intrinsics
      ◦ SIN/COS/TAN
      ◦ LOG/LOG10
  • Idiomatic container conversion improvements for performance
      ◦ speed up crypto patterns using unsigned 32-bit int arithmetic coded to use Java long [4850191]
      ◦ improvements for unsigned-byte/char to int/long [5038535]
      ◦ Byte->Long conversions
  • JumpTables (-XX:+UseJumpTables for 5.0)
  • Atomic load/store of long->long using SSE instructions
  • Improve System.arraycopy performance for x86 and amd64 using SSE instructions

murphee: Are there any ideas for getting rid of the dreaded -Xmx (maximum heap) limit? And if getting rid of it is not possible, is it possible to add a way to increase it at runtime (even if this would have some cost and/or block the VM for a bit)? The JMX memory management MBeans could be used to do this.

Peter Kessler: There isn't currently a way to get rid of -Xmx. We know how to do it technically, but there's a performance cost, so we aren't going to impose that on everyone. One thing you can do is to specify a large -Xmx and depend on the JVM to limit the amount of memory to the live data set of the application. There also isn't a way to change the -Xmx once the JVM has started, because we have a bunch of data structures that we lay out, scaled by the size of the heap, so the heap has to be contiguous, for now.

Venu: The JVM doesn't seem to return pages to the operating system after a garbage collection shrinks the heap. Do I have to enable something to get that to happen?

Peter Kessler: You might want to turn down the -XX:MaxHeapFreeRatio= flag, so that when a collection happens you don't leave as much free space (for the heap to expand into if it needs it).
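
As a rough illustration of why lowering -XX:MaxHeapFreeRatio= leaves less free space, the sketch below computes the largest heap capacity that keeps the free portion at or below a given percentage of the heap after a collection. This is a simplified model, not the JVM's actual resizing code; the class and method names are hypothetical, and the real policy varies by collector and release.

```java
// Simplified model of a "maximum free percentage" heap-shrinking rule.
// Assumption: after a collection, capacity is reduced until free space is
// at most maxFreePercent of the capacity.
class ShrinkTarget {
    /** Largest capacity (same unit as live) whose free share is <= maxFreePercent. */
    static long maxCapacity(long live, int maxFreePercent) {
        if (maxFreePercent < 0 || maxFreePercent > 99) {
            throw new IllegalArgumentException("maxFreePercent must be 0..99");
        }
        // free/capacity <= p/100  =>  capacity <= live * 100 / (100 - p)
        return live * 100 / (100 - maxFreePercent);
    }

    public static void main(String[] args) {
        long liveMb = 100; // hypothetical live data size in megabytes
        int[] ratios = {70, 50, 30};
        for (int i = 0; i < ratios.length; i++) {
            System.out.println("MaxHeapFreeRatio=" + ratios[i]
                    + " -> capacity can stay as large as "
                    + maxCapacity(liveMb, ratios[i]) + " MB");
        }
    }
}
```

In this model, the lower the ratio, the closer the retained heap stays to the live data size.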

Lupe: Are there any tools for watching the memory used by the JVM while my application is running? What do you use to diagnose memory usage?

Peter Kessler: I like jconsole, even though it's just a demo to inspire tool-writers. There's also just -XX:+PrintGCDetails (and -XX:+PrintGCTimeStamps) to look for patterns. That produces a lot of output that one has to post-process, which can be a problem.
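
For a quick look at memory use from inside the application, without post-processing GC logs, the java.lang.management API added in J2SE 5.0 exposes the same kind of data that jconsole displays. The following is a minimal sketch; the class name PoolReport and its output format are illustrative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Prints heap usage and per-pool usage for the running JVM.
class PoolReport {
    /** Formats one usage snapshot; all values are in bytes (max may be -1 if undefined). */
    static String describe(String name, MemoryUsage u) {
        return name + ": used=" + u.getUsed()
                + " committed=" + u.getCommitted()
                + " max=" + u.getMax();
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(describe("heap", heap));

        // Pool names (eden, survivor, old generation, ...) depend on the collector in use.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null) { // getUsage() returns null for an invalid pool
                System.out.println(describe(pool.getName(), usage));
            }
        }
    }
}
```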

HotSpotter: How much performance improvement should I expect to see if I switch from the -client compiler to the -server compiler?

Ross Knippel: If the application is long running, one might see ~30% performance improvement for the server compiler over the client compiler. The client compiler is optimized for fast startup. The server compiler is optimized for long-term performance. Also, the server compiler is, right now, the only option on 64-bit AMD64 platforms.

Simon: Are there any flags to reduce (or control) the overhead of garbage collection?

Peter Kessler: On minimizing GC overhead, you should try JDK 1.5.0 with the -XX:+UseParallelGC collector, which has "ergonomics." You can then set -XX:GCTimeRatio= to the ratio your application can handle. That flag specifies the ratio of application time to GC time, so 99 (the default) asks for GC to run 1 part to the application's 99 parts.
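
The ratio described here translates to a GC share of total time of 1 / (1 + N) for -XX:GCTimeRatio=N. The sketch below just works through that arithmetic for a few values; the class name is hypothetical, and the values other than 99 are arbitrary examples, not recommendations.

```java
// Converts a GCTimeRatio value (application time : GC time = N : 1)
// into the fraction of total time that GC is allowed to take.
class GcTimeRatio {
    /** GC share of total run time for -XX:GCTimeRatio=ratio, i.e. 1 / (1 + ratio). */
    static double gcFraction(int ratio) {
        return 1.0 / (1.0 + ratio);
    }

    public static void main(String[] args) {
        int[] ratios = {99, 19, 9, 4}; // 99 is the default mentioned in the chat
        for (int i = 0; i < ratios.length; i++) {
            double percent = gcFraction(ratios[i]) * 100.0;
            System.out.println("-XX:GCTimeRatio=" + ratios[i]
                    + " -> GC goal of about " + percent + "% of total time");
        }
    }
}
```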

Eduardo: I have an interactive application and garbage collection pauses are noticeable. Should I use the concurrent collector?

Peter Kessler: Absolutely! That's what it's for. We basically have two collectors: a low-pause collector (the concurrent collector, sometimes called CMS); and a throughput collector, the parallel collector. For desktop applications the concurrent collector is a good choice. One problem with the concurrent collector is that it isn't a compacting collector: it uses free lists. So it has some overhead, both in space and in collection time.

chowder: I know that the server VM won't compile code until it has run a certain number of times, and that the threshold is settable on the command line. In general, I'm willing to suffer a long initial startup period (for compilation) if it gives me more throughput later. Should I set the compile threshold to zero, or will the server VM do a better job of compiling the code if I allow it to run in interpreted mode for a while?

Ross Knippel: Generally it's better to interpret methods before compiling them because references to unloaded classes from compiled code cause recompilation when using the server compiler. Also, profile data is collected by the interpreter for guiding the optimizations.

murphee: Could you briefly talk about the problems with dynamically changing the MaxHeap? Would it be possible to have this in the Memory MBeans, even if it had some impact, or would that cause a pause when used?

HotSpotter: Couldn't you just "realloc" the heap to get a growable heap and avoid the problems of having to specify -Xmx?

Peter Kessler: This is a longer discussion than we can have here. Come to the BOF at JavaOne and ask again. The overhead, now, would be something like the time for a full collection, and the space would be something like twice the heap size. You wouldn't like it. We probably will put the growable heap in the 64-bit JVM first, since those applications really don't know what their memory requirements are.

frankg: Not all JVM vendors use the same string patterns when they produce verbose GC (JRockit). Isn't the output from verbose GC 'standardized'? I'd like to use the same tools for HotSpot as for JRockit (and other JVMs).

Peter Kessler: I only control the output strings for the HotSpot JVM :-) If the other JVMs support JMX MBeans for monitoring, then you could write your own tools to read the values out of the MBeans and format them as you like. Would something like jstat work for you?
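
A vendor-neutral reader along these lines could be sketched with the standard GarbageCollectorMXBean interface from java.lang.management. The output format below is made up for illustration; collector names still differ between JVMs, but the fields are the same everywhere the platform MBeans are supported.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads GC counters from the platform MBeans and prints them in one fixed format.
class GcLog {
    /** One line per collector; count or millis is -1 when the JVM does not report it. */
    static String format(String collector, long count, long millis) {
        return collector + " collections=" + count + " time_ms=" + millis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start.
            System.out.println(format(gc.getName(),
                    gc.getCollectionCount(), gc.getCollectionTime()));
        }
    }
}
```

Sampling these counters periodically and printing the differences gives a log that looks the same regardless of which JVM produced it.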

chowder: I've heard there's talk of modifying the server VM so that it will compile code immediately on startup (like the client VM does), and then later compile it again with optimizations. Do you expect this for Java SE 6? Will the client VM be retired if/when this change is made? Are there any advantages to the client VM other than the immediate compilation that happens?

Ross Knippel: A tiered compilation scheme is being worked on which will merge the client and server JVMs into a unified system. This will probably not be available in Java SE 6. When this happens, there will be no need to specify -client or -server. The system will use the appropriate compiler as needed. The advantage of the client is improved startup. Most, if not all, optimizations done by the client are also performed by the server compiler. There is a difference in how unloaded classes are handled, but this is mostly a startup issue.

Peter T2: If you're close to CPU bound or have a server application using a lot of threads, would you still recommend -XX:+UseParallelGC?

Peter Kessler: Yes. If you have a multiprocessor, then -XX:+UseParallelGC is the right choice (unless you want low pause times). Otherwise the collector runs as a single thread, and Amdahl's law says you lose. The -XX:+UseParallelGC collector will stop all your server threads, so at that point you are no longer CPU bound.
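
The Amdahl's law point can be made concrete with a small calculation: if collection runs on one thread while the rest of the work spreads across all processors, the serial share caps the overall speedup. The sketch below uses the standard formula; the 10% serial share and the processor counts are made-up inputs, and the class name is hypothetical.

```java
// Amdahl's law: speedup = 1 / (s + (1 - s) / n), where s is the share of
// single-processor run time that stays serial (here, single-threaded GC)
// and n is the number of processors.
class AmdahlGc {
    static double speedup(double serialFraction, int cpus) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cpus);
    }

    public static void main(String[] args) {
        double gcShare = 0.10; // assumed: GC is 10% of single-processor run time
        int[] cpus = {2, 4, 8, 16};
        for (int i = 0; i < cpus.length; i++) {
            System.out.println(cpus[i] + " CPUs: speedup with serial GC = "
                    + speedup(gcShare, cpus[i])
                    + ", with fully parallel GC = " + speedup(0.0, cpus[i]));
        }
    }
}
```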

Rob: Anything good cooking for client-side performance in 6.0?

Peter Kessler: We can't make any promises about what will or won't make it into the final product, but we are working on a bunch of things that should help client performance. For example, we are trying to speed up our synchronization processing, which clients use a lot (either directly or through the GUI classes). The low-pause time collector is getting some "ergonomics" to help it tune itself to the application's memory demands, which should further smooth out collections and collection pauses. And don't forget that we did a bunch of stuff in 5.0 for clients, so client performance is on a ramp.

Spiff: The memory overhead for small Java objects can be a killer for some applications. For example, a Triangle object which holds 3 Point3f's. Since HotSpot has an 8-byte per-object overhead plus 8-byte alignment, each of those Point3f's occupy 24 bytes, where they would occupy 12 bytes with C/C++. Are there any plans to perhaps inline (memory-wise) small objects, say if you can guarantee that you'll only refer to them through their containing parent object?

Ross Knippel: On the compiler side, we're planning on doing escape analysis, which may allow these objects to be scalarized and therefore never created. There are no current plans to inline one object into another.
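
The footprint arithmetic in Spiff's question can be reproduced as follows, using the 8-byte header and 8-byte alignment quoted there. The 4-byte reference size is an added assumption (a 32-bit JVM), and the class and method names are hypothetical.

```java
// Reproduces the per-object size arithmetic: header + fields, rounded up to the alignment.
class ObjectFootprint {
    /** Rounds size up to the next multiple of alignment. */
    static long align(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    /** Heap size of one object with the given header and field bytes. */
    static long objectSize(long header, long fields, long alignment) {
        return align(header + fields, alignment);
    }

    public static void main(String[] args) {
        long header = 8, alignment = 8;   // figures quoted in the question
        long refSize = 4;                 // assumption: 32-bit references

        long point = objectSize(header, 3 * 4, alignment);          // three 4-byte floats
        long triangle = objectSize(header, 3 * refSize, alignment); // three references
        long total = triangle + 3 * point;

        System.out.println("Point3f: " + point + " bytes (12 bytes of float data)");
        System.out.println("Triangle plus its points: " + total
                + " bytes (36 bytes of float data)");
    }
}
```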

HotSpotter: How does runtime compilation interfere with my application running? In fact, I've often wondered how long into a run is the compiler still compiling things?

Ross Knippel: Compilation occurs in a separate thread. While the compilation is occurring, the method being compiled continues to be executed by the interpreter. In the tiered system, the method may continue to be executed by the client compiled code. By default there are 2 compiler threads for the server compiler.
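
One way to see how far into a run the compiler is still busy is to sample the cumulative compile time that the JVM reports through CompilationMXBean (java.lang.management, J2SE 5.0 and later). The sketch below is illustrative only: the busy loop stands in for real application work, and the class name is hypothetical.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Samples cumulative JIT compilation time while some work runs.
class JitWatch {
    /** Cumulative compile time in milliseconds, or -1 if the JVM does not report it. */
    static long compileMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1; // no compiler (interpreter only) or monitoring unsupported
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int round = 1; round <= 5; round++) {
            for (int i = 0; i < 5000000; i++) {
                sink += i % 7; // placeholder workload that should become hot
            }
            System.out.println("round " + round + " jit_ms=" + compileMillis());
        }
        System.out.println("checksum " + sink); // keeps the loop from being optimized away
    }
}
```

When the reported value stops growing between samples, the compiler threads have gone quiet for the code exercised so far.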

murphee: NIO Buffers: When NIO came out, it was claimed that access to those buffers would be made as fast (if not faster?) as access to arrays. I haven't been able to find conclusive answers on this. So, is this true, are there optimizations for Buffers in HotSpot? Is something in that area upcoming?

Ross Knippel: I can't remember the exact claims for the optimization of the NIO buffers. The server compiler does try to intrinsify (generate fast inline code) for NIO in 1.4.2. Do you have a specific issue? If so, we'll look into it.

Rob: How would you compare HotSpot GC with GC in the .Net framework? I'm aware .Net supports fewer options for GC, however I also hear .Net developers claim their applications suffer less from GC pauses than Java applications.

Peter Kessler: I don't know how the .Net framework garbage collectors work, so I can't comment on them. There are also lots of differences between C# and the Java language that would affect collection.

murphee: Will 6.0 or 7.0 support redefining of classes (not just method bodies)? If the full redefinition isn't supported, maybe a subset, like changing method signatures?

Ross Knippel: Probably not. The support for redefining of classes to support performance instrumentation does not change the semantics of the class. The recommended way of modifying a running application is to use class loaders to load new class definitions.

Peter Kessler: (Do I get to ask a question?) What kinds of live data sizes are people running with? 128MB, 1024MB, 16GB? And what kinds of pauses can people tolerate in their applications?

murphee: Well, I can only talk about my personal usage, and that is, for instance, 110-170 MB in my Eclipse instance; the biggest problem there is actually not GC pauses (or the GC is so quick I don't notice it), but the fact that Windows is very, very quick to swap out the memory, which, of course, causes the application to be unresponsive when that data is needed again and has to be paged in on demand.

Peter Kessler: Okay, thanks for the compliment, I think. I will suggest that you switch from Eclipse to NetBeans, and see if it has the same problem. We've done a bunch of performance work with the NetBeans folks, and their latest version really rocks.

chowder: We've got live data sets of about a gigabyte, and can normally tolerate pauses of about 15 seconds. We do have a few GUIs attached to these datasets, though, that really need to be more responsive than that.

Peter Kessler: Can you change your architecture to separate the gigabytes of data from the GUI? Or, have you tried the concurrent collector (-XX:+UseConcMarkSweepGC)?

Spiff: I'd like to take advantage of SIMD instructions such as those found in SSE2. Any plans to either give us new static methods (say in the Math class) that can be intrinsified into SSE2 SIMD instructions? How about having HotSpot detect potentially parallelizable instructions and generate SSE2 for those?

Ross Knippel: Yes, we would like to take full advantage of the SIMD features of the hardware. But there is no schedule for when this will be done. As for new library methods, I don't know what's planned, but if you have a favorite list, please send it in.

frankg: Any stats on HotSpot running on Linux versions?

Ross Knippel: The JVM is fully implemented on Linux. Is there a specific area of performance that you are interested in? In 1.5, there were performance improvements made to the speed of object allocation on multiprocessor Linux and Windows platforms. This was done by enabling thread-local allocation buffers on all platforms.

MDR-EdO: Well we've quickly come to the end of our session. I want to thank everyone who participated today. I thought we had an excellent range of questions. And of course, I'd like to thank our guests, Peter and Ross, for their answers.

Peter Kessler: If we haven't answered your questions, you could post them to the forum to discuss this chat on java.net.

Ross Knippel: Good bye. Thanks for the interesting set of questions. Sorry if we did not get a chance to answer your question.

MDR-EdO: Moderator signing off. The forum is now unmoderated.
