March 15, 2005
This is a moderated forum.
MDR-EdO: Welcome to today's SDN chat on "Squeezing Performance from the Java HotSpot Virtual Machine." Our guests today are Peter Kessler and Ross Knippel. Peter is the technical lead for garbage collection (GC) in the HotSpot VM. He's ready to handle questions about storage allocation and garbage collection. Ross focuses on the HotSpot Server Compiler, and will handle your questions about the runtime compilers. Peter and Ross are also prepared to field questions about the structure of the virtual machine code, how things work, and all that stuff that goes on below the level of Java code.
Rob: I would like to know the best strategy for GC in the following situation with the Java 2 Platform, Standard Edition (J2SE) 5.0 Client VM: I have a Java desktop application that makes calls to native code (a DLL) to initiate a scan operation on a high-speed scanner (200 pages per minute). The native code will call back into a Java routine to get the name of the file that the scanner should save to disk. I want to do everything possible to make sure that no GC pauses take place so that the scanner always runs at its full rated speed for maximum throughput.
Peter Kessler: If you just have to get the name of the file, why don't you do that first, and pass that to the native code? Since native code continues to run even during GC (except if it tries to touch Java heap memory), you won't be interrupted.
HotSpotter: Were there any improvements made to the HotSpot compilers (client and server) for JDK 5.0?
Ross Knippel: Yes, there were performance enhancements to the handling of:
murphee: Are there any ideas for getting rid of the dreaded
Peter Kessler: There isn't currently a way to get rid of
Venu: The JVM doesn't seem to return pages to the operating system after a garbage collection shrinks the heap. Do I have to enable something to get that to happen?
Peter Kessler: You might want to turn down the
Lupe: Are there any tools for watching the memory used by the JVM while my application is running? What do you use to diagnose memory usage?
Peter Kessler: I like jconsole, even though it's just a demo to inspire tool-writers. There's also just
HotSpotter: How much performance improvement should I expect to see if I switch from the
Ross Knippel: If the application is long-running, one might see ~30% performance improvement for the server compiler over the client compiler. The client compiler is optimized for fast startup. The server compiler is optimized for long-term performance. Also, the server compiler is, right now, the only option for AMD 64-bit platforms.
Simon: Are there any flags to reduce (or control) the overhead of garbage collection?
Peter Kessler: On minimizing GC overhead, you should try JDK 1.5.0 with the
Eduardo: I have an interactive application and garbage collection pauses are noticeable. Should I use the concurrent collector?
Peter Kessler: Absolutely! That's what it's for. We basically have two collectors: a low-pause collector (the concurrent collector, sometimes called CMS); and a throughput collector, the parallel collector. For desktop applications the concurrent collector is a good choice. One problem with the concurrent collector is that it isn't a compacting collector: it uses free lists. So it has some overhead, both in space and in collection time.
chowder: I know that the server VM won't compile code until it has run a certain number of times, and that the threshold is settable on the command line. In general, I'm willing to suffer a long initial startup period (for compilation) if it gives me more throughput later. Should I set the compile threshold to zero, or will the server VM do a better job of compiling the code if I allow it to run in interpreted mode for a while?
Ross Knippel: Generally it's better to interpret methods before compiling them, because references to unloaded classes from compiled code cause recompilation when using the server compiler. Also, profile data is collected by the interpreter for guiding the optimizations.
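
Ross's point that methods are interpreted (and profiled) before they are compiled is why timing code is often preceded by a warm-up phase. The following is a minimal sketch of that idea and is not from the chat: the class `WarmUpSketch`, its methods, and the iteration counts are hypothetical and chosen only for illustration, and the printed timings will differ from run to run and from VM to VM.

```java
// Hypothetical warm-up harness; names and iteration counts are illustrative only.
class WarmUpSketch {
    // A small hot method for the runtime compiler to optimize.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Calls sumOfSquares repeatedly and returns the elapsed time in nanoseconds.
    static long timeCalls(int calls, int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int c = 0; c < calls; c++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop cannot be discarded as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println("unexpected sink value");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // First measurement: the method has mostly run in the interpreter so far.
        long cold = timeCalls(2000, 1000);
        // Warm-up: give the VM time to profile and compile the hot method.
        timeCalls(20000, 1000);
        // Second measurement: the method has probably been compiled by now.
        long warm = timeCalls(2000, 1000);
        System.out.println("before warm-up: " + cold + " ns");
        System.out.println("after warm-up:  " + warm + " ns");
    }
}
```

Running the sketch under the client VM and the server VM, or with a different compile threshold, is one way to observe the trade-off chowder asks about; the numbers are only indicative.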
murphee: Could you briefly talk about the problems with dynamically changing the
HotSpotter: Couldn't you just "realloc" the heap to get a growable heap and avoid the problems of having to specify -Xmx?
Peter Kessler: This is a longer discussion than we can have here. Come to the BOF at JavaOne and ask again. The overhead, now, would be something like the time for a full collection, and the space would be something like twice the heap size. You wouldn't like it. We probably will put the growable heap in the 64-bit JVM first, since those applications really don't know what their memory requirements are.
frankg: Not all JVM vendors use the same string patterns when they produce verbose GC (JRockit). Isn't the output from verbose GC 'standardized'? I'd like to use the same tools for HotSpot as for JRockit (and other JVMs).
Peter Kessler: I only control the output strings for the HotSpot JVM :-) If the other JVMs support JMX MBeans for monitoring, then you could write your own tools to read the values out of the MBeans and format them as you like. Would something like jstat work for you?
chowder: I've heard there's talk of modifying the server VM so that it will compile code immediately on startup (like the client VM does), and then later compile it again with optimizations. Do you expect this for Java SE 6? Will the client VM be retired if/when this change is made? Are there any advantages to the client VM other than the immediate compilation that happens?
Ross Knippel: A tiered compilation scheme is being worked on which will merge the client and server JVMs into a unified system. This will probably not be available in Java SE 6. When this happens, there will be no need to specify
Peter T2: If you're close to CPU bound or have a server application using a lot of threads, would you still recommend
Peter Kessler: Yes. If you have a multiprocessor, then
Rob: Anything good cooking for client-side performance in 6.0?
Peter Kessler: We can't make any promises about what will or won't make it into the final product, but we are working on a bunch of things that should help client performance. For example, we are trying to speed up our synchronization processing, which clients use a lot (either directly or through the GUI classes). The low-pause time collector is getting some "ergonomics" to help it tune itself to the application's memory demands, which should further smooth out collections and collection pauses. And don't forget that we did a bunch of stuff in 5.0 for clients, so client performance is on a ramp.
Spiff: The memory overhead for small Java objects can be a killer for some applications. For example, a Triangle object which holds 3 Point3f's. Since HotSpot has an 8-byte per-object overhead plus 8-byte alignment, each of those Point3f's occupies 24 bytes, where they would occupy 12 bytes with C/C++. Are there any plans to perhaps inline (memory-wise) small objects, say if you can guarantee that you'll only refer to them through their containing parent object?
Ross Knippel: On the compiler side, we're planning on doing escape analysis, which may allow these objects to be scalarized and therefore never created. There are no current plans to inline one object into another.
HotSpotter: How does runtime compilation interfere with my application running? In fact, I've often wondered how long into a run is the compiler still compiling things?
Ross Knippel: Compilation occurs in a separate thread. While the compilation is occurring, the method being compiled continues to be executed by the interpreter. In the tiered system, the method may continue to be executed by the client compiled code. By default there are 2 compiler threads for the server compiler.
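
One way to explore HotSpotter's question of how long into a run the compiler is still working is to sample the accumulated compilation time that the VM reports through the standard `java.lang.management.CompilationMXBean` (available since J2SE 5.0). This is a minimal sketch under stated assumptions, not something shown in the chat: the class `CompileTimeSketch`, the `busyWork` method, and the sample counts are hypothetical, and a VM may not support compilation time monitoring at all, in which case the sketch reports -1.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sampler; class name, workload, and sample counts are illustrative only.
class CompileTimeSketch {
    // Returns the approximate accumulated compilation time in milliseconds,
    // or -1 if the VM has no compiler or does not support this monitoring.
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    // Some repetitive work so the compiler has something to do.
    static long busyWork(int rounds) {
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            total += i % 7;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int sample = 0; sample < 5; sample++) {
            long result = busyWork(5000000);
            System.out.println("sample " + sample
                    + ": work result " + result
                    + ", compilation time so far " + totalCompilationMillis() + " ms");
        }
    }
}
```

If the reported time stops growing between samples, the compiler threads have probably gone quiet for that workload. The value is an approximation and may count time from several compiler threads.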
murphee: NIO Buffers: When NIO came out, it was claimed that access to those buffers would be made as fast (if not faster?) as access to arrays. I haven't been able to find conclusive answers on this. So, is this true, are there optimizations for Buffers in HotSpot? Is something in that area upcoming?
Ross Knippel: I can't remember the exact claims for the optimization of the NIO buffers. The server compiler does try to intrinsify (generate fast inline code) for NIO in 1.4.2. Do you have a specific issue? If so, we'll look into it.
Rob: How would you compare HotSpot GC with GC in the .Net framework? I'm aware .Net supports fewer options for GC; however, I also hear .Net developers claim their applications suffer less from GC pauses than Java applications.
Peter Kessler: I don't know how the .Net framework garbage collectors work, so I can't comment on them. There are also lots of differences between C# and the Java language that would affect collection.
murphee: Will 6.0 or 7.0 support redefining of classes (not just method bodies)? If the full redefinition isn't supported, maybe a subset, like changing method signatures?
Ross Knippel: Probably not. The support for redefining of classes to support performance instrumentation does not change the semantics of the class. The recommended way of modifying a running application is to use class loaders to load new class definitions.
Peter Kessler: (Do I get to ask a question?) What kinds of live data sizes are people running with? 128MB, 1024MB, 16GB? And what kinds of pauses can people tolerate in their applications?
murphee: Well, I can only talk about my personal usage, and that is, for instance, 110-170 MB in my Eclipse instance; the biggest problem there is actually not GC pauses (or the GC is so quick I don't notice it), but the fact that Windows is very, very quick to swap out the memory, which, of course, causes the application to be unresponsive when that data is needed again and has to be paged in on demand.
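
For readers who want to answer Peter's question about heap sizes and collection time for their own application, the standard `java.lang.management` MBeans (the JMX monitoring interface mentioned earlier in the chat) expose both. This is a minimal sketch, not from the chat: the class `HeapReportSketch` and its method names are hypothetical. Note that used heap includes garbage that has not been collected yet, so it is only an upper bound on live data, and the reported collection time is accumulated elapsed time rather than individual pause lengths.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical reporter; class and method names are illustrative only.
class HeapReportSketch {
    // Bytes of heap currently in use (live objects plus uncollected garbage).
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // One line per collector: name, number of collections, accumulated time.
    static String gcSummary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": ")
              .append(gc.getCollectionCount())
              .append(" collections, ")
              .append(gc.getCollectionTime())
              .append(" ms accumulated\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("used heap: " + (usedHeapBytes() / (1024 * 1024)) + " MB");
        System.out.print(gcSummary());
    }
}
```

Calling these methods periodically from a background thread, or reading the same MBeans remotely, gives a rough picture of heap growth over time. The collector names printed depend on which collector the VM selected.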
Peter Kessler: Okay, thanks for the compliment, I think. I will suggest that you switch from Eclipse to NetBeans, and see if it has the same problem. We've done a bunch of performance work with the NetBeans folks, and their latest version really rocks.
chowder: We've got live data sets of about a gigabyte, and can normally tolerate pauses of about 15 seconds. We do have a few GUIs attached to these datasets, though, that really need to be more responsive than that.
Peter Kessler: Can you change your architecture to separate the gigabytes of data from the GUI? Or, have you tried the concurrent collector? (
Spiff: I'd like to take advantage of SIMD instructions such as those found in SSE2. Any plans to either give us new static methods (say in the
Ross Knippel: Yes, we would like to take full advantage of the SIMD features of the hardware. But there is no schedule for when this will be done. As for new library methods, I don't know what's planned, but if you have a favorite list, please send it in.
frankg: Any stats on HotSpot running on Linux versions?
Ross Knippel: The JVM is fully implemented on Linux. Is there a specific area of performance that you are interested in? In 1.5, there were performance improvements made to the speed of object allocation on multiprocessor Linux and Windows platforms. This was done by enabling thread-local allocation buffers on all platforms.
MDR-EdO: Well, we've quickly come to the end of our session. I want to thank everyone who participated today. I thought we had an excellent range of questions. And of course, I'd like to thank our guests, Peter and Ross, for their answers.
Peter Kessler: If we haven't answered your questions, you could post them to the forum to discuss this chat on java.net.
Ross Knippel: Good bye. Thanks for the interesting set of questions. Sorry if we did not get a chance to answer your question.
MDR-EdO: Moderator signing off. The forum is now unmoderated.