Java 2 Platform, Standard Edition (J2SE Platform), Java Platform Performance Engineering, Sun Microsystems, Inc.
Introduction

As with the release of Java 2 Platform, Standard Edition (J2SE) version 1.4, a central design goal for J2SE platform version 1.4.2 was to improve the performance and scalability of the Java platform. To that end, the team at Sun Microsystems, Inc. put in place a rigorous program to drive these improvements, working closely with customers and partners to determine the key areas where performance improvements would have the most impact. Sun Microsystems, Inc. is also driving performance improvements through the use of various industry-standard and internally developed benchmarks. These improvements span areas important to both server-side and client-side Java programs.

This guide gives an overview of the performance and scalability improvements made in the J2SE version 1.4.2 release. It includes the results of various benchmarks that demonstrate improvements in existing APIs, as well as an overview of key new technologies included in J2SE for the first time. Version 1.4.2 gives you the infrastructure you need for your application to:
This guide contains sample performance results from a range of applications running on different operating systems and hardware, and helps to illustrate that the performance improvements are broadly applicable to many systems and applications.

New Performance and Scalability Features

Several key new features that improve scalability have been added to the J2SE platform in version 1.4.2. These include the new throughput and concurrent low-pause garbage collectors, Linux NPTL thread library support, and SSE/SSE2 register support for floating point operations.

Garbage Collection Improvements

In the J2SE platform version 1.4.1, two new garbage collectors were introduced, making a total of four garbage collectors from which to choose. In J2SE platform version 1.4.2, the performance of the new collectors has been improved through algorithm optimizations and bug fixes, along with documentation updates to educate developers and administrators on which collector to choose and when. For a detailed look at garbage collection and the new collectors, go to: http://java.sun.com/docs/hotspot/gc1.4.2/index.html

Throughput Collector

The throughput collector uses a parallel version of the young generation collector. It is enabled by passing -XX:+UseParallelGC on the command line. The tenured generation collector is the same as in the default collector. Use the throughput collector when you want to improve the performance of your application on machines with larger numbers of processors. In the default collector, garbage collection is done by one thread, and therefore garbage collection adds to the serial execution time of the application. The throughput collector uses multiple threads to execute a minor collection and so reduces the serial execution time of the application.

Concurrent Low Pause Collector

The concurrent collector is used to collect the tenured generation and does most of the collection concurrently with the execution of the application.
The concurrent collector employs a separate collector thread that consumes CPU cycles during application execution. This allows the application to be paused for only short periods of time during the collection, but could lower overall throughput. It is enabled by passing -XX:+UseConcMarkSweepGC on the command line. Use the concurrent collector if your application would benefit from shorter garbage collector pauses and can afford to share processor resources with the garbage collector while the application is running. Typically, applications that have a relatively large set of long-lived data (a large tenured generation) and run on machines with two or more processors tend to benefit from the use of this collector. However, this collector should be considered for any application with a low pause time requirement. Optimal results have been observed for interactive applications with tenured generations of a modest size on a single processor.

AggressiveHeap - Server Performance Option

The -XX:+AggressiveHeap option inspects the machine resources (size of memory and number of processors) and attempts to set various parameters to be optimal for long-running, memory allocation-intensive jobs. It was originally intended for machines with large amounts of memory and a large number of CPUs, but in the J2SE platform version 1.4.1 and later it has shown itself to be useful even on four-processor machines. The physical memory on the machine must be at least 256 MB before AggressiveHeap can be used. The size of the initial heap is calculated from the size of the physical memory and attempts to make maximal use of the physical memory for the heap (that is, the algorithm attempts to use half of available memory, or all of available memory less 160 MB, whichever is smaller). Several optimizations and changes in parameter values were added to AggressiveHeap in J2SE platform version 1.4.2 in an effort to make the option more useful for general server use.
AggressiveHeap is recommended for server applications requiring high performance and scalability and can greatly ease performance tuning efforts. With Sun's emphasis on low-cost enterprise computing, we renewed our commitment to performance on Sun's latest offerings. Illustration 1 shows the performance gains as measured on SPECjbb®2000 while using AggressiveHeap on J2SE platform version 1.4.1 and J2SE version 1.4.2. Although SPARC was already highly optimized in J2SE version 1.4.1 with AggressiveHeap, we made further improvements of nearly 20% in J2SE version 1.4.2.
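The initial heap sizing rule described above for AggressiveHeap can be worked through as simple arithmetic. The following is a minimal sketch of that rule only, not the HotSpot implementation: the class and method names are illustrative, and it assumes that "available memory" in the description means the machine's physical memory.

```java
public class AggressiveHeapEstimate {
    static final long MB = 1024L * 1024L;
    // Minimum physical memory stated in the text for AggressiveHeap to apply.
    static final long MIN_PHYSICAL = 256L * MB;
    // Amount of memory left out of the heap in the "all memory less 160 MB" case.
    static final long RESERVED = 160L * MB;

    /**
     * Estimates the initial heap size in bytes from the rule in the text:
     * the smaller of half the memory and the memory less 160 MB.
     * Returns -1 when the machine has less than 256 MB (illustrative convention).
     */
    static long initialHeapBytes(long physicalBytes) {
        if (physicalBytes < MIN_PHYSICAL) {
            return -1;
        }
        return Math.min(physicalBytes / 2, physicalBytes - RESERVED);
    }

    public static void main(String[] args) {
        // Hypothetical machine sizes, in megabytes.
        long[] sizesMb = {128, 256, 320, 1024, 4096};
        for (long sizeMb : sizesMb) {
            long heap = initialHeapBytes(sizeMb * MB);
            String shown = (heap < 0) ? "not applicable" : (heap / MB) + " MB";
            System.out.println(sizeMb + " MB physical -> " + shown);
        }
    }
}
```

Under this reading, the "less 160 MB" term only matters on small machines; from 320 MB upward, half of the memory is always the smaller value.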
SSE and SSE2 Instruction Sets for Floating Point Computation

J2SE platform version 1.4.2 now uses SSE and SSE2 instruction sets for floating point computations on hardware and software platforms that support this feature. Use of the SSE and SSE2 instruction sets allows J2SE platform version 1.4.2 to have optimal performance of scientific and numerical computations and to take full advantage of new hardware and software platforms. The graph below highlights the performance gain of SSE and SSE2 instruction support as measured by SciMark 2.0, a scientific and numerical computing application performing floating point computations.
JVM Runtime Optimizations

Several optimizations and bug fixes are included in the J2SE platform version 1.4.2 Java Virtual Machine for the Java platform (JVM) runtime. These have improved overall performance and, in some cases, show substantial performance improvement. Note that the terms "Java virtual machine" and "JVM" mean a virtual machine for the Java platform.
An example of such an improvement was to make system dictionary reads lock-free. The system dictionary is an internal JVM data structure that holds all the classes loaded by the system. The change particularly benefits calls such as Class.forName(), which perform lookups in this data structure at the lowest level. Before this change, both readers and writers acquired a lock to access the system dictionary. Illustration 3 above highlights the performance gain of the system dictionary locking improvements described above, as measured by a heavily threaded micro-benchmark running on Red Hat Linux 7.3 with traditional pre-NPTL Linux threads. The micro-benchmark measures 400 threads all accessing the system dictionary simultaneously.

Light Weight Performance Monitoring

Monitoring the performance of deployed Java applications can be rather challenging. Existing tools are either too intrusive or can only be enabled with a restart of the application. The Java HotSpot virtual machine included with J2SE platform version 1.4.2 includes an experimental lightweight instrumentation and monitoring interface. This interface is always on and provides for non-intrusive, real-time JVM performance monitoring in production environments. The HotSpot JVM in J2SE platform version 1.4.2 includes instrumentation for the various garbage collectors, the client and server JIT compilers, the class loader, and various configuration parameters. The instrumentation is exported through a private interface that allows for asynchronous monitoring. A set of experimental performance monitoring tools, called the jvmstat tools, is provided as a separate download from: http://java.sun.com/performance/jvmstat/

Note: The jvmstat 1.0 tools only support the HotSpot Java Virtual Machine distributed with J2SE platform version 1.4.1. A release of the jvmstat tools that supports the HotSpot Java Virtual Machine distributed with J2SE platform version 1.4.2 will be available shortly.
The jvmstat tools provide for asynchronous sampling and display of the instrumentation exported from the HotSpot JVM. The jvmstat command line tool displays the instrumentation in textual format. The visualgc tool provides a graphical view of the garbage collection system and is useful for diagnosing Java runtime environment heap configuration and tuning issues.
The combination of instrumentation for the Java HotSpot virtual machine and the jvmstat monitoring tools provides new and powerful mechanisms to monitor the performance of production Java applications.

New Platform Support

IA64 64-bit for Windows and Linux

With the release of J2SE platform version 1.4.2 comes a new addition to the platforms supported by the HotSpot JVM. Full 64-bit support for the Intel IA-64 architecture and the Itanium family of processors is a major addition for the J2SE platform, and is a strong example of Sun's continued focus on enterprise and network computing. With a new port to IA-64 comes the opportunity to leverage past work on the 64-bit version of the Java virtual machine for the Solaris Operating System (SPARC® Platform Edition), ensuring delivery of a reliable, scalable, high-performing, and highly competitive 64-bit version of the JVM.

Linux Thread Optimizations

The first thing Java developers notice when running their application on Linux is that the ps command, used to display the list of processes, appears to show multiple copies of the Java runtime environment running even though only one Java application was started. This is due to the implementation of the system threads library on Linux. Linux threads are implemented as cloned processes, which means each Java thread appears as a new Linux process. The advantage of this approach is that the threads implementation is simpler and stable; the downside is that it hurts the performance of even a moderately threaded Java application on Linux. The confusing ps command display issue is fixed in the 2.0.7 version of the procps package. However, the overhead of using a process for each Java thread has been, until now, the biggest challenge for adopting the Java runtime environment on the Linux platform. The scalability and signal handling issues of the Linux threads implementation are well known in the Linux community.
The two best-known thread library projects that set out to solve this problem are the NGPT (Next Generation Posix Threads) library and a newer library called NPTL (Native Posix Thread Library). The NPTL approach keeps the 1:1 thread mapping (one user or Java thread to one kernel thread) but optimizes the kernel for thread-related operations, including signal handling, synchronization, and thread creation speed. The NPTL library is now available by default in Red Hat 9, which makes this a very exciting time for developers using the Java programming language on Linux. NPTL Highlights:
The graph below highlights the performance gain of NPTL support in J2SE platform version 1.4.2 as described above, as measured by a heavily threaded instant messaging application running on Linux with NPTL support. Note: the faster the time to completion (smaller bar) the better the result.
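Because thread creation speed is one of the operations a threads library affects, a rough way to observe it on a given system is to time how long it takes to start and join a batch of short-lived threads. The sketch below is only an illustration of that idea: it is not the instant messaging benchmark, the class and method names are hypothetical, and the count of 400 is chosen simply to mirror the number of threads mentioned in this guide.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadSpawnTiming {
    /**
     * Starts `count` threads that each increment a shared counter once,
     * waits for all of them to finish, and returns how many completed.
     */
    static int runWorkers(int count) {
        AtomicInteger completed = new AtomicInteger();
        Thread[] workers = new Thread[count];
        for (int i = 0; i < count; i++) {
            workers[i] = new Thread(completed::incrementAndGet);
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the failure to the caller.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining workers", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int threads = 400; // illustrative value, see the note above
        long start = System.nanoTime();
        int finished = runWorkers(threads);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(finished + " threads started and joined in "
                + elapsedMicros + " microseconds");
    }
}
```

A single run of such a loop is noisy; repeating it several times and comparing the results across systems gives a more trustworthy picture than any one measurement.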
Client-side Performance Improvements

Start-up Performance

Work was done in J2SE version 1.4.2 to decrease the startup time of applications. Several possible approaches were investigated, and the general approach decided upon was measurement and optimization of the J2SE core libraries. New tools were built for acquiring and analyzing fine-grained performance measurements, and the expensive areas of code were optimized. Several performance optimizations have been made, including:
Performance measurements indicate that startup time has been reduced by roughly thirty percent for small command-line applications and by roughly fifteen to twenty percent for small Swing applications. Some of the optimizations appear to have carried over to larger applications. Additional startup time work is planned for future releases.

This graph shows a startup benchmark that measures the aggregate time to load three different well-known GUI applications. Startup with J2SE 1.4.2 has improved by over 30% compared with J2SE 1.4.1.
Appendix

Systems Under Test

x86 Test System

Test System for the SPARC® Architecture
Testing Methodology

Proper statistical analysis to identify both performance regressions and gains is critical to software development and is a core component of JVM testing within Java Software. This section gives a brief look at the analysis used during the testing highlighted in this paper, and a few suggestions to point readers in the right direction for further research.

Sample Size

In order to calculate meaningful confidence intervals, all of the performance testing reported in this paper requires no fewer than 10 test samples; in other words, each test is run at least ten times. Ensuring a proper sample size is key to further analysis: it makes it possible to identify small regressions and gains rather than simply dismissing them as "noise".

Calculate the Confidence Interval

Calculating a confidence interval adds reliability to your results and makes it possible to identify true regressions or gains when sample differences are small or are not consistent (high standard deviation). For the tests highlighted in this paper, the following computations were performed.
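As a generic illustration, and not necessarily the exact computations used for this paper, a confidence interval for a mean is commonly built from the sample mean, the sample standard deviation, and a Student's t critical value. The sketch below assumes that approach; the class name, method names, and sample scores are hypothetical. The critical value is passed in as a parameter; for ten samples (nine degrees of freedom) the two-sided 95% value is 2.262.

```java
public class ConfidenceInterval {
    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double x : samples) {
            sum += x;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] samples) {
        double m = mean(samples);
        double squares = 0.0;
        for (double x : samples) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (samples.length - 1));
    }

    /** Half-width of the interval: tCritical * s / sqrt(n). */
    static double halfWidth(double[] samples, double tCritical) {
        return tCritical * sampleStdDev(samples) / Math.sqrt(samples.length);
    }

    public static void main(String[] args) {
        // Hypothetical scores from ten runs of one benchmark.
        double[] runs = {101.2, 99.8, 100.5, 100.1, 99.5,
                         100.9, 100.3, 99.9, 100.7, 100.1};
        double t95 = 2.262; // two-sided 95% Student's t value, 9 degrees of freedom
        double m = mean(runs);
        double h = halfWidth(runs, t95);
        System.out.printf("mean = %.2f, 95%% CI = [%.2f, %.2f]%n", m, m - h, m + h);
    }
}
```

When the intervals computed for two releases do not overlap, the difference between them is unlikely to be noise; when they do overlap, more samples are usually needed before calling the result a regression or a gain.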
Benchmark Disclosure

SPECjbb2000

SPECjbb2000 is a benchmark from the Standard Performance Evaluation Corporation (SPEC). The performance referenced is based on Sun internal software testing conforming to the testing methodologies listed above. All SPECjbb2000 comparison tests were run with the following arguments:

java -server -Xms1600m -Xmx1600m -XX:+AggressiveHeap

For the latest SPECjbb2000 results visit http://www.spec.org/osg/jbb2000

SciMark 2.0

SciMark 2.0 is a Java benchmark for scientific and numerical computing. It measures several computational kernels and reports a composite score in approximate Mflops (millions of floating point operations per second). All SciMark 2.0 comparison tests were run with the following arguments:

java -server -Xms1600m -Xmx1600m -XX:+AggressiveHeap

For more information on SciMark 2.0 visit http://math.nist.gov/scimark2/

Instant Messaging Benchmark

This benchmark simulates a server that routes brief text messages between a number of users. In this case there were 400 users, each of which is serviced by one worker thread. All comparison tests were run with the following arguments:

java -server -Xms1600m -Xmx1600m -XX:+AggressiveHeap

Startup Time Benchmark

The Startup Time benchmark measures the time it takes to load an application. Load time is defined as the time from when the command is executed to the time when the application reaches a steady state. In this case, we aggregate the startup times of JMol, TeaTimeJ, and Mailpuccino, applications that depend heavily on Swing and AWT components. The measurements were taken with the client compiler and all default options.

Sun Microsystems, Sun, the Sun logo, Solaris, Java, J2SE, JVM, and HotSpot are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.