Java™ HotSpot Virtual Machine Performance Enhancements - JDK 7
In a 64-bit JVM, if the UseCompressedOops flag is set to true, the JVM asks the operating system to reserve memory for the Java heap at a specific address. If the operating system supports such a request and reserves memory at the specified address, then zero-based compressed oops are used.
Zero-based compressed oops means that the narrow oop base is 0 instead of an arbitrary address (otherwise, the narrow oop base is the Java heap base minus one protected page size). With a zero base, the encoding and decoding of compressed oops can be optimized, because the base no longer has to be subtracted or added.
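The effect of a zero base on encoding and decoding can be sketched with plain arithmetic. This is an illustration only, not the JVM's implementation: the class name, method names, and the example base address are hypothetical, the 3-bit shift assumes 8-byte object alignment, and null handling is ignored.

```java
// Illustrative arithmetic only; the real encoding is done by the JVM itself.
final class CompressedOopSketch {
    // Assumption: objects are 8-byte aligned, so the low 3 address bits are zero.
    static final int SHIFT = 3;

    // General case: the narrow oop is a scaled offset from a non-zero base.
    static int encode(long address, long base) {
        return (int) ((address - base) >>> SHIFT);
    }

    static long decode(int narrowOop, long base) {
        // Treat the 32-bit value as unsigned before scaling it back up.
        return base + ((narrowOop & 0xFFFFFFFFL) << SHIFT);
    }

    // Zero-based case: the subtraction and addition of the base disappear.
    static int encodeZeroBased(long address) {
        return (int) (address >>> SHIFT);
    }

    static long decodeZeroBased(int narrowOop) {
        return (narrowOop & 0xFFFFFFFFL) << SHIFT;
    }

    public static void main(String[] args) {
        long base = 0x7f0000000000L;      // hypothetical narrow oop base
        long address = base + 0x1000;     // hypothetical object address
        int narrow = encode(address, base);
        System.out.println("based round trip ok: " + (decode(narrow, base) == address));

        long lowAddress = 0x40000000L;    // hypothetical address in a zero-based heap
        int narrowZero = encodeZeroBased(lowAddress);
        System.out.println("zero-based round trip ok: " + (decodeZeroBased(narrowZero) == lowAddress));
    }
}
```

With a zero base, decoding is a single shift, which is the kind of saving the paragraph above refers to.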
Read more about Compressed OOPS.
Escape analysis is a technique by which the Java™ HotSpot Server Compiler can analyze the scope of a new object's uses and decide whether to allocate it on the Java heap.
The Java HotSpot Server Compiler implements the flow-insensitive escape analysis algorithm described in:
[Choi99] Jong-Deok Choi, Manish Gupta, Mauricio Serrano,
Vugranam C. Sreedhar, Sam Midkiff,
"Escape Analysis for Java", Proceedings of ACM SIGPLAN
OOPSLA Conference, November 1, 1999
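The server compiler constructs a "connection graph" (CG) for the method being analyzed, then makes a pass over its nodes to determine their escape state. A node's escape state may be one of the following:
GlobalEscape: the object escapes the method and the thread, for example because it is stored in a static field, stored in a field of an escaped object, or returned as the result of the current method.
ArgEscape: the object is passed as an argument, or is referenced by an argument, but does not globally escape during the call.
NoEscape: the object is scalar replaceable, meaning that its allocation can be removed from the generated code.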
After escape analysis, the server compiler eliminates scalar replaceable object allocations and their associated locks from the generated code. The server compiler also eliminates locks for all non-globally escaping objects. It does not replace a heap allocation with a stack allocation for non-globally escaping objects.
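As a rough illustration of scalar replacement, the sketch below contrasts an allocation that never leaves its method with one that globally escapes. The class and method names are hypothetical, and the "scalarized" variant is a hand-written approximation of what the compiled code may behave like, not actual compiler output.

```java
final class ScalarReplacementSketch {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point is not stored in a field, returned, or passed to another
    // method, so it does not escape and is a candidate for scalar replacement.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // Hand-written approximation of the same method after scalar replacement:
    // the fields become local values and no object is allocated.
    static int distanceSquaredScalarized(int x, int y) {
        int px = x;
        int py = y;
        return px * px + py * py;
    }

    static Point lastPoint;

    // Storing the Point in a static field makes it globally escape, so the
    // allocation has to stay on the heap.
    static int distanceSquaredEscaping(int x, int y) {
        Point p = new Point(x, y);
        lastPoint = p;
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(3, 4));
        System.out.println(distanceSquaredScalarized(3, 4));
        System.out.println(distanceSquaredEscaping(3, 4));
    }
}
```

All three methods compute the same value; they differ only in whether the compiler is free to remove the allocation.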
Some scenarios for escape analysis are described below:
public class Person {
  private String name;
  private int age;

  public Person(String personName, int personAge) {
    name = personName;
    age = personAge;
  }

  public Person(Person p) { this(p.getName(), p.getAge()); }
  public String getName() { return name; }
  public int getAge() { return age; }
}

public class Employee {
  private Person person;

  // makes a defensive copy to protect against modifications by caller
  public Person getPerson() { return new Person(person); }

  public void printEmployeeDetail(Employee emp) {
    Person person = emp.getPerson();
    // this caller does not modify the object, so defensive copy was unnecessary
    System.out.println("Employee's name: " + person.getName() + "; age: " + person.getAge());
  }
}
The getPerson() method makes a copy to prevent modification of the original object by the caller.
If the compiler determines that the getPerson() method is being invoked in a loop, it will inline that method. In addition, if escape analysis shows that the original object is never modified, the compiler may eliminate the call that makes the copy.
StringBuffer and Vector are synchronized because they can be accessed by different threads. In most scenarios, however, they are used in a thread-local manner. Where the usage is thread-local, the compiler may remove the synchronization.
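A thread-local use of StringBuffer might look like the sketch below. The class and method names are hypothetical; whether the locks are actually removed depends on the compiler's inlining and escape analysis decisions at run time.

```java
final class LockElisionSketch {
    // The StringBuffer is created, used, and discarded inside this method.
    // No other thread can reach it, so the locking done by each synchronized
    // append() call is a candidate for elimination.
    static String join(String[] parts) {
        StringBuffer sb = new StringBuffer();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }

    // A buffer stored in a static field is reachable from other threads,
    // so its synchronization has to stay in place.
    static final StringBuffer SHARED = new StringBuffer();

    static void appendShared(String part) {
        SHARED.append(part).append(',');
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}));
        appendShared("x");
        System.out.println(SHARED);
    }
}
```

The source code is the same in both cases; only the reachability of the buffer determines whether the locks can be dropped.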
The Parallel Scavenger garbage collector has been extended to take advantage of machines with a NUMA (Non-Uniform Memory Access) architecture. Most modern computers are based on a NUMA architecture, in which it takes different amounts of time to access different parts of memory. Typically, every processor in the system has a local memory that provides low access latency and high bandwidth, and remote memory that is considerably slower to access.
In the Java HotSpot VM, the NUMA-aware allocator has been implemented to take advantage of such systems and provide automatic memory placement optimizations for Java applications. The allocator controls the eden space of the young generation of the heap, where most new objects are created. It divides the space into regions, each of which is placed in the memory of a specific node. The allocator relies on the hypothesis that the thread that allocates an object is the most likely to use it. To ensure the fastest access to a new object, the allocator places it in the region local to the allocating thread. The regions can be dynamically resized to reflect the allocation rate of the application threads running on different nodes. That makes it possible to increase performance even for single-threaded applications. In addition, the "from" and "to" survivor spaces of the young generation, the old generation, and the permanent generation have page interleaving turned on. This ensures that, on average, all threads have equal access latencies to these spaces.
The NUMA-aware allocator is implemented for the Solaris (>= 9u2) and
Linux (kernel >= 2.6.19, glibc >= 2.6.1) operating systems. It is turned on
with the -XX:+UseNUMA flag in conjunction with the Parallel Scavenger
garbage collector, which is the default on a server-class machine and can
also be selected explicitly with the -XX:+UseParallelGC option.
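One way to confirm that a running application was started with these flags (for example with java -XX:+UseParallelGC -XX:+UseNUMA followed by the main class) is to inspect the JVM's input arguments. The sketch below is only that: the class name and the isEnabled helper are hypothetical, it sees only flags passed explicitly on the command line and not defaults chosen by the VM, and it assumes that the last occurrence of a repeated flag wins.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class NumaFlagCheck {
    // Returns true if the boolean -XX flag was explicitly switched on.
    // Assumption: when a flag is repeated, the last occurrence wins.
    static boolean isEnabled(List<String> jvmArgs, String flag) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + flag)) {
                enabled = true;
            } else if (arg.equals("-XX:-" + flag)) {
                enabled = false;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("UseNUMA requested: " + isEnabled(jvmArgs, "UseNUMA"));
        System.out.println("UseParallelGC requested: " + isEnabled(jvmArgs, "UseParallelGC"));
    }
}
```

A false result here does not prove the feature is off, since the Parallel Scavenger collector may be selected by default on a server-class machine without appearing on the command line.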
When evaluated against the SPEC JBB 2005 benchmark on an 8 chip Opteron machine, NUMA-aware systems showed the following performance increase:
Copyright ©2009 Sun Microsystems, Inc. All Rights Reserved.