|
This article discusses The tool described in this article is a free, GPL-licensed Java tool originally published with the book Java Programming on Linux. It is included with the book's CD-ROM, and also available from the book's web site (relevant pointers are provided at the end of the article). There is nothing Linux-specific about this tool--it's usable with any implementation of the Java 2 Platform that supports the Java Virtual Machine1 Profiling Interface. Profiling Java ApplicationsUntil the release of the Java 2 Platform, profiling of Java application performance was a difficult task. The JVM offered a very limited profiling feature, and profiling tools were difficult to write and hard to find. The Java 2 Platform greatly improved the story with the introduction of the JVM Profiling Interface (JVMPI)--a set of native-language hooks that allow developers to capture profiling information from a running Java application.
Beginning with JDK1.2, the SDK included a sample profiling application,
Using
|
1 package com.develop.demos;
2
3 import java.io.IOException;
4
5 public class TestHprof {
6 public static String cat = null;
7 public final static int loop=5000;
8
9 public static void makeString() {
10 cat = new String();
11 for (int n=0; n<loop; n++) {
12 addToCat("more");
13 }
14 }
15
16 public static void addToCat(String more) {
17 cat = cat + more;
18 }
19
20 public static void makeStringInline() {
21 cat = new String();
22 for (int n=0; n<loop; n++) {
23 cat = cat + "more";
24 }
25 }
26
27 public static void makeStringWithLocal() {
28 String tmp = new String();
29 for (int n=0; n<loop; n++) {
30 tmp = tmp + "more";
31 }
32 cat = tmp;
33 }
34
35 public static void makeStringWithBuffer() {
36 StringBuffer sb = new StringBuffer();
37 for (int n=0; n<loop; n++) {
38 sb.append("more");
39 }
40 cat = sb.toString();
41 }
42 public static void main(String[] args) {
43 long begin = System.currentTimeMillis();
44
45 if (null !=
46 System.getProperty("WaitForProfiler")) {
47 System.out.println(
48 "Start your profiler,
then press any key to begin...");
49 try {
50 System.in.read();
51 }
52 catch (IOException ioe) {
53 }
54 }
55
56 makeString();
57 makeStringInline();
58 makeStringWithLocal();
59 makeStringWithBuffer();
60
61 long end =
62 System.currentTimeMillis();
63 System.out.println("Total run time of "
64 + (end - begin) + " milliseconds");
65 }
66 }
|
Here is a sample invocation of the program, with profiling (for purposes of this discussion, we'll run with green threads and no JIT; we discuss this in more detail later in the article):
java -green -Djava.compiler= -Xrunhprof:cpu=samples,depth=6,thread=y
com.develop.demos.TestHprof
hprof output
Briefly examining the hprof output shows that
understanding the performance of this application is a difficult problem.
We see that most of the time is being spent in StringBuffer
initialization code. To figure out which makeString methods are
most efficient requires some detective work.
We see that the StringBuffer initialization was performed on behalf
of the makeStringInline method, so
makeStringInline is responsible for 260 ticks
of the profiling counter. But that's not all;
makeStringInline appears in 17 different traces.
Tracking down its total cost from the hprof
output is tedious and difficult work.
PerfAnal
The PerfAnal tool simplifies this analysis in two ways:
java -jar PerfAnal.jar java.hprof.txt
Click to view Figure 1 full-scale
The four quadrants contain four different analyses of the data, presented
with the familiar Swing JTree interface. Let's take a closer look
at each
quadrant.
Figure 2 shows the upper-left quadrant, which ranks and analyzes CPU usage by caller.
Click to view Figure 2 full-scale
This quadrant ranks the time spent in various methods,
inclusive of subroutine calls. Not surprisingly,
main takes the most time. We can open up branches
of the tree to explore where main is spending
its time (see Figure 3).

Click to view Figure 3 full-scale
We see that main spent 35.20% of its time in
makeStringInline, 30.03% of its time in
makeStringWithLocal, and so on. Examining
the first method in more detail show us that
makeStringInline spent most of its time
in the StringBuffer constructor.
The lower left quadrant shows the same ranking we saw in the upper left quadrant, but this tree organizes the analysis by callee. It answers the question: on whose behalf is a method spending time?
Click to view Figure 4 full-scale
Figure 4 shows such an analysis in the lower-left quadrant.
Here we ask the question: why is the program spending 93.73% of its
time in the StringBuffer constructor--on whose behalf is it so
busy?
Exploring that branch of the tree shows us how that time is being
spent on behalf of the various makeString methods.
The upper right quadrant shows, yet again, the same ranking of CPU usage by method. Here, however, the usage is broken down by line number. Figure 5 shows the analysis.
Click to view Figure 5 full-scale
The analysis shows, line by line, where main (or
any other method) is spending its time, inclusive of called subroutines.
Finally, the lower right quadrant answers the question: where is the CPU really spending its time? This tree ranks method usage exclusive of subroutine calls. Figure 6 shows the analysis.
Click to view Figure 6 full-scale
We see that, when we do not include subroutine calls, line 120 of the
StringBuffer constructor is where our program spends fully 79.54%
of its time.
Each of these analyses, individually, provides important information about
where execution time is being spent. PerfAnal provides
an additional feature that allows you to correlate the information shown
in the various windows.
For example, Figure 6 showed us that 12.54% of execution time was spent
in System.arraycopy. What is taking so much time?
By right-clicking on that entry, we pop up a menu that allows us to
correlate the information (see Figure 7).
Click to view Figure 7 full-scale
By choosing Goto this Method, we position all
four quadrants to examine this method. Returning to the lower-left
quadrant, we can explore in detail (see Figure 8) who is calling
System.arraycopy.
Click to view Figure 8 full-scale
The right-click popup menu shown in Figure 7 offers two additional features.
If you choose Select a Method to Analyze, you are
presented with a dialog (see Figure 9) that allows you to select a method
by name. After you select a method, all four quadrants will be positioned
to analyze that method.
Figure 9
If you choose Select Thread(s) to Analyze, you can
choose one or more threads from a dialog (see Figure 10). The analysis
will be recomputed to report only on the selected threads.
Finally, PerfAnal allows you to save the analysis
data. The File menu (in the menu bar) gives you a
Save... item that saves all the processed information
into a textfile.
In the example discussed above, we captured the profiling data using a JVM running with green threads and with JIT compilation disabled. The choice of threading model and JIT can have an important effect on your results.
Experience with the threading model on Linux suggests that the profiling results captured with native threads cannot necessarily be trusted. I've noticed, at least for GUI applications, that much of the profiling data is charged to the wrong method running in the wrong thread: much time spent in Swing rendering is charged instead to the Motif event loop (which in fact spends most of its time waiting). This problem does not appear when running with green threads, and I do not know if it appears on other platforms.
The use of JIT will have an obvious effect on performance, but it can also affect the usefulness of the profiling output. In general, JIT-compiled code loses line number information; any methods that have been compiled will not report line numbers in the profile, so the analyses in the upper- and lower-right quadrants will not be as informative.
hprof Option Dependencies
Certain hprof options have important effects on
the PerfAnal analysis:
cpu=samples: This option is necessary to
capture the data PerfAnal needs.
depth=: This option controls the depth
of the stack traces saved during profiling. You may need to experiment
with this value: large values produce more accurate information, but
increase the memory and CPU requirements of PerfAnal.
If the value you choose for stack depth
is too small, PerfAnal will not be able to fully
ascertain CPU usage by subroutines. Notice, for example, in Figure 4 that
the times reported for the various callers add up to the time reported
for the callee. If they do not add up, you have not captured enough stack
depth. There is no right amount of stack depth--it just depends on how
far down the call stack you need to analyze. In my
own profiling, I usually start with a depth of 12 and refine it as needed.
thread=y: You must specify this option
if you want to be able to analyze the data by thread.
Source for PerfAnal is included on the CD-ROM
with Java Programming on Linux, and is also available from the
book's web site.
Java Programming on Linux, by Nathan Meyers, is devoted to discussing the use of Java in the Linux environment. It provides detailed information on JDKs and other development tools available on the Linux platform, discusses the platform-specific issues faced on Linux, and describes the many ways in which Java can be deployed on Linux.
In addition to PerfAnal, the book provides and
discusses tools for analyzing the time Java applications spend in native
code and in the kernel, and discusses Java's performance demands on CPU,
memory, and the X Window System.
To learn more about the book, visit its web site.
Nathan Meyers spent 20 years as a developer and architect for Hewlett-Packard Company, working in handheld calculators, UNIX workstations, and inkjet printers. His experience includes developing embedded systems, device driver implementation, creating development tools, definition and implementation work on the X Window System and the Common Desktop Environment, developing 2D and 3D graphics applications, UNIX application performance tuning, designing evolutionary algorithms, and implementing financial algorithms. He left HP in 1999, wrote Java Programming on Linux, and joined a dot-com startup in Portland, Oregon.
_______
1 As used on this web site, the terms "Java
virtual machine" or "JVM" mean a virtual machine for the Java
platform.
Oracle is reviewing the Sun product roadmap and will provide guidance to customers in accordance with Oracle's standard product communication policies. Any resulting features and timing of release of such features as determined by Oracle's review of roadmaps, are at the sole discretion of Oracle. All product roadmap information, whether communicated by Sun Microsystems or by Oracle, does not represent a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. It is intended for information purposes only, and may not be incorporated into any contract.
|
| ||||||||||||