PerfAnal
This article discusses The tool described in this article is a free, GPL-licensed Java tool originally published with the book Java Programming on Linux. It is included with the book's CD-ROM, and also available from the book's web site (relevant pointers are provided at the end of the article). There is nothing Linux-specific about this tool--it's usable with any implementation of the Java 2 Platform that supports the Java Virtual Machine1 Profiling Interface. Profiling Java ApplicationsUntil the release of the Java 2 Platform, profiling of Java application performance was a difficult task. The JVM offered a very limited profiling feature, and profiling tools were difficult to write and hard to find. The Java 2 Platform greatly improved the story with the introduction of the JVM Profiling Interface (JVMPI)--a set of native-language hooks that allow developers to capture profiling information from a running Java application. Beginning with JDK1.2, the SDK included a sample profiling
application,
Using
|
1 package com.develop.demos; |
Here is a sample invocation of the program, with profiling (for purposes of this discussion, we'll run with green threads and no JIT; we discuss this in more detail later in the article):
java -green -Djava.compiler=
-Xrunhprof:cpu=samples,depth=6,thread=y com.develop.demos.TestHprof
The resulting profiling output is 432 lines long, and can be obtained here. (Note that this is not the same output used for the Halloway article.)
hprof output
Briefly examining the hprof output shows that
understanding the performance of this application is a difficult
problem.
The first few lines of the ranking at the end of the output look like
this.
We see that most of the time is being spent in StringBuffer
initialization code. To figure out which makeString
methods are most efficient requires some detective work. For example, look at trace #31 in the file (which corresponds
to the top entry in the above ranking).
We see that the StringBuffer initialization was
performed on behalf of the makeStringInline method, so
makeStringInline is responsible for 260 ticks
of the profiling counter. But that's not all;
makeStringInline appears in 17 different traces.
Tracking down its total cost from the hprof
output is tedious and difficult work.
PerfAnal
The PerfAnal tool simplifies this analysis in two ways:
You can obtain the PerfAnal .jar file by right-clicking
here.
To run it on the data we just captured:
java -jar PerfAnal.jar java.hprof.txt
Click to view Figure 1 full-scale
The four quadrants contain four different analyses of the data,
presented
with the familiar Swing JTree interface. Let's take a
closer look at each
quadrant.
Figure 2 shows the upper-left quadrant, which ranks and analyzes CPU usage by caller.

Click to view Figure 2 full-scale
This quadrant ranks the time spent in various methods,
inclusive of subroutine calls. Not surprisingly,
main takes the most time. We can open up branches
of the tree to explore where main is spending
its time (see Figure 3).
Click to view Figure 3 full-scale
We see that main spent 35.20% of its time in
makeStringInline, 30.03% of its time in
makeStringWithLocal, and so on. Examining
the first method in more detail show us that
makeStringInline spent most of its time
in the StringBuffer constructor.
The lower left quadrant shows the same ranking we saw in the upper left quadrant, but this tree organizes the analysis by callee. It answers the question: on whose behalf is a method spending time?
Click to view Figure 4 full-scale
Figure 4 shows such an analysis in the lower-left quadrant.
Here we ask the question: why is the program spending 93.73% of its
time in the StringBuffer constructor--on whose behalf is
it so busy?
Exploring that branch of the tree shows us how that time is being
spent on behalf of the various makeString methods.
The upper right quadrant shows, yet again, the same ranking of CPU usage by method. Here, however, the usage is broken down by line number. Figure 5 shows the analysis.
Click to view Figure 5 full-scale
The analysis shows, line by line, where main (or
any other method) is spending its time, inclusive of called
subroutines.
Finally, the lower right quadrant answers the question: where is the CPU really spending its time? This tree ranks method usage exclusive of subroutine calls. Figure 6 shows the analysis.

We see that, when we do not include subroutine calls, line 120 of the
StringBuffer constructor is where our program spends fully
79.54%
of its time.
Each of these analyses, individually, provides important information
about
where execution time is being spent. PerfAnal provides
an additional feature that allows you to correlate the information
shown
in the various windows.
For example, Figure 6 showed us that 12.54% of execution time was
spent
in System.arraycopy. What is taking so much time?
By right-clicking on that entry, we pop up a menu that allows us to
correlate the information (see Figure 7).
Click to view Figure 7 full-scale
By choosing Goto this Method, we position all
four quadrants to examine this method. Returning to the lower-left
quadrant, we can explore in detail (see Figure 8) who is calling
System.arraycopy.
Click to view Figure 8 full-scale
The right-click popup menu shown in Figure 7 offers two additional features.
If you choose Select a Method to Analyze, you are
presented with a dialog (see Figure 9) that allows you to select a
method
by name. After you select a method, all four quadrants will be
positioned
to analyze that method.
Figure 9
If you choose Select Thread(s) to Analyze, you can
choose one or more threads from a dialog (see Figure 10). The analysis
will be recomputed to report only on the selected threads.
Finally, PerfAnal allows you to save the analysis
data. The File menu (in the menu bar) gives you a
Save... item that saves all the processed information
into a textfile. The information from all four quadrants is saved, with
the tree fully unfurled--here are the first few lines of the saved file
from this analysis:
Experience with the threading model on Linux suggests that the profiling results captured with native threads cannot necessarily be trusted. I've noticed, at least for GUI applications, that much of the profiling data is charged to the wrong method running in the wrong thread: much time spent in Swing rendering is charged instead to the Motif event loop (which in fact spends most of its time waiting). This problem does not appear when running with green threads, and I do not know if it appears on other platforms.
The use of JIT will have an obvious effect on performance, but it can also affect the usefulness of the profiling output. In general, JIT-compiled code loses line number information; any methods that have been compiled will not report line numbers in the profile, so the analyses in the upper- and lower-right quadrants will not be as informative.
hprof Option Dependencies
Certain hprof options have important effects on
the PerfAnal analysis:
cpu=samples: This option is necessary to
capture the data PerfAnal needs.
depth=: This option controls the depth
of the stack traces saved during profiling. You may need to experiment
with this value: large values produce more accurate information, but
increase the memory and CPU requirements of PerfAnal.
If the value you choose for stack depth
is too small, PerfAnal will not be able to fully
ascertain CPU usage by subroutines. Notice, for example, in Figure 4
that
the times reported for the various callers add up to the time reported
for the callee. If they do not add up, you have not captured enough
stack
depth. There is no right amount of stack depth--it just depends
on how far down the call stack you need to analyze. In my
own profiling, I usually start with a depth of 12 and refine it as
needed.
thread=y: You must specify this option
if you want to be able to analyze the data by thread.
Source for PerfAnal is included on the CD-ROM
with Java Programming on Linux, and is also available from the
book's web site.
Java Programming on Linux, by Nathan Meyers, is devoted to discussing the use of Java in the Linux environment. It provides detailed information on JDKs and other development tools available on the Linux platform, discusses the platform-specific issues faced on Linux, and describes the many ways in which Java can be deployed on Linux.
In addition to PerfAnal, the book provides and
discusses tools for analyzing the time Java applications spend in
native
code and in the kernel, and discusses Java's performance demands on
CPU,
memory, and the X Window System.
To learn more about the book, visit its web site.
Nathan Meyers spent 20 years as a developer and architect for Hewlett-Packard Company, working in handheld calculators, UNIX workstations, and inkjet printers. His experience includes developing embedded systems, device driver implementation, creating development tools, definition and implementation work on the X Window System and the Common Desktop Environment, developing 2D and 3D graphics applications, UNIX application performance tuning, designing evolutionary algorithms, and implementing financial algorithms. He left HP in 1999, wrote Java Programming on Linux, and joined a dot-com startup in Portland, Oregon.
_______
1 As used on this web site, the terms
"Java virtual machine" or "JVM" mean a virtual machine for the Java
platform.
|
| ||||||||||||