In recent years, few people have written more about the Java platform than Sun Microsystems technology evangelist Brian Goetz. Since 2000, he has published some 75 articles on best practices, platform internals, and concurrent programming, and he is the principal author of the book Java Concurrency in Practice, a 2006 Jolt Award finalist and the best-selling book at the 2006 JavaOne conference. Before joining Sun in August 2006, he spent 15 years as a consultant at his software firm, Quiotix, where, in addition to writing about Java technology, he spoke frequently at conferences and gave presentations on threading, the Java programming language memory model, garbage collection, Java technology performance myths, and other topics. He has also consulted on kernel internals, device drivers, protocol implementations, compilers, server applications, web applications, scientific computing, data visualization, and enterprise infrastructure tools, and he has participated in a number of open-source projects, including the Lucene text search and retrieval system and the FindBugs static analysis toolkit. At Sun, he serves as a consultant on a wide range of topics, from Java concurrency to the needs of Java developers, and he contributes to the development of the Java platform. We met with him to get his thoughts on Java technology performance challenges, Java Platform, Standard Edition 6 (Java SE 6), common performance hazards, the challenges of moving from C to Java programming, and ways to write better code.
It says: "If the value of the Expression is null, a NullPointerException is thrown." The poster suggests extending the Java language to allow null expressions here and, in that case, skip synchronizing the block. Your response?
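To make the rule being discussed concrete, here is a minimal sketch (the class and method names are hypothetical, not from the interview) showing that entering a `synchronized` block on a null reference throws `NullPointerException`. It also includes a hypothetical helper that approximates the poster's proposal in library code by running the action without any lock when the lock reference is null. It illustrates semantics only and is not a recommendation.

```java
// Hypothetical demo class illustrating synchronized-on-null semantics.
class NullLockDemo {

    // Returns true if entering a synchronized block on a null reference
    // throws NullPointerException, as the quoted rule says it must.
    static boolean syncOnNullThrows() {
        Object lock = null;
        try {
            synchronized (lock) {
                return false; // never reached: the NPE is thrown before any lock is acquired
            }
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Hypothetical helper approximating the poster's proposal in library form:
    // lock on 'lock' if it is non-null, otherwise run the action unsynchronized.
    static void runMaybeLocked(Object lock, Runnable action) {
        if (lock == null) {
            action.run();
        } else {
            synchronized (lock) {
                action.run();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronized(null) throws NPE: " + syncOnNullThrows());
        int[] counter = {0};
        runMaybeLocked(null, () -> counter[0]++);         // runs without a lock
        runMaybeLocked(new Object(), () -> counter[0]++); // runs while holding the lock
        System.out.println("counter = " + counter[0]);
    }
}
```

Note what such a helper hides: when the lock is null, the action runs with no mutual exclusion at all, which is easy to miss when reading the calling code.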
Wrong Intuitions About Performance Problems
Most performance problems these days are consequences of architecture, not coding -- making too many database calls or serializing everything to XML back and forth a million times. These processes usually happen outside the code you wrote and look at every day, but they are the real source of performance problems. So if you go only by what you're familiar with, you'll be looking for your keys in the kitchen. Developers have always been prone to this mistake, and the more complex the application, the more it depends on code you didn't write, and hence the more likely it is that the problem lies outside your code.

Performance analysis is much harder in the Java programming language than in C, where it is more straightforward because C bears a significant similarity to assembly language. The mapping from C code to machine code is fairly direct, and to the extent that it isn't, you can ask the compiler to show you the machine code.

Java applications don't work like C. The runtime constantly modifies the code based on changing conditions and observations. It starts out interpreting the code and then compiles it. It may invalidate the compiled code and recompile it based on profiling data or on the loading of other classes. As a result, the performance characteristics of your code will vary dramatically depending on the environment the code runs in. That makes it harder to say "This code is faster than that code," because you have to account for more context to make a reasonable performance analysis. There are also nondeterministic factors such as the timing and nature of compilation, the interaction of the loaded classes, and garbage collection. So it's harder to do the kind of micro-optimization with Java code that one can do in C.

At the same time, the fact that compilation happens at execution time means that the optimizer has far more information to work with than a C compiler does. It knows what classes are loaded and how the method being compiled has actually been used. As a result, it can make far better optimization decisions than a static compiler could. This is great for performance but means it's harder to predict the performance of a given block of code.

Write Dumb Code
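As a hedged illustration of "dumb" versus "clever" code, here is a sketch (class and method names are hypothetical) of the same computation written two ways: a C-style loop unrolled by hand, and the obvious loop. Both compute the same sum. HotSpot's optimizing compiler can apply transformations such as loop unrolling to simple counted loops on its own, so the hand-unrolled version guarantees nothing and is harder to read. Whether either is faster on a given JVM is exactly the kind of question that should be settled by measurement under realistic conditions, not by intuition.

```java
// Hypothetical example: the same sum written "cleverly" and "dumbly".
class DumbCode {

    // "Clever" C-style version: manually unrolled by four, with a cleanup loop
    // for the leftover elements. More code, more room for off-by-one bugs.
    static long sumUnrolled(int[] a) {
        long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < a.length; i++) {
            s0 += a[i];
        }
        return s0 + s1 + s2 + s3;
    }

    // "Dumb" version: the obvious loop. The JIT sees a simple counted loop
    // and is free to apply its own optimizations to it.
    static long sumSimple(int[] a) {
        long sum = 0;
        for (int x : a) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_003];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sumSimple(data) + " " + sumUnrolled(data));
    }
}
```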
So clean, dumb code often runs faster than really clever code, contrary to what developing in C might have taught us. In C, clever source code turns into the expected idiom at the machine-code level, but it doesn't work that way in Java applications. I'm not saying that the Java compiler is too dumb to translate clever code into the appropriate machine code; in fact, it optimizes Java code more effectively than a C compiler optimizes C. My advice is this: Write simple, straightforward code and then, if the performance is still not "good enough," optimize. But implicit in the concept of "good enough" is that you need clear performance metrics. Without them, you'll never know when you're done optimizing. You'll also need a realistic, repeatable testing program in place to determine whether you're meeting your metrics. Once you can test the performance of your program under actual operating conditions, it's OK to start tweaking, because you'll know whether your tweaks are helping. But assuming "Oh, gee, I think if I change this, it will go faster" is usually counterproductive in Java programming.

Because Java code is dynamically compiled, realistic testing conditions are crucial. If you take a class out of context, it will be compiled differently than it will be in your application, which is why performance must be measured under realistic conditions. So performance metrics should be tied to indices that have business value -- transactions per second, mean service time, worst-case latency -- factors that your customers will perceive. Focusing on performance characteristics at the micro level is often misleading and difficult to test, because it's hard to make a realistic test case for some small bit of code that you've taken out of context.

Too Much XML
Abstractions are great at helping us wrap our heads around complicated problems, because they let us think about one part at a time. But abstraction mechanisms often have costs that we overlook when we focus on system design. For example, XML is a great interchange format for integrating disparate systems. But is the performance cost acceptable when we use it as a generic serialization mechanism? Similarly, remote method calls are convenient, but can we justify their performance cost in business terms? Sometimes the answer is a resounding yes and sometimes not, but abstraction barriers invite us not to think sufficiently about the performance implications of architectural decisions. Then, when we do encounter performance problems, we often fall back on tweaking the code -- because the light is better there.

Moving From C to Java Programming
I try to convince C programmers that by relinquishing some control, they'll get huge productivity and reliability benefits -- and maybe even better performance as well. Some programmers see the trade-off and think it's great, while others resist it and code Java programs as if they were coding in C, thereby getting the benefits of neither. The Java language is not just a syntax: There's a design philosophy that goes with managed languages. To get the benefit of Java programming, you have to understand that you are not just programming in C with a different syntax.

Advice for Beginners: Learn the Class Libraries
My advice is to take the time to understand what the class libraries can do for you. You don't have to understand the details of every little feature, but spend some time absorbing the spectrum of what they can do -- because they can make you more productive and make your programs smaller, more reliable, and easier to read and maintain. Experienced Java programmers would do well to learn what's new with each version of the platform, because each version contains library enhancements that can make their job easier.

Java SE 6 Performance Improvements
It is possible for Java code to be faster than C. For example, allocation in the Java language is already much faster than it is in C. Java programming enables optimizations not possible in C because C leaves so many important factors, such as allocation and thread management, to libraries. Ironically, it's the bit-level control over pointers, which most C programmers see as their most powerful weapon, that cripples the C compiler's ability to optimize effectively. By giving up that bit of control, you enable a wealth of optimizations that are not possible in C -- and the Java compiler knows more about optimization than 99.99 percent of programmers do. Java software has always had the potential to be faster than C. The performance improvements in Java SE 6 make it clear that we're heading toward that goal. We're just beginning to apply the optimizations coming out of the Java compiler research community to production languages.

If I could wave a magic wand and send out one message about Java programming, it would be this: Trust the JVM.* It's smarter than you think. Stop trying to outwit or outsmart it. Tell it what you want, and it will do its damnedest to make your application run as fast as it can.

The Myth of Expensive Object Allocation
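To make the point that allocation in the Java language is cheap more concrete, here is a sketch (the `Vec` and `AllocationDemo` classes are hypothetical) of "dumb" code that freely allocates small, short-lived immutable objects instead of reusing a hand-managed mutable scratch object. Temporaries like these typically die young, which generational collectors handle cheaply, and when an object never escapes its method, HotSpot's escape analysis (in releases that support it) may eliminate the allocation entirely. No performance figures are claimed; the sketch only shows the style of code that need not be rejected on allocation-cost grounds without measurement.

```java
// Hypothetical immutable 2-D vector: every operation returns a new object.
final class Vec {
    final double x, y;

    Vec(double x, double y) {
        this.x = x;
        this.y = y;
    }

    Vec plus(Vec o) {
        return new Vec(x + o.x, y + o.y); // allocates a new object on every call
    }

    Vec scale(double k) {
        return new Vec(x * k, y * k);
    }
}

class AllocationDemo {
    // Averages the points, creating two temporary Vec objects per iteration.
    // Each temporary becomes garbage almost immediately.
    // Assumes xs and ys are non-empty and have the same length.
    static Vec centroid(double[] xs, double[] ys) {
        Vec acc = new Vec(0, 0);
        for (int i = 0; i < xs.length; i++) {
            acc = acc.plus(new Vec(xs[i], ys[i]));
        }
        return acc.scale(1.0 / xs.length);
    }

    public static void main(String[] args) {
        Vec c = centroid(new double[] {0, 2, 4}, new double[] {1, 3, 5});
        System.out.println(c.x + ", " + c.y);
    }
}
```

The C-trained alternative, a single mutable accumulator updated in place, trades readability for an allocation saving that the JVM may already provide.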
In order to get the appearance of garbage collection in C++, you use reference counting -- which requires overloading a number of operators and adds overhead to many operations -- plus, you still use ...

The garbage collector in contemporary JVMs doesn't touch most garbage at all. In the most common collection scenario, the JVM figures out which objects are live and deals with them exclusively -- and most objects die young. So by the time a garbage collection happens, most objects that have been allocated since the last collection are already dead. By working only with the live objects, the garbage collector avoids a lot of work it would have to do if it freed dead objects one piece at a time. Similarly, the JVM can optimize away many object allocations. These are just a few examples of how moving memory management out of the libraries and into the platform enables huge performance improvements.

Java SE 6: A Compelling Business Proposition
Another huge and immediate benefit of Java SE 6 is the set of monitoring and management improvements that give developers more insight into what's going on in their applications. You can dynamically load a monitoring agent into a running application and analyze it without having to restart it. All these features make the lives of deployers and system administrators easier.

JSR 223: Scripting Integration
The proof-of-concept implementation is the JavaScript interpreter engine that ships with the JVM, based on the Mozilla Rhino implementation of JavaScript technology. You can call back and forth between JavaScript and Java code and use the Java class libraries from JavaScript. JSR 223 offers very nice integration between the Java environment and PHP, Ruby, JavaScript, or whatever your favorite scripting language is.

Because of the tight integration, developers are no longer forced into an either-or choice. They don't have to choose between writing in the Java language and, say, Python. By making it so easy to call back and forth between Java code and scripting code, you get the best of both worlds. You can develop most of an application in Java code and develop pieces in Python, which may enhance productivity because you can use the right tool at the right point.

It also opens the door to a new generation of extensible applications: we can ship applications with extension points that call out to scripts. So instead of customizing applications by editing some horrible XML configuration file, which is often underdocumented and hard to work with, we'll configure and customize applications by running scripts when needed. We'll insert behavior into these applications rather than being limited to whatever options the environment gives us. Applications will be more extensible, easier to customize, and easier for developers to deliver, because we won't have to anticipate everything that every customer might want to do with an application. We can simply provide an extension mechanism. This will benefit both developers and customers.

There's a lot of talk these days about other languages on the JVM, like Scala, which can run on both the JVM and the CLR virtual machine, and JRuby, which is Ruby on the JVM. These languages offer productivity advantages in certain programming domains.
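The `javax.script` API that JSR 223 added in Java SE 6 can be sketched as follows, under stated assumptions. The engine is looked up by the name "JavaScript", which resolved to the bundled Rhino-based engine in Java SE 6; later JDKs replaced that engine with Nashorn and then removed Nashorn (JDK 15), so the lookup may return null and the code checks for that. The `discountHook` script function and the `ScriptHookDemo` class are hypothetical, standing in for the kind of script extension point described above.

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical extension point: the application delegates a pricing rule to a script.
class ScriptHookDemo {

    // Script a customer might supply; 'discountHook' is a hypothetical hook name.
    static final String CUSTOMER_SCRIPT =
        "function discountHook(total) { return total > 100 ? total * 0.9 : total; }";

    // Returns the engine registered under "JavaScript", or null if none is installed.
    static ScriptEngine findEngine() {
        return new ScriptEngineManager().getEngineByName("JavaScript");
    }

    // Loads the script into the engine, then calls its function from Java.
    static double applyHook(ScriptEngine engine, double total)
            throws ScriptException, NoSuchMethodException {
        engine.eval(CUSTOMER_SCRIPT);
        Object result = ((Invocable) engine).invokeFunction("discountHook", total);
        return ((Number) result).doubleValue(); // engines may return Integer or Double
    }

    public static void main(String[] args) throws Exception {
        ScriptEngine engine = findEngine();
        if (engine == null) {
            System.out.println("No JavaScript engine available on this JDK.");
            return;
        }
        engine.put("appName", "demo"); // expose a Java value to the script
        System.out.println(engine.eval("appName + ' is running'"));
        System.out.println(applyHook(engine, 200.0));
    }
}
```

Because the script is an ordinary string, an application built this way could load it from a file supplied by the customer instead of exposing more XML configuration options; how such scripts are discovered and sandboxed is outside this sketch.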
Integrating scripting languages with Java applications offers the best of both worlds.

* As used on this web site, the terms "Java Virtual Machine" or "JVM" mean a virtual machine for the Java platform.

See Also
Brian Goetz Home Page