One of the more anticipated talks at the 2008 JavaOne conference took place on Tuesday evening: author Brian Goetz's "Let's Resync: What's New for Concurrency on the Java Platform, Standard." This session discussed some of the advancements made in JSR 166, Concurrency Utilities, which was initiated by Java technology luminary Doug Lea, in light of the upcoming JDK 7 release.
Goetz began by pointing out that many computing tasks are parallelizable, even some that are not obvious. For example, even something as simple as an SQL query contains two phases that are good candidates for parallelization: the plan selection and the post-processing phases of the query, both of which are very CPU intensive.
In order to understand parallelization better, it helps to use a real-world analogue. Let's assume that you're the web site boss and you have four workers under you: Ed, Bob, Dana, and Jill. At the same time, management tells you that you have to build a new web site on a hot new technology like JavaFX.
The traditional approach would be to split the work into an array of equal chunks -- 25% each -- and assign them to each of the four workers. That works well as long as each of the four gets the job done in the same amount of time. You can combine their efforts at the end to produce the finished result.
But what happens if Bob is a total slacker? That leaves Ed, Dana, and Jill sitting idle while Bob is still trying to finish. In the world of parallel programming, that's a big problem, as you never want any of your processors to be idle until the task is done.
For the same reason, Goetz points out that a manual division using a thread pool in J2SE 5.0 is undesirable. The clue to solving this problem more efficiently comes in knowing that you can split the work into chunks smaller than 25% and assign them to processors as they become available. This technique forms the basis of the fork-join framework.
Put succinctly, the fork-join framework supports a style of parallel programming in which problems are recursively split into smaller fragments, solved in parallel and recombined, as shown in this pseudocode:
Result solve(Problem problem) {
if (problem is small enough to quickly handle)
directly solve problem
else {
split problem into independent parts
fork new subtasks to solve each part
join all subtasks
compose result from subresults
}
}
So in the real-world web site example, you could theoretically solve the problem faster by assigning chunks of work to Ed, Bob, Dana, and Jill, where each one either works on his or her part of the task until it's completed or breaks it into smaller chunks to be reassigned around the group. This way, you can assign the next chunk to the first idle worker. Consequently, if Bob falls behind in completing his chunk of work, you can have Dana, Ed, and Jill pick up the slack by assigning more pieces to them.
And even better: with this fork-join technique, even if you hire 100 more workers, you still get a clean benefit of parallelization. In the computing realm, that means that you don't have to do any customization of your code when you're running on anything from a dual-core system to a 256-processor rack. The program automatically adapts itself to run as best it can, no matter how many processors it has.
However, an obvious question then comes up: How small should each chunk be? In other words, what is the threshold where it's no longer worthwhile to divide a task any further? There's no easy answer to that, and it varies based on the task being solved.
In this talk, however, Goetz points out that the fork-join technique by itself may not be the ideal solution. Fork-join tasks are highly CPU bound, and the most efficient techniques try to eliminate context switching in the processors. Hence, the fork-join technique can be augmented with a technique called work stealing.
Returning to the real-world analogue, this simply means that if one worker is idle, the idle worker can help an overloaded worker by "stealing" one of that person's tasks. The computing algorithm is a little more complex, as each processing resource uses a double-ended queue. However, the good news is that many of these tasks can be done for you automatically using the new ParallelArray class being developed for JDK 7.
ParallelArray is a fork-join-enabled data structure that provides a general-purpose API for performing tasks on data sets in a highly concurrent manner. Goetz explains the benefits of ParallelArray in this way: The idea is that a ParallelArray represents a collection of structurally similar data items, and you use the methods on ParallelArray to create a description of how you want to slice and dice the data. You then use the description to actually execute the array operations, which uses the fork-join framework under the hood, in parallel...
"This approach has the effect of letting you declaratively specify data selection, transformation, and post-processing operations, and letting the framework figure out a reasonable parallel execution plan, just as database systems allow you to specify data operations in SQL and hide the mechanics of how the operations are implemented. Several implementations of ParallelArray are available for different data types and sizes, including for arrays of objects and for arrays of various primitives."
In fact, here are some of the basic operations supported by ParallelArray that Goetz presented in the session:
- Filtering: selecting a subset of the elements
- Mapping: convert selected elements to another form
- Replacement: create a new
ParallelArray derived from the original
- Aggregation: combine all values into a single value
- Application: perform an action for each selected element
Another possible feature of JDK 7 that gets pointed out is BGGA closures. One goal of closures is to reduce redundant boilerplate code when working with the ParallelArray class. However, inclusion of this feature in JDK 7 is currently under debate.
If you would like more details on using the new parallelization features that are being developed for JDK 7, be sure to visit the JSR 166(x/y) page at jcp.org.
|