The following chapter is excerpted from JavaSpaces Principles, Patterns, and Practice, recently published by Addison-Wesley as part of the Jini Technology Series from Sun Microsystems, Inc.
Note from the authors: The book, JavaSpaces Principles, Patterns, and Practice, is a comprehensive guide to the technology, providing details of the JavaSpaces API and numerous examples that teach you how to develop advanced distributed computing applications with the technology. Chapter 1, excerpted here, provides an introduction to the JavaSpaces model and gets you started with space-based programming by showing you how to build a basic "Hello World" application. -- Eric Freeman and Susanne Hupfer
-- Alan Perlis, Epigrams in Programming
The JavaSpaces technology is a new tool for building distributed systems. By providing a high-level coordination mechanism for Java, it significantly eases the burden of creating such systems. JavaSpaces technology is first and foremost designed to be simple: space-based programming requires learning only a handful of operations. At the same time, it is expressive: throughout this book we will see that a large class of distributed problems can be approached using this simple framework. The benefit for you, the developer, is that the combination of these two features can significantly reduce the design effort and code needed to create collaborative and distributed applications.
Before getting into the details of the JavaSpaces technology, let's take a look at why you might want to build your next application as a distributed one, as well as some of the trouble spots you might encounter along the way.
1.1 Benefits of Distributed Computing
The 1980s slogan of Sun Microsystems, Inc., "The Network is the Computer," seems truly prophetic in light of the changes in the Internet and intranets over the last several years. By early in the new millennium, a large class of computational devices--from desktop machines to small appliances and portable devices--will be network-enabled. This trend not only impacts the way we use computers, but also changes the way we create applications for them: distributed applications are becoming the natural way to build software.
"Distributed computing" is all about designing and building applications as a set of processes that are distributed across a network of machines and work together as an ensemble to solve a common problem. There are many compelling reasons for building applications this way.
Performance: There is a limit to how many cycles you can squeeze out of one CPU. When you've optimized your application and still need better performance, there is only one thing left to do: add another computer.
Fortunately, many problems can be decomposed into a number of smaller ones. Once decomposed, we can distribute them over one or more computers to be computed in parallel. In principle, the more computers we add, the faster the job gets done. In reality, adding processors rarely results in perfect speedup (often the overhead of communication gets in our way). Nevertheless, for a large class of problems, adding more machines to the computation can significantly reduce its running time. This class of problems is limited to those problems in which the time spent on communicating tasks and results is small compared to the time spent on computing tasks (in other words the computation/communication ratio is high). We will return to this topic in Chapter 6. Scalability: When we write a standalone application, our computational ability is limited to the power and resources of a single machine. If instead we design the application to work over any number of processors, we not only improve performance, but we also create an application that scales: If the problem is too much work for the team of computers to handle, we simply add another machine to the mix, without having to redesign our application. Our "distributed computing engine" can grow (or shrink) to match the size of the problem. Resource sharing: Data and resources are distributed, just as people are. Some computational resources are expensive (such as supercomputers or sophisticated telescopes) or difficult to redistribute (such as large or proprietary data sets); it isn't feasible for each end user to have local access. With a distributed system, however, we can support and coordinate remote access to such data and services. 
We could, for instance, build a distributed application that continually collects data from a telescope in California, pipes it to a supercomputer in New York for number crunching, adds the processed data to a large astronomical data set in New Mexico, and at the same time graphs the data on our workstation monitor in Connecticut.
Fault tolerance and availability: Nondistributed systems typically have little tolerance for failure; if a standalone application fails, it terminates and remains unavailable until it is restarted. Distributed systems, on the other hand, can tolerate a limited amount of failure, since they are built from multiple, independent processes--if some fail, others can continue. By designing a distributed application carefully, we can reduce "down time" and maximize its availability.
Elegance: For many problems, software solutions are most naturally and easily expressed as distributed systems. Solutions often resemble the dynamics of an organization (many processes working asynchronously and coordinating) more than the following of a recipe (one process following step-by-step instructions). This shouldn't be surprising, since the world at large, along with most of its organizations, is a distributed system. Instructing a single worker to sequentially assemble a car or run a government is the wrong approach; the worker would be overly complex and hard to maintain. These activities are better carried out by specialists that can handle specific parts of the larger job. In general, it is often simpler and more elegant to specify a design as a set of relatively independent services that individual processes can provide--in other words, as a distributed system.
1.2 Challenges of Distributed Computing
Despite their benefits, distributed applications can be notoriously difficult to design, build, and debug. The distributed environment introduces many complexities that aren't concerns when writing standalone applications.
Perhaps the most obvious complexity is the variety of machine architectures and software platforms over which a distributed application must commonly execute. In the past, this heterogeneity problem has thwarted the development and proliferation of distributed applications: developing an application entailed porting it to every platform it would run on, as well as managing the distribution of platform-specific code to each machine. More recently, the Java virtual machine has eased this burden by providing automatic loading of class files across a network, along with a common virtual machine that runs on most platforms and allows applications to achieve "Write once, Run anywhere" status.
The realities of a networked environment present many challenges beyond heterogeneity. By their very nature, distributed applications are built from multiple (potentially faulty) components that communicate over (potentially slow and unreliable) network links. These characteristics force us to address issues such as latency, synchronization, and partial failure that simply don't occur in standalone applications. These issues have a significant impact on distributed application design and development. Let's take a closer look at each one:
Latency: In order to collaborate, processes in a distributed application need to communicate. Unfortunately, over networks, communication can take a long time relative to the speed of processors. This time lag, called latency, is typically several orders of magnitude greater than communication time between local processes on the same machine. As much as we'd like to sweep this disparity under the rug, ignoring it is likely to lead to poor application performance. As a designer, you must account for latency in order to write efficient applications.
Synchronization: To cooperate with each other, processes in a distributed application need not only to communicate, but also to synchronize their actions.
For example, a distributed algorithm might require processes to work in lock step--all need to complete one phase of an algorithm before proceeding to the next phase. Processes also need to synchronize (essentially, wait their turn) in accessing and updating shared data. Synchronizing distributed processes is challenging, since the processes are truly asynchronous--running independently at their own pace and communicating without any centralized controller. Synchronization is an important consideration in distributed application design.
Partial failure: Perhaps the greatest challenge you will face when developing distributed systems is partial failure: the longer an application runs and the more processes it includes, the more likely it is that one or more components will fail or become disconnected from the execution (due to machine crashes or network problems). From the perspective of other participants in a distributed computation, a failed process is simply "missing in action," and the reasons for failure can't be determined. Of course, in the case of a standalone application, partial failure is not an issue--if a single component fails, then the entire computation fails, and we either restart the application or reboot the machine. A distributed system, on the other hand, must be able to adapt gracefully in the face of partial failure, and it is your job as the designer to ensure that an application maintains a consistent global state (a tricky business).
These challenges are often difficult to overcome and can consume a significant amount of time in any distributed programming project. These difficulties extend beyond design and initial development; they can plague a project with bugs that are difficult to diagnose. We'll spend a fair amount of time in this book discussing features and techniques the JavaSpaces technology gives us for approaching these challenges, but first we need to lay a bit of groundwork.
1.3 What Is JavaSpaces Technology?
JavaSpaces technology is a high-level coordination tool for gluing processes together into a distributed application. It is a departure from conventional distributed tools, which rely on passing messages between processes or invoking methods on remote objects. JavaSpaces technology provides a fundamentally different programming model that views an application as a collection of processes cooperating via the flow of objects into and out of one or more spaces. This space-based model of distributed computing has its roots in the Linda coordination language developed by Dr. David Gelernter at Yale University. We provide several references to this work in Chapter 12.
A space is a shared, network-accessible repository for objects. Processes use the repository as a persistent object storage and exchange mechanism; instead of communicating directly, they coordinate by exchanging objects through spaces. As shown in Figure 1.1, processes perform simple operations to write new objects into a space, take objects from a space, or read (make a copy of) objects in a space. When taking or reading objects, processes use a simple value-matching lookup to find the objects that matter to them. If a matching object isn't found immediately, then a process can wait until one arrives. Unlike conventional object stores, processes don't modify objects in the space or invoke their methods directly--while there, objects are just passive data. To modify an object, a process must explicitly remove it, update it, and reinsert it into the space.
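The write, read, and take operations described above can be illustrated with a toy model that runs in a single JVM. This is only a sketch under stated assumptions, not the JavaSpaces API: `ToySpace` and `ToySpaceDemo` are hypothetical names, entries are plain maps rather than typed objects, and a template matches any entry that contains all of the template's key/value pairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy single-JVM stand-in for a space. This is NOT the JavaSpaces API.
class ToySpace {
    private final List<Map<String, Object>> entries = new ArrayList<>();

    // write: store a copy of the entry and wake any waiting readers or takers.
    public synchronized void write(Map<String, Object> entry) {
        entries.add(new HashMap<>(entry));
        notifyAll();
    }

    // read: wait for a matching entry and return a copy; the entry stays in the space.
    public synchronized Map<String, Object> read(Map<String, Object> template)
            throws InterruptedException {
        return new HashMap<>(await(template, false));
    }

    // take: wait for a matching entry, remove it from the space, and return it.
    public synchronized Map<String, Object> take(Map<String, Object> template)
            throws InterruptedException {
        return await(template, true);
    }

    public synchronized int size() {
        return entries.size();
    }

    // Blocks until some entry contains every key/value pair named in the template.
    private Map<String, Object> await(Map<String, Object> template, boolean remove)
            throws InterruptedException {
        while (true) {
            Iterator<Map<String, Object>> it = entries.iterator();
            while (it.hasNext()) {
                Map<String, Object> entry = it.next();
                if (entry.entrySet().containsAll(template.entrySet())) {
                    if (remove) {
                        it.remove();
                    }
                    return entry;
                }
            }
            wait(); // no match yet: release the lock until the next write
        }
    }
}

class ToySpaceDemo {
    static String run() {
        try {
            ToySpace space = new ToySpace();

            Map<String, Object> job = new HashMap<>();
            job.put("type", "print");
            job.put("pages", 3);
            space.write(job);

            Map<String, Object> template = new HashMap<>();
            template.put("type", "print"); // "pages" is unspecified, so any value matches

            // Modify an entry by removing it, updating it locally, and reinserting it.
            Map<String, Object> taken = space.take(template);
            int sizeWhileHeld = space.size();
            taken.put("pages", 4);
            space.write(taken);

            return sizeWhileHeld + " " + space.read(template).get("pages") + " " + space.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "0 4 1"
    }
}
```

While the job is held by the taker the space is empty, which is why no other process could see or change it during the update. After the reinsert, a read returns a copy and leaves the entry in place.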
Figure 1.1. Processes use spaces and simple operations to coordinate.
To build space-based applications, we design distributed data structures and distributed protocols that operate over them. A distributed data structure is made up of multiple objects that are stored in one or more spaces. For example, an ordered list of items might be represented by a set of objects, each of which holds the value and position of a single list item. Representing data as a collection of objects in a shared space allows multiple processes to concurrently access and modify the data structure.
Distributed protocols define the way participants in an application share and modify these data structures in a coordinated way. For example, if our ordered list represents a queue of printing tasks for multiple printers, then our protocol must specify the way printers coordinate with each other to avoid duplicating efforts. Our protocol must also handle errors: otherwise a jammed printer, for example, could cause many users to wait unnecessarily for jobs to complete, even though other printers may be available. While this is a simple example, it is representative of many of the issues that crop up in more advanced distributed protocols.
Distributed protocols written using spaces have the advantage of being loosely coupled: because processes interact indirectly through a space (and not directly with other processes), data senders and receivers aren't required to know each other's identities or even to be active at the same time. Conventional network tools require that all messages be sent to a particular process (who), on a particular machine (where), at a particular time (when). Instead, using a JavaSpaces system, we can write an object into a space with the expectation that someone, somewhere, at some time, will take the object and make use of it according to the distributed protocol. Uncoupling senders and receivers leads to protocols that are simple, flexible, and reliable.
For instance, in our printing example, we can drop printing requests into the space without specifying a particular printer or worrying about which printers are up and running, since any free printer can pick up a task.
The JavaSpaces technology's shared, persistent object store encourages the use of distributed data structures, and its loosely coupled nature simplifies the development of distributed protocols. These topics form the major theme of this book--before diving in and building our first space-based application, let's get a better idea of the key features of the technology and how spaces can be used for a variety of distributed and collaborative applications.
1.3.1 Key Features
The JavaSpaces programming interface is simple, to the point of being minimal: applications interact with a space through a handful of operations. On the one hand, this is good--it minimizes the number of operations you need to learn before writing real applications. On the other hand, it raises the question: how can we do such powerful things with only a few operations? The answer lies in the space itself, which provides a unique set of key features:
Spaces are shared: Spaces are network-accessible "shared memories" that many remote processes can interact with concurrently. A space itself handles the details of concurrent access, leaving you to focus on the design of your clients and the protocols between them. The "shared memory" also allows multiple processes to simultaneously build and access distributed data structures, using objects as building blocks. Distributed data structures will be a major theme of Chapter 3.
Spaces are persistent: Spaces provide reliable storage for objects. Once stored in the space, an object will remain there until a process explicitly removes it. Processes can also specify a "lease" time for an object, after which it will be automatically destroyed and removed from the space (we will cover leases in detail in Chapter 7).
Because objects are persistent, they may outlive the processes that created them, remaining in the space even after the processes have terminated. This property is significant and necessary for supporting uncoupled protocols between processes. Persistence allows processes to communicate even if they run at non-overlapping times. For example, we can build a distributed "chat" application that stores messages as persistent objects in the space and allows processes to carry on a conversation even if they are never around at the same time (similar to email or voice mail). Object persistence can also be used to store preference information for an application between invocations--even if the application is run from a different location on the network each time.
Spaces are associative: Objects in a space are located via
associative lookup, rather than by memory location or by identifier.
Associative lookup provides a simple means of finding the objects you're
interested in according to their content, without having to know what the object
is called, who has it, who created it, or where it is stored. To look up an
object, we create a template (an object with some or all of its fields set to
specific values, and the others left as null to act as wildcards).
Spaces are transactionally secure: The JavaSpaces technology provides a transaction model that ensures that an operation on a space is atomic (either the operation is applied, or it isn't). Transactions are supported for single operations on a single space, as well as multiple operations over one or more spaces (either all the operations are applied, or none are). As we will see in Chapter 9, transactions are an important way to deal with partial failure.
Spaces allow us to exchange executable content: While in the space, objects are just passive data--we can't modify them or invoke their methods. However, when we read or take an object from a space, a local copy of the object is created. Like any other local object we can modify its public fields as well as invoke its methods, even if we've never seen an object like it before. This capability gives us a powerful mechanism for extending the behavior of our applications through a space.
1.3.2 JavaSpaces Technology in Context
To give you a sense of how distributed applications can be modeled as objects flowing into and out of spaces, let's look at a few simple use scenarios.
Consider a space that has been set up to act as an "auction room" through which buyers and sellers interact. Sellers deposit for-sale items with descriptions and asking prices (in the form of objects) into the space. Buyers monitor the space for items that interest them, and whenever they find some, they write bid objects into the space. In turn, sellers monitor the space for bids on their offerings and keep track of the highest bidders; when an item's sale period expires, the seller marks the object as "sold" and writes it back into the space (or perhaps into the winning buyer's space) to close the sale.
Now consider a computer animation production house. To produce an animation sequence, computer artists create a model that must then be rendered for every frame of a scene (a compute-intensive job).
The rendering is often performed by a network of expensive graphics workstations. Using the JavaSpaces technology, a series of tasks--for instance, one task per frame that needs to be rendered--are written into the space. Each participating graphics workstation searches the space for a rendering task, removes it, executes it, drops the result back into the space and continues looking for more tasks. This approach scales transparently: it works the same way whether there are ten graphics workstations available or a thousand. Furthermore, the approach "load balances" dynamically: each worker picks up exactly as much work as it can handle, and if new tasks get added to the space (say another animator deposits tasks), workers will begin to compute tasks from both animation sequences.
Last, consider a simple multiuser chat system. A space can serve as a "chat area" that holds all the messages making up a discussion. To "talk," a participant deposits message objects into the space. All chat members wait for new message objects to appear, read them, and display their contents. The list of attendees can also be kept in the space and gets updated whenever someone joins or leaves the conversation. Late arrivals can examine the existing message objects in the space to review previous discussion. In fact, since the space is persistent, a new participant can view the discussion long after everyone else has gone away, and participants can even come back much later to pick up the conversation where they left off.
These examples illustrate some of the possible uses of spaces, from workflow systems, to parallel compute servers, to collaborative systems. While they leave lots of details to the imagination (such as how we achieve ordering on chat messages) we'll fill them in later in the book.
1.4 JavaSpaces Technology Overview
Now we are going to dive into our first example by building the obligatory "Hello World" application.
Our aim here is to introduce you to the JavaSpaces programming interface, but we will save the nitty-gritty details for the next chapter. We are going to step through the construction of the application piece by piece, and then, once it is all together, make it a little more interesting.
1.4.1 Entries and Operations
A space stores entries. An entry is a collection of typed objects that
implements the Entry interface.
We can instantiate a Message entry and set its content to "Hello World" like this:
Message msg = new Message();
msg.content = "Hello World";
With an entry in hand,
we can interact with a space using a few basic operations:
JavaSpace space = SpaceAccessor.getSpace();
space.write(msg, null, Lease.FOREVER);
Here we call the
Now that our entry exists in the space, any process with access to the
space can read it. To read an entry we use a template, which is an
entry that may have one or more of its fields set to null (a wildcard that matches any value):
Message template = new Message();
That was easy. It is
important to point out that the
Message result = (Message)space.read(
template,
null, Long.MAX_VALUE);
Because the template's
System.out.println(result.content);
Sure enough, we get:
Hello World
For the sake of completeness, the take operation is called the same way as read:
Message result = (Message)space.take(
template,
null, Long.MAX_VALUE);
We would see the same output as before. However, in this case, the entry would have been removed from the space. So, in just a few steps we've written a basic space-based "Hello World" program. Let's pull all these code fragments together into a complete application:
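The fragments above depend on the JavaSpaces and Jini libraries and on a running space. As a self-contained approximation, the sketch below replays the same write and read steps against `MiniSpace`, a hypothetical in-memory stand-in written only for this illustration. It assumes that matching compares the public fields of a template with those of an entry and treats null fields as wildcards. Unlike a real space, its read and take return null instead of waiting, and read hands back the stored object rather than a copy.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Mirrors the chapter's Message entry: a public field and a public no-argument constructor.
// The real class would also implement the Jini Entry interface, which is omitted here.
class Message {
    public String content;

    public Message() {
    }
}

// Hypothetical in-memory stand-in for a space; it is not the JavaSpace interface.
class MiniSpace {
    private final List<Object> entries = new ArrayList<>();

    public synchronized void write(Object entry) {
        entries.add(entry);
    }

    // Returns a matching entry without removing it, or null if there is none.
    public synchronized Object read(Object template) {
        int i = indexOfMatch(template);
        return i < 0 ? null : entries.get(i);
    }

    // Returns a matching entry and removes it, or null if there is none.
    public synchronized Object take(Object template) {
        int i = indexOfMatch(template);
        return i < 0 ? null : entries.remove(i);
    }

    private int indexOfMatch(Object template) {
        for (int i = 0; i < entries.size(); i++) {
            if (matches(template, entries.get(i))) {
                return i;
            }
        }
        return -1;
    }

    // An entry matches when it has the template's type (or a subtype) and every
    // non-null public field of the template equals the same field of the entry.
    static boolean matches(Object template, Object entry) {
        if (!template.getClass().isInstance(entry)) {
            return false;
        }
        try {
            for (Field field : template.getClass().getFields()) {
                Object wanted = field.get(template);
                if (wanted != null && !wanted.equals(field.get(entry))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

class HelloWorldSketch {
    static String run() {
        MiniSpace space = new MiniSpace();

        Message msg = new Message();
        msg.content = "Hello World";
        space.write(msg);

        Message template = new Message(); // content is null, so any Message matches
        Message result = (Message) space.read(template);
        return result.content;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Hello World"
    }
}
```

Replacing `read` with `take` in `run` gives the same output but leaves the stand-in space empty afterwards.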
In this code we've kept things simple by wrapping the code in a
Before moving on, let's step back a bit--with a small bit of simple code,
we've managed to send a message using spaces. Our
1.4.2 Going Further
Let's take our example and make it a little more interesting. In doing so, you'll get a glimpse of the key features that make the JavaSpaces technology an ideal tool for building distributed applications.
We'll begin by modifying the
We've added an Note that in all our examples, we've been violating a common practice of object-oriented programming by declaring our entry fields to be public. In fact, fields of an entry must be public in order to be useful; if they are instead declared private or protected, then processes that take or read the entry from a space won't be able to access their values. We'll return to this subject in the next chapter and explain it more thoroughly.
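One way to get a feel for the public-field requirement is to look at what Java reflection exposes for a class. The sketch below is an analogy only: `Note` and `PublicFieldsSketch` are hypothetical names, and the idea that a space works with the same set of fields that `Class.getFields()` returns is an assumption made for illustration, not a description of any particular implementation.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical entry-like class with one public and one private field.
class Note {
    public String content = "visible";
    private String secret = "hidden";
}

class PublicFieldsSketch {
    // Collects the names of the fields that reflection reports as public.
    static List<String> publicFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getFields()) {
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(publicFieldNames(Note.class)); // prints [content]
    }
}
```

Only `content` is listed; code that relies on public fields alone would never see `secret`.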
Now let's modify the
Following along in our
Now things become more interesting: we enter a
So, let's write a
Just as in the
So let's now run
Let's trace through the
whole scenario to understand exactly what has happened. First, we started
up
Our output indicates that, by the second time
1.5 Putting It All Together
Even though our "Hello World" example is simple, it demonstrates the key
features of space-based programming and ties together many of the topics we've
covered in this chapter. JavaSpaces technology is simple and expressive: with
very little code (and only four lines that contain JavaSpace operations) we've
implemented a simple distributed application that provides concurrent access to
a shared resource (in this case a shared object). Because spaces provide a
high-level coordination mechanism, we didn't need to worry about multithreaded
server implementation, low-level synchronization issues, or network
communication protocols--usual requirements of distributed application design.
Instead, our example concretely illustrates what we said earlier in this
chapter--that we build space-based applications by designing distributed data
structures along with distributed protocols that operate over them.
Note that the protocol is loosely coupled--
In our example, processes use entries to exchange not only data (a counter)
but also behavior. Processes that create entries also supply proper methods of
dealing with them, removing that burden from the processes that look up the
entries. When a
Our distributed protocol also relies on synchronization. Without coordinated
access to a shared resource--in this case, the counter--there would be no way to
ensure that only one process at a time has access to it, and processes could
inadvertently corrupt it by overwriting each other's changes. Here, to alter an
entry, a process must remove it, modify it, and then return it to the space.
While the process holds the entry locally, no other processes can access or
update it. Transactional security of spaces also plays a key part in
guaranteeing this exclusive access: If a process succeeds at a
This isn't to say our simple example covers everything. Although we can trust
a single space operation to be transactionally secure (either it completes or it
doesn't), there is nothing in our current example to prevent the
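The remove, modify, and return discipline described in this section can be sketched without a space at all. In the example below a `BlockingQueue` holding a single object stands in for a space holding a single counter entry; that substitution, along with the names `SharedCounterSketch` and `Counter`, is an assumption made for illustration. Each worker thread takes the counter, increments it locally, and puts it back, so only one worker can hold it at a time.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SharedCounterSketch {
    // Entry-like object carrying data (count) and behavior (increment).
    static class Counter {
        public int count;

        public void increment() {
            count++;
        }
    }

    // Runs the given number of workers, each incrementing the shared counter.
    static int run(int workers, int incrementsEach) {
        // Stand-in for a space that holds exactly one counter entry.
        BlockingQueue<Counter> space = new LinkedBlockingQueue<>();
        space.add(new Counter());

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    for (int j = 0; j < incrementsEach; j++) {
                        Counter counter = space.take(); // remove: other workers now wait
                        counter.increment();            // modify while holding it locally
                        space.put(counter);             // return it to the space
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
            return space.take().count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // 4 workers x 1000 increments, prints 4000
    }
}
```

No increments are lost because the counter is never visible to two workers at once. The sketch does nothing about a worker that fails while holding the counter; as noted earlier in the chapter, transactions are the tool for dealing with that kind of partial failure.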
1.6 Advantages of JavaSpaces Technologies
We hope that in this introduction you've gained a sense for why you might want to build your next distributed application using spaces. If your application can be modeled as a flow of objects into and out of spaces (as many can), then the JavaSpaces technology offers a number of compelling advantages over other network-based software tools and libraries:
1.7 Chapter Preview
This book is about building distributed and collaborative applications with the JavaSpaces technology. As with any programming methodology, a number of general principles and patterns have emerged from the use of spaces, and we will spend the bulk of this book covering them. Our aim is to help you explore new ways of thinking about, designing, and building distributed applications with spaces (and in short order so that you can quickly begin to create your own distributed applications). The following is a roadmap to what you'll find as you make your way through this book:
Chapters
Chapter 2--JavaSpaces Application Basics--lays the foundation you will need to understand and experiment with the examples in the rest of the book. In a tutorial style, we cover the mechanics of creating a space-based application and introduce the syntax and semantics of the JavaSpaces API and class library.
Chapter 3--Building Blocks--presents basic "distributed data structures" that recur in space-based applications, and describes common paradigms for using them. Code segments are given to illustrate the examples, which include shared variables, bags, and indexed structures. This chapter lays the foundation for the next two: Synchronization and Communication.
Chapter 4--Synchronization--builds upon Chapter 3 and describes techniques for synchronizing the actions of multiple processes. We start with the simple idea of a space-based semaphore and incrementally present more complex examples of synchronization, from sharing resources fairly, to controlling a group in lockstep, to managing multiple readers and writers.
Chapter 5--Communication--also builds upon Chapter 3 and describes common communication patterns that can be created using distributed data structures. We first introduce space-based message passing and then explore the principles behind space-based communication (which provides a number of advantages over conventional communication libraries). We then present a "channel" as a basic distributed data structure that can be used for many common communication patterns.
Chapter 6--Application Patterns--introduces several common application patterns that are used in space-based programming, including the replicated-worker pattern, the command pattern, and the marketplace pattern. In each case, we develop a simple example application that makes use of the pattern. We also provide a general discussion of more ad hoc patterns.
Chapter 7--Leases--begins the book's coverage of more advanced topics. Spaces use leases as a means of allocating resources for a fixed period of time. This chapter explores how to manipulate and manage the leases created from writing entries into a space. The techniques covered for managing leases are also applicable to distributed events and transactions, which are covered in the next two chapters.
Chapter 8--Distributed Events--introduces the Jini distributed event model and shows how applications can make use of remote events in conjunction with spaces.
Chapter 9--Transactions--introduces the idea of a transaction as a tool for counteracting the effects of partial failure in distributed applications. This chapter covers the mechanics as well as the semantics of using transactions.
Chapter 10--A Collaborative Application--explores the creation of a distributed interactive messenger service using spaces. This collaborative application makes use of the full JavaSpaces API, and also some of the advanced topics encountered in previous chapters, namely leases, events, and transactions.
Chapter 11--A Parallel Application--explores parallel computing with spaces. We first build a simple compute server and then a parallel application that runs on top of it. Both are used to explore issues that arise when developing space-based parallel applications. As with the collaborative application, this chapter makes full use of the JavaSpaces API and its advanced features.
Chapter 12--Further Exploration--provides a set of references (historical and current) that you can use as a basis for further exploration.
Appendices A, B, and C contain the official Jini Entry Specification, Jini Entry Utilities Specification, and JavaSpaces Specification written by the Jini product team at Sun Microsystems, Inc.
Online Supplement
The online supplement to this book can be accessed at the World Wide Web site http://java.sun.com/docs/books/jini/javaspaces. The supplement includes the following:
1.8 Exercises
1 As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.
About the Authors
Eric Freeman is co-founder and CTO of Mirror Worlds Technologies, a Java and Jini-based software company. Dr. Freeman previously worked at Yale University on space-based systems, and is a Fellow at Yale's Center for Internet Studies.
Susanne Hupfer is Director of Product Development for Mirror Worlds Technologies and a fellow of the Yale University Center for Internet Studies. Dr. Hupfer previously taught Java network programming as an Assistant Professor of Computer Science at Trinity College.
Ken Arnold is the lead engineer of the JavaSpaces product at Sun. He is one of the original architects of the Jini platform and is co-author of The Java Programming Language, Second Edition.