
High Availability for J2EE Platform-Based Applications

 
 

Topics:

  • Executive Summary: The Tests and Results at a Glance
  • Best Practices for Developers Using J2EE Technology
  • Availability for the J2EE Platform
    • Categorizing Clusters
    • Clustering Methodologies
  • Availability Test Harness
  • Methodology
  • Results
    • A Single J2EE Platform-Based Server
    • Separate Tiers for JSP and EJB Technology-Based Components
    • A Load-Balanced Cluster of J2EE Platform-Based Servers
    • Web Servers Load Balancing to a Cluster of J2EE Platform-Based Servers
  • Conclusion
  • Table of Results
  • Watchdog Script

Executive Summary: The Tests and Results at a Glance

The focus of our research was high availability for the Java 2 Platform, Enterprise Edition (J2EE). We applied a series of benchmark tests to four server configurations, using an industry-standard workload for the J2EE platform.

Based on our data, a failed J2EE platform-based server recovered in approximately 1 minute, and a failed Web server recovered in less than 30 seconds. In a clustered J2EE platform-based environment, some clients experienced a disruption of service after a server failure (about 30 percent of clients with a load-balanced cluster alone). Placing Web servers in front of the cluster in a multitiered configuration reduced this to about 8 percent. To achieve high availability, automatic failover of the Web servers was also required.

Best Practices for Developers Using J2EE Technology

Middleware plays a crucial role in delivering multitiered enterprise applications. It is critical that the middleware used is both highly scalable and highly available. This Developer's Notebook describes experiments conducted by Sun engineers to provide best practices information on J2EE platform availability for the architects and developers who use that platform.

The J2EE platform defines a standard for developing portable, multitiered enterprise applications. The platform can simplify enterprise application development and deployment by basing applications on standardized, modular components; by providing a complete set of services to those components; and by handling many details of application behavior automatically without complex programming.

Vendors of J2EE platform-based servers differentiate their products with services that are not part of the standard. Some of these services promote vendor lock-in and reduce portability; others, such as clustering, can add real value.

High-availability features, such as clustering, are becoming increasingly important. Most vendors of products based on J2EE technology provide some support for clustering J2EE platform-based servers.

The following questions arise:

  • Can we rely on high-availability services provided by J2EE platform-based servers?

  • How difficult is it to set up and maintain J2EE platform-based servers?

  • Are vendor-specific changes required to make applications based on J2EE technology highly available?

  • What is the best way to configure the J2EE platform for high availability?

First, we present an overview of the availability features provided by leading J2EE platform-based servers. We then discuss the goals of this research and the strategy used for testing. Finally, we present the findings of the preliminary testing on J2EE platform-based servers.

Availability for the J2EE Platform

No official definition exists for a J2EE platform-based cluster, and each vendor of products for this platform has a different implementation. In this article, the term refers to a set of J2EE platform-based application servers that work together to provide high availability and scalability for enterprise applications.

Categorizing Clusters

Generally, clusters can be categorized as "shared nothing" or "shared disk." In a shared-nothing cluster, all nodes are independent, which adds management overhead because each node must be maintained separately. In a shared-disk cluster, all J2EE platform-based servers load applications from a single storage device. This reduces maintenance, but making the shared file system itself highly available requires devices such as RAID arrays, storage area networks (SANs), or network-attached storage (NAS).

Clustering Methodologies

In most J2EE platform-based servers, clustering is provided at three levels:

  • HttpSession
  • Enterprise JavaBeans (EJB) components
  • Java Message Service (JMS) technology

In the J2EE 1.2 specification, JMS is not mandatory; thus, our investigation focused on HttpSession clustering and the clustering of EJB components.

HttpSession Clustering
Clustering HttpSessions provides Web clients with the ability to seamlessly retrieve session state from a secondary J2EE platform-based server in the event of a failure. Vendors implement HttpSession clustering in a number of ways:

  • In-memory -- Replication is usually implemented using IP multicast and/or a TCP socket. As objects are added to the HttpSession, they are serialized and sent over the wire to one or more backup servers.

  • A centralized database or file system -- Whenever a state change occurs, the entire HttpSession object is written to a persistent data store.

  • Centralized state servers -- Similar to in-memory replication, except that all nodes write state information to a dedicated state server.
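The in-memory approach can be illustrated with a minimal sketch. ReplicatedSession is a hypothetical class, not any vendor's API: each attribute is serialized with standard Java serialization as it is set, and the bytes are copied to backup stores. The backups are plain maps standing in for the IP multicast or TCP transport a real server would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical session that replicates each attribute when it is set.
class ReplicatedSession {
    private final Map<String, Object> attributes = new HashMap<>();
    // Each backup is modeled as a map of attribute name to serialized bytes.
    private final List<Map<String, byte[]>> backups = new ArrayList<>();

    void addBackup(Map<String, byte[]> backupStore) {
        backups.add(backupStore);
    }

    void setAttribute(String name, Serializable value) {
        attributes.put(name, value);
        byte[] bytes = serialize(value);        // what would be sent over the wire
        for (Map<String, byte[]> backup : backups) {
            backup.put(name, bytes);            // stand-in for a multicast or TCP send
        }
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // A backup server rebuilds the attribute from the replicated bytes.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because every setAttribute call serializes the value, large or frequently changing session objects increase replication traffic.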

Clustering of EJB Components
Almost all J2EE platform-based servers provide support for clustering EJB components. The biggest differentiator is support for automatic failover. Some servers do not provide it, while others allow failover of stateful session beans by using in-memory state replication.

The clustering of EJB components is usually implemented by replica-aware stubs that are generated at deployment or at runtime. The stubs know about all the servers in the cluster and may use a load-balancing algorithm to decide which server handles each call. How a stub behaves depends on the type of EJB component.
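A replica-aware stub can be sketched as follows, assuming a hypothetical ReplicaAwareStub class and a simple round-robin policy. Real servers generate the stub code themselves and usually offer several policies; this only shows the idea of a stub that holds the server list and picks a target per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical replica-aware stub: it knows every server in the cluster
// and chooses one for each call using round-robin load balancing.
class ReplicaAwareStub<S> {
    private final List<S> replicas;
    private int next = 0;

    ReplicaAwareStub(List<S> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("no replicas");
        }
        this.replicas = new ArrayList<>(replicas);
    }

    // Picks the server for the next call and advances the rotation.
    synchronized S pick() {
        S server = replicas.get(next);
        next = (next + 1) % replicas.size();
        return server;
    }

    // Runs the call against whichever server the policy selected.
    <R> R invoke(Function<S, R> call) {
        return call.apply(pick());
    }
}
```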

Because stateless session beans hold no client state, replica-aware stubs are generally free to route requests to any server in the cluster. Some application servers allow automatic failover of calls to stateless session beans, but only for methods whose effect is the same whether they are called once or multiple times. Such methods must be marked as idempotent at deployment.
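The idempotency rule might look like this in a hypothetical stub. A failed call is retried on the next replica only if the method was declared idempotent, because a non-idempotent call may already have taken effect on the failed server. FailoverStub is an illustrative name, and RuntimeException stands in for the remote exception a real stub would see.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stub that fails over only for calls declared idempotent.
class FailoverStub<S> {
    private final List<S> replicas;

    FailoverStub(List<S> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("no replicas");
        }
        this.replicas = new ArrayList<>(replicas);
    }

    <R> R invoke(Function<S, R> call, boolean idempotent) {
        RuntimeException lastFailure = null;
        for (S server : replicas) {
            try {
                return call.apply(server);
            } catch (RuntimeException e) {     // stand-in for a remote exception
                if (!idempotent) {
                    throw e;                   // may already have taken effect: do not retry
                }
                lastFailure = e;               // safe to try the next replica
            }
        }
        throw lastFailure;                     // every replica failed
    }
}
```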

For entity beans and stateful session beans, failover and load balancing are usually supported at the "EJBHome" level. Due to the nature of entity beans, state need not be replicated across nodes; it is read from the database at the beginning of each transaction and is written at the end of the transaction.

Availability Test Harness

EJB components from the ECperf software were used as the basis of a benchmark application that would stress most components of the J2EE platform and place a heavy load on the system. ECperf provided a complete workload for testing the scalability and availability of EJB technology-based containers: a set of interoperating EJB components, a Java application to drive them, and a Web interface to the EJB components built with JavaServer Pages (JSP) technology. Because most EJB technology-based applications are accessed through JSP pages and servlets, we drove the ECperf workload through its sample JSP pages, using an HTTP load generator.

"MDELoad" is a Java platform-based HTTP load generator. Through the use of scripts, MDELoad enabled static HTTP loads to be generated, although it did not allow the response from one request to be used as the basis of the next request. To make the load generator more dynamic, we added the following functionality:

  • The ability to parse the HTML response and extract information

  • Interfaces to allow the plug-in of arbitrary load classes

  • Classes that mimic the behavior of the driver for the ECperf software, except that they drive JSP, not EJB, technology-based components

  • The logging of all interactions between client(s) and server(s), including the time of interaction, Web page visited, data sent to the server, and errors
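The first two additions can be sketched as a plug-in interface plus one load class that parses the previous HTML response to choose the next request. LoadClass, OrderStatusLoad, the orderId form field, and the page names are all hypothetical; they do not come from MDELoad or ECperf.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plug-in interface: a load class decides the next request
// from the HTML returned by the previous one.
interface LoadClass {
    String nextRequest(String previousResponse);
}

// Hypothetical load class: extracts an order ID from the response so the
// next request can check that order's status, as a real user would.
class OrderStatusLoad implements LoadClass {
    private static final Pattern ORDER_ID =
            Pattern.compile("name=\"orderId\"\\s+value=\"(\\d+)\"");

    @Override
    public String nextRequest(String previousResponse) {
        Matcher m = ORDER_ID.matcher(previousResponse);
        if (m.find()) {
            return "/orderstatus.jsp?orderId=" + m.group(1);
        }
        return "/neworder.jsp";   // nothing to follow up on: start a new order
    }
}
```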

We further enhanced MDELoad with shell scripts to perform end-to-end availability testing for the J2EE platform. As shown in Figure 1, the functions performed by the shell scripts included:

  • Collecting system statistics

  • Performing pre- and post-run database checks

  • Starting, failing, and recovering J2EE platform-based server(s) and/or Web server(s)

  • Determining time and impact of failure

  • Calculating recovery time

  • Checking for in-flight transactions

  • Summarizing the run
Figure 1: Availability Test Kit
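Given the interaction log kept by the load generator, the impact of a failure and the recovery time could be computed roughly as follows. The Interaction record and the definition of recovery time used here (from the injected failure to the first successful request after the last error) are assumptions for illustration, not the scripts used in these tests.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical log record: one per request issued by the load generator.
final class Interaction {
    final String client;
    final long timeMillis;     // when the request completed
    final boolean error;       // true if the client received an error

    Interaction(String client, long timeMillis, boolean error) {
        this.client = client;
        this.timeMillis = timeMillis;
        this.error = error;
    }
}

final class FailureAnalysis {
    // Impact: the clients that saw at least one error at or after the failure.
    static Set<String> affectedClients(List<Interaction> log, long failureAt) {
        Set<String> affected = new TreeSet<>();
        for (Interaction i : log) {
            if (i.error && i.timeMillis >= failureAt) {
                affected.add(i.client);
            }
        }
        return affected;
    }

    // Recovery time: from the failure to the first successful request after
    // the last error; 0 if nothing failed, -1 if the service never recovered.
    static long recoveryMillis(List<Interaction> log, long failureAt) {
        long lastError = -1;
        for (Interaction i : log) {
            if (i.error && i.timeMillis >= failureAt) {
                lastError = Math.max(lastError, i.timeMillis);
            }
        }
        if (lastError < 0) {
            return 0;
        }
        long firstSuccess = Long.MAX_VALUE;
        for (Interaction i : log) {
            if (!i.error && i.timeMillis > lastError) {
                firstSuccess = Math.min(firstSuccess, i.timeMillis);
            }
        }
        return firstSuccess == Long.MAX_VALUE ? -1 : firstSuccess - failureAt;
    }
}
```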

Methodology

A summary of the methodology used while conducting experiments follows:

  1. Assemble server configuration.
  2. Deploy ECperf software into J2EE platform-based server(s).
  3. Run under load.
  4. Introduce failure.
  5. Recover and document test results.
  6. Look for simplification possibilities.
  7. Make more resilient.
  8. Repeat steps 2 through 7, with more resilient architectures.

Our main objective was to determine how to deploy applications based on J2EE technology in the simplest yet most highly available fashion. To begin, we configured a single server. We ran the system under heavy load and recorded average response times, throughput, CPU utilization, and so on. We then introduced failures to assess their effect on every tier.

Results

This section discusses the hardware and software configurations tested. First, we present an overview of the hardware platform used for testing. We then discuss each experiment and its results.

For the first three experiments, all tiers were deployed on a single 24-CPU Sun Enterprise 6500 (E6500) server, running the Solaris 8 Operating Environment and Java 2 SDK, version 1.3.0. The E6500 was configured such that the J2EE platform-based server(s) ran in a processor set separate from the database and other processes.

In the final experiment, four systems were used for the Web/J2EE platform tier. The database was run on a separate system. Each server was dedicated to running only one Web server or J2EE platform-based server.

The following configurations were tested:

  1. A single J2EE platform-based application server
  2. Separate tiers for JSP and EJB technology-based components, without clustering
  3. A cluster of two J2EE platform-based servers
  4. Web servers load balancing to a cluster of J2EE platform-based servers

A Single J2EE Platform-Based Server

Configuration
The J2EE platform-based server ran in a processor set that contained eight CPUs. The database ran on the same machine as the J2EE platform-based server, but not in the same processor set (see Figure 2).

Figure 2: J2EE Platform-Based Single-Server Configuration

The load generator was configured to simulate 60 interactive users for 30 minutes and to fail the J2EE platform-based server after approximately 15 minutes.

Observations: When the J2EE platform-based server failed, all in-memory state was lost. This included HTTP session state and stateful session beans. Transactions failed to complete, so changes were not committed to the database.

Once the server failed, clients could no longer access its services. A client with a session in progress received a "connection refused" or similar error. This did not necessarily mean that the transaction was unsuccessful: in some cases, the transaction had been committed before the failure, so the client could not tell that the order had succeeded.

Recovery: Using shell scripts, we were able to fully automate the recovery of the J2EE platform-based server. A simple watchdog script monitored the process ID of the server and restarted the server whenever the process was killed. As a result, the server was fully functional within 1 minute of failure.

Client recovery was completely manual. Because transactions might have been committed just before the failure, clients had to log back in to the Web site and check the status of any orders submitted immediately prior to the failure. Because session state was lost, orders not committed to the database were lost completely, and shopping carts had to be re-created manually.
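The manual status check could be automated if each order carried a key generated by the client before submission, so that a retry can first ask whether that key already exists. This is a minimal sketch under that assumption; OrderService and OrderRecovery are hypothetical, and the tests described here did not use such a mechanism.

```java
import java.util.List;

// Hypothetical order service; only the two operations used below are modeled.
interface OrderService {
    boolean exists(String orderKey);
    void submit(String orderKey, List<String> items);
}

final class OrderRecovery {
    // Resubmits an order only if the database has no record of it, so an
    // order committed just before the failure is not placed twice.
    static boolean resubmitIfLost(OrderService service, String orderKey, List<String> items) {
        if (service.exists(orderKey)) {
            return false;            // committed before the failure: nothing to redo
        }
        service.submit(orderKey, items);
        return true;
    }
}
```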

Discussion
Although a single-server configuration does not provide failover, it is the simplest way to deploy applications based on J2EE technology. Recovery from failure was completed in less than 1 minute, with some interruption to the user and transactions.

Separate Tiers for JSP and EJB Technology-Based Components
Configuration
The majority of Web sites have both static and dynamic content. Therefore, a common configuration of a J2EE technology-based system is to have the Web server exposed to the Internet and an EJB technology-based container behind a firewall (see Figure 3).

Figure 3: A Separate-Tier Configuration for JSP and EJB Technology

We deployed the JSP technology-based components into the Web server and the EJB components into the EJB container. The Web server and the EJB technology-based container ran on the same machine, along with the database. The load generator was configured to run with 30 users for 15 minutes. Failures were introduced approximately 8 minutes into the run. For each run, either the Web server or the EJB technology-based container was killed.

EJB Technology-Based Container Failure
Observations: When the EJB technology-based container failed, any in-memory state was lost. (This affected such things as stateful session beans.) However, HTTP session state was maintained by the Web server and therefore survived. Any transactions not yet committed failed completely. Because the Web server was still running, the clients still had access to static content. However, dynamic content relies on the services of the EJB technology-based container; therefore, requests for such content resulted in exception messages being returned to the client.

Unlike the configuration for a single J2EE platform-based server, clients still had access to the site, but only to static content. Like the single-server configuration, transactions submitted from the client immediately prior to the failure may have been successful without the client knowing.

Recovery: The watchdog script recovered the service automatically in less than 1 minute.

Clients had to log in to the Web site to check the status of orders submitted just prior to service disruption. An advantage of this configuration is that shopping carts stored in HTTP session state were maintained. However, state stored in session beans was destroyed and had to be reentered.

Web Server Failure
Observations: When the Web server failed, the HTTP session state was lost, but stateful session beans survived. Transactions in process on the EJB technology-based container did complete because they were running in a separate process from the Web server.

As in the configuration for a single J2EE platform-based server, clients were no longer able to access the service. It seemed more likely, however, that transactions submitted immediately prior to the failure would be committed to the database. Clients could not know whether a transaction had completed successfully until they checked after the service restarted.

Recovery: The recovery process for the Web server is the same as for the EJB technology-based container and can be automated with the watchdog script. Recovery of the Web server happens in about 30 seconds, one-half the time required to recover an EJB technology-based container.

Clients behaved much as in the configuration for a single J2EE platform-based server, except that shopping carts stored in HTTP session state were lost while state stored in session beans survived (although extra programming was needed to retrieve those beans). Clients had to log in to the Web site and redo any incomplete transactions.

Discussion
Separating the Web server and EJB component tiers does not provide support for automatic failover; however, presentation and business logic are cleanly separated. This separation allows EJB technology-based applications to remain safely behind the firewall, although the configuration is slightly more complicated.

This setup requires a Java technology-enabled Web server or servlet engine that can handle JSP pages, such as Tomcat, and the Web server must be configured to work with the EJB technology-based container. Deployment also takes more time, because in essence two applications are deployed:

  • An application based on JSP pages or servlets on the Web server
  • An EJB architecture-based application in the EJB technology-based container
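Part of that configuration is telling the Web tier where the EJB container lives. With JNDI, this is typically done through the environment passed to InitialContext. In the sketch below, the factory class name, the provider URL, and the JNDI name are placeholders (hypothetical values); each vendor documents its own.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Sketch of the JNDI environment a servlet or JSP page on the Web server
// might use to reach a separate EJB container.
final class EjbTierConfig {
    static Hashtable<String, String> jndiEnvironment(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        // Hypothetical vendor factory class: replace with the vendor's own.
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.example.VendorInitialContextFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        return env;
    }
    // Web-tier code would then call, for example:
    //   new javax.naming.InitialContext(env).lookup("ejb/OrderHome")
    // where "ejb/OrderHome" is a hypothetical JNDI name.
}
```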

A Load-Balanced Cluster of J2EE Platform-Based Servers
Configuration
Two application servers were configured to participate in a cluster, and software provided with the application server was used for load balancing. Both members of the cluster were running on the same machine (see Figure 4).

Figure 4: A Configuration for a Load-Balanced Cluster of J2EE Platform-Based Servers

The load generator was configured to run 60 interactive users for a period of about 30 minutes and to fail one of the cluster members approximately 15 minutes into the run.

Observations: J2EE platform-based servers in the cluster replicated session state -- for both HTTP and stateful session beans -- to at least one other member in the cluster. Replication was performed using IP multicast and/or direct socket connections. Thus, in the case of a single server failure, client session state survived. However, transactions in process on the failed cluster member were lost, as there was no automatic failover.

Not every client is affected. Clients with transactions in process at the time of the failure receive error responses. If requests arrive just as a failure occurs, our data suggest that approximately 30 percent of clients notice a disruption of service. Because session state has been replicated between J2EE platform-based servers, clients may be able to continue as if nothing happened. Occasionally a transaction is committed successfully before the client receives a response from the server, so the client does not learn that it succeeded.

Recovery: The watchdog script recovers the failed cluster member automatically in less than 1 minute.

If a client experiences a service disruption, the client must check the status of the last transaction. Session state is maintained, and in most cases no action is required by the client.

Discussion
This configuration maintains quality of service when a J2EE platform-based server fails. As long as not all nodes in a replication group fail simultaneously, session state is maintained, and client sessions fail over seamlessly to another cluster node.
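The replication-group condition can be modeled in a few lines. ReplicationGroup is a hypothetical class in which every session write goes to all live members, so a request that fails over can read its state from any survivor; the state is lost only when every member has failed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical replication group: session state is copied to every member.
final class ReplicationGroup {
    private final Map<String, Map<String, String>> stateByNode = new LinkedHashMap<>();
    private final Set<String> failed = new HashSet<>();

    ReplicationGroup(String... nodes) {
        for (String n : nodes) {
            stateByNode.put(n, new HashMap<>());
        }
    }

    // Writes a session value to every live member (primary plus backups).
    void put(String sessionId, String value) {
        for (Map.Entry<String, Map<String, String>> e : stateByNode.entrySet()) {
            if (!failed.contains(e.getKey())) {
                e.getValue().put(sessionId, value);
            }
        }
    }

    void fail(String node) {
        failed.add(node);
    }

    // A request that fails over reads the state from any surviving member.
    Optional<String> get(String sessionId) {
        for (Map.Entry<String, Map<String, String>> e : stateByNode.entrySet()) {
            if (!failed.contains(e.getKey()) && e.getValue().containsKey(sessionId)) {
                return Optional.of(e.getValue().get(sessionId));
            }
        }
        return Optional.empty();
    }
}
```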

One problem exists, however: The load balancer is a single point of failure. If the load balancer fails, access to the back-end J2EE platform-based servers is not possible. The next configuration fixes this problem.

Web Servers Load Balancing to a Cluster of J2EE Platform-Based Servers
Configuration
The Web servers were configured so that all requests for dynamic content were forwarded to the J2EE platform-based servers. Domain Name System (DNS) round-robin was used to load balance across the Web servers, while proxy plug-ins provided load balancing across the J2EE platform-based servers (see Figure 5).

Figure 5: Configuration of Web Servers Load Balancing to Cluster of J2EE Platform-Based Servers

We deployed the JSP pages and EJB components into the cluster of J2EE platform-based servers. We configured the Web servers, with a proxy plug-in, to forward requests for JSP pages to the back-end cluster of J2EE platform-based servers.
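The routing rule of such a plug-in can be sketched as follows, assuming a hypothetical ProxyPlugin class that forwards requests for JSP pages round robin across the cluster and serves everything else locally. Real plug-ins are configured through vendor-specific settings rather than coded like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical proxy plug-in routing rule: static files stay on the Web
// server; JSP requests go to the J2EE cluster, spread round robin.
final class ProxyPlugin {
    private final List<String> clusterMembers;
    private int next = 0;

    ProxyPlugin(List<String> clusterMembers) {
        if (clusterMembers.isEmpty()) {
            throw new IllegalArgumentException("no cluster members");
        }
        this.clusterMembers = new ArrayList<>(clusterMembers);
    }

    synchronized String route(String requestPath) {
        String path = requestPath.split("\\?", 2)[0];   // ignore the query string
        if (!path.endsWith(".jsp")) {
            return "local";                              // static content
        }
        String target = clusterMembers.get(next);
        next = (next + 1) % clusterMembers.size();
        return target;
    }
}
```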

The load generator was configured to run for 10 minutes, with failures introduced 5 minutes into testing for both the Web and J2EE platform-based servers.

Failure of the J2EE Platform-Based Server
Observations: Failure of a J2EE platform-based server yielded results similar to those in the configuration for a load-balanced cluster of J2EE platform-based servers, with one significant difference: Only about 8 percent of clients were affected by the failure, compared to 30 percent in the previous configuration.

Recovery: The recovery process was the same as for the previous configuration.

Web Server Failure
Observations: Failure of a Web server had a significant impact. Because DNS round-robin was used to load balance, a failure of one of the Web servers meant that every other request failed.

Recovery: The failed Web server could be recovered automatically with scripts, in less than 30 seconds.

Discussion
This configuration was perhaps the most resilient. It had no single point of failure. Empirical data suggested that failure of a J2EE platform-based server would cause a disruption of service to a small number of users. On the other hand, Web server failure would mean that one-half of the client requests would fail. This problem was fixed by using a cluster agent, such as Sun Cluster software, or a hardware-based load balancer to perform automatic failover of Web servers.
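The arithmetic behind "one-half of the client requests" can be reproduced with a small model. It assumes two Web servers and a hypothetical WebTierBalancing class; real DNS round-robin operates through client resolvers and caches, so the split is only approximate in practice. The health-checked router stands in for a cluster agent or hardware load balancer that takes a failed server out of rotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

final class WebTierBalancing {
    // Plain DNS round-robin: request i goes to server i mod n whether or
    // not that server is up.
    static IntFunction<String> dnsRoundRobin(List<String> servers) {
        return i -> servers.get(i % servers.size());
    }

    // Failover-aware balancing: servers failing a health check (modeled
    // here as membership in the down set) are removed from the rotation.
    static IntFunction<String> healthChecked(List<String> servers, Set<String> down) {
        List<String> healthy = new ArrayList<>();
        for (String s : servers) {
            if (!down.contains(s)) {
                healthy.add(s);
            }
        }
        if (healthy.isEmpty()) {
            throw new IllegalStateException("no Web server available");
        }
        return i -> healthy.get(i % healthy.size());
    }

    // Counts requests routed to a server that is down.
    static int failures(IntFunction<String> router, Set<String> down, int requests) {
        int failed = 0;
        for (int i = 0; i < requests; i++) {
            if (down.contains(router.apply(i))) {
                failed++;
            }
        }
        return failed;
    }
}
```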

Conclusion

With the J2EE technology standard being widely adopted for developing Web-based enterprise applications, it has become increasingly important that J2EE platform-based servers be capable of delivering a reliable service. Most vendors of products based on J2EE technology have added value by providing high-availability features, such as clustering.

We have presented our observations from testing a number of topologies. Results of our experiments thus far suggest that a multitiered approach produces the highest availability. To increase availability further, automatic failover of Web servers is required.

Table of Results

Single Server
  • EJB technology recovery: 1 minute, by automated script
  • Web recovery: same as for EJB technology
  • Client recovery: 1 minute plus the time taken to complete the steps; check the status of transactions and redo lost transactions

Tiers for JSP + EJB Technology
  • EJB technology recovery: 1 minute, by automated script
  • Web recovery: 30 seconds, by automated script
  • Client recovery: recovery time of the Web or EJB technology tier plus the time taken to complete the steps. After an EJB technology failure, HTTP state is maintained; check the status of the session. After a Web failure, same as for the single server.

Clustered J2EE Platform-Based Servers
  • EJB technology recovery: 1 minute to recover the failed server, by automated script
  • Web recovery: same as for EJB technology
  • Client recovery: time taken to resubmit; 50% of clients may have to redo the transaction. Session state is maintained.

Web Servers + Clustered J2EE Platform-Based Servers (DNS Round-Robin)
  • EJB technology recovery: 1 minute to recover the failed server, automated
  • Web recovery: less than 30 seconds, automated
  • Client recovery: time taken to resubmit. If a J2EE platform-based server fails, approximately 8% of clients will need to resubmit; session state is maintained. If a Web server fails, 50% of requests fail.

Watchdog Script

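#!/bin/ksh
# Takes a command as an argument and checks every second whether
# it is still running. If not, it restarts the command.
# pgrep matches the argument against running process names.

if [ $# -ne 1 ]; then
        echo "usage: $0 command"
        exit 1
fi

while true
do
        PID=`pgrep "$1"`        # PID(s) of matching processes, if any
        if [ -z "$PID" ]; then
                echo "Restarting $1"
                $1 &            # restart the command in the background
        fi
        sleep 1
done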

