JAIN(tm) SLEE Performance Analysis

 

by Phelim O'Doherty
August 2003

1 Introduction

This document outlines the performance considerations and expectations of the JAIN SLEE (JSLEE) specification [1], developed through the Java Community Process [2]. The most important items to highlight when calculating performance for JSLEE are:

  1. The type of transactions carried out in the performance test. Every transaction in a performance test should execute a read/write operation whose state is committed to disk and replicated, rather than a simple read. The read/write operation is where the performance overhead lies on any platform. The new state resulting from the write operation should also be replicated to more than one node. State keeps track of the progress of a session and preserves availability when a specific node fails.
  2. The latency, which is the delay required to complete an application-defined unit of work. For the purposes of JSLEE, latency is the time from when an event enters the system until that event has been processed. Processing an event means executing the application logic associated with the event, then replicating and committing the state changes made by that logic.
  3. The fault tolerance guarantee: if a component fails, a backup component or procedure can immediately take its place with no loss of service.
  4. The high availability guarantee: the system or component is continuously operational for a desirably long length of time, measured relative to “100%”.
  5. The hardware environment of the platform upon which the performance test is executed.

2 Execution Environment Requirements

The requirements for a high performing execution platform within the core communications network can be categorized under the following sections.

2.1 Throughput Requirements

Throughput requirements define the number of operations a platform must service in a specific timeframe. The typical operator requirement for throughput is the ability to service ‘one million busy hour call attempts’. A busy hour call attempt typically consists of between two and ten events, each requiring its own transaction; for the purpose of this document we will take ‘a mean of five events per call’. One million call attempts at five events each is five million transactions per hour, so the mean throughput requirement translates to ‘1389 transactions per second WITH state updates that are transacted and replicated to survive single node failures’, depending on the service. It is expected that this throughput could be handled on ‘three '2 to 4-way' boxes, i.e. at most 12 CPUs’. These mean transaction rates and CPU counts should be used as the baseline when considering the throughput performance of any JSLEE implementation.
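
The arithmetic behind the 1389 figure can be sketched as follows. This is a minimal illustration, not part of the JSLEE specification: the class name `ThroughputEstimate` and its method names are hypothetical, and the per-CPU figure assumes three fully populated 4-way boxes (12 CPUs) sharing the load evenly.

```java
/** Sketch: derives a per-second transaction rate from busy hour call attempts. */
final class ThroughputEstimate {
    static final int SECONDS_PER_HOUR = 3600;

    /** Mean transactions per second, rounded up to a whole transaction. */
    static long transactionsPerSecond(long busyHourCallAttempts, int eventsPerCall) {
        // Each event is assumed to need its own transaction, as in the text.
        long transactionsPerHour = busyHourCallAttempts * eventsPerCall;
        // Ceiling division, so the fractional remainder is not silently dropped.
        return (transactionsPerHour + SECONDS_PER_HOUR - 1) / SECONDS_PER_HOUR;
    }

    /** Mean rate each CPU must sustain if the load is spread evenly (an assumption). */
    static double transactionsPerSecondPerCpu(long transactionsPerSecond, int cpuCount) {
        return (double) transactionsPerSecond / cpuCount;
    }

    public static void main(String[] args) {
        long mean = transactionsPerSecond(1_000_000L, 5);
        System.out.println(mean);                                   // 1389
        System.out.println(transactionsPerSecond(1_000_000L, 2));   // 556
        System.out.println(transactionsPerSecond(1_000_000L, 10));  // 2778
        System.out.println(transactionsPerSecondPerCpu(mean, 12));  // 115.75
    }
}
```

The two- and ten-event cases bracket the mean: a service at the upper end of the range needs roughly twice the throughput of the five-event baseline.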

2.2 Latency Requirements

Latency requirements define the length of time it takes for an event to enter the system, be processed, and have the state changed by that event updated. Latency is usually qualified with a percentage of time, since latency varies from one request to the next. The common measure is 'ninety five percent latency', the latency that is achieved ninety five percent of the time. The typical operator requirement for latency is the ability ‘to setup a call under five hundred milliseconds’. Setting up a call typically consists of traversing two to five network nodes with a minimum of two events per node, each with its own transaction; hence between four and ten transactions must be processed in under five hundred milliseconds. Taking a mean of ‘seven transactions’, 500 / 7 is approximately 71.4, so the average latency requirement should be roughly ‘71.5 milliseconds round trip time per transaction WITH state updates that are transacted and replicated to survive single node failures for 75 percentile’, depending on the service. Taking all variables into consideration, the ‘95 percentile requirement for latency should not exceed 200 milliseconds round trip time per transaction’; that is, only 5 percent of the latency results should exceed 200 milliseconds. These latency figures should be used when considering the network delay of any JSLEE implementation.
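
As a rough sketch of how these figures might be checked against measured results, the following computes the per-transaction budget and a percentile over a set of latency samples. The class `LatencyBudget`, its method names, and the sample values are hypothetical; the percentile uses the simple nearest-rank method, which is one of several common definitions and is an assumption here rather than something the requirement prescribes.

```java
import java.util.Arrays;

/** Sketch: per-transaction latency budget and a nearest-rank percentile check. */
final class LatencyBudget {

    /** Call setup limit divided by the number of transactions in the setup. */
    static double perTransactionBudgetMillis(double setupLimitMillis, int transactions) {
        return setupLimitMillis / transactions;
    }

    /** Nearest-rank percentile; percentile must be in the range 1..100. */
    static double percentile(double[] samplesMillis, int percentile) {
        if (samplesMillis.length == 0 || percentile < 1 || percentile > 100) {
            throw new IllegalArgumentException("need samples and a percentile in 1..100");
        }
        double[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        // Integer ceiling of (percentile / 100) * n avoids floating point rounding.
        int rank = (percentile * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    /** True if the 95th percentile of the samples is within the limit. */
    static boolean meets95thPercentile(double[] samplesMillis, double limitMillis) {
        return percentile(samplesMillis, 95) <= limitMillis;
    }

    public static void main(String[] args) {
        System.out.println(perTransactionBudgetMillis(500.0, 7)); // about 71.43

        // Hypothetical measurements: 19 fast transactions and one slow outlier.
        double[] samples = new double[20];
        Arrays.fill(samples, 50.0);
        samples[19] = 900.0;
        System.out.println(meets95thPercentile(samples, 200.0)); // true
    }
}
```

A single outlier in twenty samples falls within the permitted 5 percent, so the run still meets the 200 millisecond requirement; a second outlier would push the 95th percentile over the limit.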

2.3 High Availability Requirements

High availability refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to ‘100% operational’ or ‘never failing’. The typical operator requirement for high availability for a system or product is ‘five 9s’, that is, ‘the system must be available 99.999% of the time’. This translates to less than ‘six minutes down-time per year’ (about 5.26 minutes).
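
The conversion from an availability percentage to a yearly downtime allowance can be sketched as below. The class and method names are hypothetical, and the calculation assumes a 365-day year.

```java
/** Sketch: converts an availability percentage into allowed downtime per year. */
final class AvailabilityBudget {
    // 365 days * 24 hours * 60 minutes = 525,600; leap years are ignored.
    static final double MINUTES_PER_YEAR = 365.0 * 24 * 60;

    static double allowedDowntimeMinutesPerYear(double availabilityPercent) {
        double unavailableFraction = (100.0 - availabilityPercent) / 100.0;
        return unavailableFraction * MINUTES_PER_YEAR;
    }

    public static void main(String[] args) {
        // "Five 9s": roughly 5.26 minutes per year, under the six minute figure.
        System.out.printf("%.2f%n", allowedDowntimeMinutesPerYear(99.999));
    }
}
```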

2.4 Hardware Requirements

JSLEE solutions scale horizontally, so cost is expected to scale close to linearly, regardless of the number of nodes and the message handling capacity. JSLEE is designed to be deployed on high-volume, lower-cost symmetric multiprocessor (SMP) systems, for example 2 to 4-way systems. This allows machines to be added as more capacity is needed, without the upfront investment in more expensive multiprocessor systems such as 8-way systems. More nodes in the system also allow more state replicas, which increases system availability; conversely, in a system with fewer, larger nodes, a single node failure takes out more capacity. Symmetric multiprocessor systems were chosen over single-processor systems because their higher throughput per node counterbalances the communications overhead associated with messaging systems. In summary, ‘2 to 4-way multiprocessor systems’ are the most price-competitive systems for JSLEE architectures, taking into consideration communications overhead and processing power per node.
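
The point about node count and lost capacity can be illustrated with a small calculation. This is a simplified sketch under stated assumptions: all nodes are taken to be identical, load is assumed to be spread evenly, and the class `NodeFailureImpact` and its methods are hypothetical names introduced only for this example.

```java
/** Sketch: effect of a single node failure on a cluster of identical nodes. */
final class NodeFailureImpact {

    /** Fraction of total capacity lost when one of nodeCount equal nodes fails. */
    static double capacityLostFraction(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return 1.0 / nodeCount;
    }

    /** Load each surviving node carries after one failure, assuming an even spread. */
    static double loadPerSurvivingNode(double totalTps, int nodeCount) {
        if (nodeCount < 2) {
            throw new IllegalArgumentException("need at least two nodes to survive a failure");
        }
        return totalTps / (nodeCount - 1);
    }

    public static void main(String[] args) {
        // Three nodes: one failure removes a third of capacity.
        System.out.println(capacityLostFraction(3));
        // Six smaller nodes: one failure removes only a sixth.
        System.out.println(capacityLostFraction(6));
        // 1389 tps on three nodes: each survivor must carry 694.5 tps.
        System.out.println(loadPerSurvivingNode(1389.0, 3));
    }
}
```

Under these assumptions, doubling the node count halves the capacity lost to any single failure, which is the trade-off the paragraph above describes.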

3 References

[1] Sun Microsystems, "JAIN SLEE Specification - JSR 22", 2002. See http://jcp.org/en/jsr/detail?id=22.
[2] Sun Microsystems, "Java Community Process Program - JSR 171", 2002. See http://jcp.org/en/jsr/detail?id=171.