Introduction
Multi-threaded applications are a way to scale and meet today's growing business requirements while reducing the number of systems needed. However, a multi-threaded application's scalability is limited by the portions of code that cannot run in parallel; these serial components cap the achievable speedup (see Amdahl's Law and the problems with I/O). Our previous paper, Horizontal Scaling on a Vertical System Using Solaris Zones, described a workaround that used Zones to scale Xitami/NexSRS by running a copy of each in every Zone. Running a copy in each Zone improved performance by more than 100% but did not solve the underlying scalability problem with Xitami. Possible solutions were to migrate the Long Running Web Process (LRWP) protocol to the Sun Web Server or to migrate NexSRS to the Netscape Server API (NSAPI). We decided to try something totally different: implementing the LRWP protocol in Java, running in a web container. GlassFish was open sourced at around this time, so we chose GlassFish to try our idea.
We expected performance close to that of Xitami/NexSRS on smaller systems and expected the implementation to scale well on bigger systems. It turned out to be faster than Xitami/NexSRS, even though Xitami is a very small web server written in C and is one of the top 10 web servers. Our implementation was meant to scale better on bigger CMT systems, but LRWP in Java was also faster by 23% on a single-core system and by 78% on a 4-core system, showing scalability from a single core to multiple cores, while Xitami scaled to just about 15K CPM on the 4-core system (Fig 1).
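As a rough illustration of the Amdahl's Law limit mentioned above, the sketch below computes the theoretical speedup for a given parallel fraction and core count. The class name, the method name, and the 90% parallel fraction are assumptions chosen for illustration; they are not measurements from this project.

```java
/** Hypothetical helper illustrating Amdahl's Law; not part of the LRWP code. */
class AmdahlSketch {

    /**
     * Theoretical speedup = 1 / ((1 - p) + p / n), where p is the fraction of
     * the work that can run in parallel and n is the number of cores.
     */
    static double speedup(double parallelFraction, int cores) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || cores < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and cores >= 1");
        }
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / cores);
    }

    public static void main(String[] args) {
        // Assumed example: 90% of the work can run in parallel.
        for (int cores : new int[] {1, 2, 4, 8}) {
            System.out.printf("cores=%d speedup=%.2f%n", cores, speedup(0.9, cores));
        }
    }
}
```

With these assumed inputs, the speedup on 4 cores is about 3.08 rather than 4, and it can never exceed 10 however many cores are added, because the serial 10% remains.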
Long Running Web Process (LRWP)
LRWP is a protocol used by the Xitami web server to communicate with its peers. Peers are processes that communicate with web clients. Web clients could be browsers or other types of clients communicating over HTTP. LRWP is similar to CGI, where a web client makes a request to a cgi-bin/context, which allows the web container to invoke a cgi-bin executable, pass the input from the web client on to the executable, and return the output to the web client. In LRWP, a TCP connection is established between the LRWP peer and an LRWP agent. The LRWP agent could be the web container or a process running within the web container, and the LRWP peer could be any process running on a network. At connection time, the LRWP peer registers the web context that it is interested in. The web context could be any context such as
LRWP in Java
To implement this protocol in Java, we needed a web container to process HTTP. As the ISV was interested in an open source web container that was free, we chose GlassFish, a Java Platform, Enterprise Edition (Java EE) application server built on top of the Apache Tomcat web container. The design was to use servlets to listen for HTTP requests and pass the requests on to an LRWP agent running within the container. The LRWP agent would then pass each request on to the right LRWP peer and pass the response back to the servlet. The LRWP agent server registers the contexts that a peer is interested in and waits for a request on those contexts. If a request matches a context, the servlet thread goes to sleep on a context lock while the agent passes the request on to the LRWP peer, waits for a response from the peer, wakes up the servlet thread, and passes it the response. The servlet thread then returns the response to the web client.
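The hand-off between the servlet thread and the agent described above can be sketched with a plain Java monitor. This is a minimal illustration under stated assumptions, not the project's actual code: the class and method names are hypothetical, the peer is modelled as a function instead of a TCP connection, responses are assumed to be non-null, and only one request per context is in flight at a time.

```java
import java.util.function.Function;

/** Hypothetical sketch of a per-context lock shared by a servlet thread and the agent. */
class ContextHandoffSketch {
    private final Object lock = new Object();
    private String request;   // posted by the servlet thread
    private String response;  // posted by the agent thread

    /** Servlet side: post the request, then sleep on the lock until a response arrives. */
    String submit(String req) throws InterruptedException {
        synchronized (lock) {
            request = req;
            response = null;
            lock.notifyAll();                 // wake the agent if it is waiting
            while (response == null) {
                lock.wait();                  // releases the lock while sleeping
            }
            return response;
        }
    }

    /** Agent side: wait for a request, forward it to the peer, publish the response. */
    void serveOne(Function<String, String> peer) throws InterruptedException {
        String req;
        synchronized (lock) {
            while (request == null) {
                lock.wait();
            }
            req = request;
            request = null;
        }
        String resp = peer.apply(req);        // stands in for the LRWP round trip
        synchronized (lock) {
            response = resp;
            lock.notifyAll();                 // wake the sleeping servlet thread
        }
    }

    /** Runs one request through the hand-off using a stand-in peer. */
    static String demo(String req) {
        ContextHandoffSketch ctx = new ContextHandoffSketch();
        Thread agent = new Thread(() -> {
            try {
                ctx.serveOne(r -> "peer-reply:" + r);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        agent.start();
        try {
            String resp = ctx.submit(req);
            agent.join();
            return resp;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Both sides wait in a loop on a condition (`request == null`, `response == null`), so the hand-off works regardless of which thread reaches the lock first and tolerates spurious wake-ups.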
The LRWP agent has been implemented as a web application listening to the "/*" context, so that every request first comes to this application. If the request is for a context registered by an LRWP peer, the request is passed on to the LRWP peer for further processing; if the request is for a context not registered with the LRWP agent, the request is dispatched to the default servlet for further processing by using the
Integration with an LRWP peer, NexSRS
Open Settlements Protocol (OSP) is an international standard for VoIP carriers that provides a secure mechanism for IP communication. An OSP server authorizes call setup between peer VoIP gateways, Fig 1. The source gateway (the originating gateway in a call setup) sends an authorization request message to the OSP server to obtain the IP address of a destination gateway that can complete the call to the dialed number. The OSP server sends an authorization response message back to the source gateway. The authorization response message contains the IP address of the destination gateway that can complete the call to the dialed number and also a digitally signed token to be used by the source gateway in a call setup. The source gateway uses the digitally-signed token to connect to the destination gateway; the destination gateway verifies the token to make sure that it's coming from a trusted source. When the call is over, the source gateway and destination gateway both send a UsageIndication message to the OSP server. This message is confirmed by a
NexSRS is a multi-threaded OSP server that is also an LRWP peer. Clients communicate with NexSRS using HTTP. NexSRS uses an external web server to process the HTTP requests. The external web server passes the client request on to NexSRS for processing using the LRWP protocol. NexSRS connects to an LRWP agent at
Improving LRWP Performance
Tuning LRWP agent Java Code
The initial design was to use a multi-threaded server to act as an LRWP agent within the
Code snippet of the ContextAssistantManager:
Some of the other changes that could improve performance:
Tuning GlassFish
Tuning HTTPConnector Grizzly
GlassFish's HTTPConnector, Grizzly, by default uses NIO to handle connection requests from clients. New Input/Output (NIO) is the I/O API introduced with JDK 1.4 that provides scalable network and file I/O and native buffer management. NIO introduced channels, which represent connections to I/O sources such as sockets and files. SocketChannel is a selectable channel, so many connections can be registered with a selector and polled for read and write readiness. This eliminates the requirement of a thread per connection, so servers can be built with a few threads that handle many client connections, improving performance and reducing thread overhead. A SocketChannel can be blocking or non-blocking. Grizzly provides both blocking and non-blocking implementations; by default it is non-blocking and uses 2 threads, with a maximum of 5 threads, to handle requests from clients. This is tunable, and increasing the number of threads to 10 gave the best performance. Increasing pool sizes also improved performance. The following pool sizes were increased:
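Separately from the pool settings, the selector model described in this section can be sketched with the standard java.nio API. The example below is not Grizzly's code; it is a minimal echo loop, with hypothetical class and method names, showing one thread serving two simultaneously open connections over the loopback interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: one selector thread echoing data for several connections. */
class SelectorEchoSketch {

    /** Echoes bytes for every connection; returns once `clients` connections have closed. */
    static void serve(ServerSocketChannel server, int clients) throws IOException {
        try (Selector selector = Selector.open()) {
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            ByteBuffer buf = ByteBuffer.allocate(1024);
            int closed = 0;
            while (closed < clients) {
                selector.select();                       // block until a channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept();
                        if (ch != null) {
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel ch = (SocketChannel) key.channel();
                        buf.clear();
                        if (ch.read(buf) < 0) {          // client closed the connection
                            ch.close();
                            closed++;
                            continue;
                        }
                        buf.flip();
                        while (buf.hasRemaining()) {
                            ch.write(buf);               // echo the bytes back
                        }
                    }
                }
            }
        }
    }

    /** Blocking client side: sends msg and reads back the same number of bytes. */
    static String echo(SocketChannel ch, String msg) throws IOException {
        byte[] out = msg.getBytes(StandardCharsets.UTF_8);
        ch.write(ByteBuffer.wrap(out));
        ByteBuffer in = ByteBuffer.allocate(out.length);
        while (in.hasRemaining() && ch.read(in) >= 0) {
            // keep reading until the whole echo has arrived
        }
        return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
    }

    /** Keeps two client connections open at once against the single selector thread. */
    static List<String> demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));   // ephemeral port
            InetSocketAddress addr =
                new InetSocketAddress("127.0.0.1", server.socket().getLocalPort());
            Thread loop = new Thread(() -> {
                try {
                    serve(server, 2);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            loop.start();
            List<String> replies = new ArrayList<>();
            try (SocketChannel a = SocketChannel.open(addr);
                 SocketChannel b = SocketChannel.open(addr)) {
                replies.add(echo(a, "first"));
                replies.add(echo(b, "second"));
            }
            loop.join();
            return replies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The server side never blocks on an individual connection: it blocks only in `select()`, which is what lets a small, fixed number of threads serve many clients.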
Tuning Garbage Collection
Using the parallel collector improved GlassFish performance to 27K (Fig 1). The pause seen with the default collector on 4 cores disappeared with the use of the parallel collector. Increasing the heap from 1400m to 3400m improved performance. Increasing it further to about 7g should yield further improvement. (GlassFish seemed to have a problem with the 64-bit JVM, so we could not try this.)
Tuning Solaris
Solaris 10 is tuned for performance out-of-box. The tunables that we used were
The following were added to /etc/system
Setting
Running On an x86-Based System
Load Generation
The load was generated using a Sun Fire V280 (2 CPUs) and the ApacheBench tool. Three instances of ApacheBench were started using a script, each sending a message to a URL such as http://eagle:1080-/osp.
On the server side, an instance of GlassFish listened for requests on port 1080 and passed each request on to the LRWP agent web application listening to the "/*" context.
Measuring CPS
Calls per second (CPS) was measured by tailing the nexus.log file. The log file shows calls per minute (CPM), which needs to be converted to CPS. ApacheBench also outputs CPS at the end of the test, and this was compared with the log file to ensure that tests ran successfully.
System Performance
The tests were run on an x4100 (2 sockets with 2 cores each, 8 GB, 2.6 GHz) running Solaris 10. The cores were enabled and disabled using the Solaris dynamic processor configuration utility, psradm.
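The CPM-to-CPS conversion described under Measuring CPS, and the percentage comparisons used when reporting results, are simple arithmetic. A minimal sketch follows; the class and method names are hypothetical, and the 100 and 150 values are made-up inputs, not measured results.

```java
/** Hypothetical helpers for the rate arithmetic used when reading the results. */
class CallRateSketch {

    /** Converts calls per minute (as logged) to calls per second. */
    static double cpmToCps(double callsPerMinute) {
        return callsPerMinute / 60.0;
    }

    /** Percentage by which `candidate` exceeds `baseline` (both in the same unit). */
    static double percentFaster(double candidate, double baseline) {
        if (baseline <= 0.0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // About 15K CPM is the Xitami figure quoted in the Introduction.
        System.out.println(cpmToCps(15_000));
        // Made-up rates, only to show the calculation.
        System.out.println(percentFaster(150.0, 100.0));
    }
}
```

For example, the roughly 15K CPM quoted for Xitami on 4 cores corresponds to about 250 CPS.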
Improvement in Performance
GlassFish (LRWP agent in Java)/NexSRS performance exceeded Xitami/NexSRS performance from a single core to 4 cores. GlassFish/NexSRS was 23% faster on a single core and 76% faster with 4 cores. With GlassFish, NexSRS uses about 68% of CPU on a single core and about 43% with 4 cores. With Xitami, NexSRS uses about 53% of CPU on a single core and about 31% with 4 cores. GlassFish itself uses an average of 25% of CPU from a single core to 4 cores, while Xitami uses about 45% on a single core and about 21% with 4 cores. GlassFish with NIO seems to use less CPU time than Xitami, allowing NexSRS to scale better.
Conclusions
"LRWP agent in Java with GlassFish" performs very well, exceeding the performance of "LRWP agent in C with Xitami" (see footnote 1) from a single core to 4 cores. GlassFish with NIO scales extremely well from a single core to 4 cores and could see further improvement in performance on a 64-bit JVM with an increased heap.
Acknowledgments
We would like to thank Satyajit Tripathi for excellent project management, managing resources efficiently across time zones and timelines and keeping communication very efficient. We would also like to thank Scott Oaks and Jean-Francois Arcand of the GlassFish performance team for helping to tune the HTTP Grizzly connector, and Bruce Chapman for reviewing and providing some fine suggestions, including help with the chart.
About the Authors
Nagendra Nagarajayya has been working with Sun for the last 13 years. He works as a Staff Engineer at ISV Engineering, working with Independent Software Vendors (ISVs) in the telecommunications (telco) industry on issues related to architecture, performance tuning, sizing and scaling, benchmarking, porting, etc. He specializes in multi-threaded issues, concurrency and parallelism, HA, distributed computing, networking, and performance tuning.
Dmitry Isakbayev has worked at TransNexus since 1997 and leads all software development. TransNexus has been an innovator of commercial and open source VoIP Operations and Billing Support Systems (OSS/BSS) since 1997. Deployment of the TransNexus OSS/BSS solution provides wholesale VoIP carriers with an immediate increase in operational profits. Key features include Least Cost Routing, Quality of Service Routing, secure inter-domain VoIP peering, traffic analysis and control, management reports, new revenues from wholesale services, and lower cost back-office operations.
Satyajit Tripathi is a Computer Engineer with 10+ years of industry experience. He practices project management at Sun Microsystems. His previous experience includes work on Network Identity Management, a Hospital SCM System, Mobile SOA, etc.
Ashish Banerjee is an Independent Software Developer with 20 years of programming experience. He is passionate about Solaris Internals and Java Technology.
Ranjan Kumar is a software professional presently working with Headstrong Inc. He has worked extensively on OOPS and Open System technologies. His white paper on Network Virtualization was published in IBM, and he is active in open source development. He is the project administrator for a project on sourceforge.net.
Vikas Gera is a distributed computing specialist with 8 years of work experience in C++ programming on the Solaris platform. He enjoys learning Japanese.
Download
Source for LRWP in Java/GlassFish
References
Glossary
LRWP – Long Running Web Process is a protocol used by the Xitami web server to communicate with its peers. Peers are processes that communicate with web clients. LRWP is similar to CGI, but the peer maintains the connection with the web server across requests, increasing performance.
Xitami – An open source web server, written in C.
GlassFish – A Java EE open source application server.
User threads – Also known as fibers, these are threads implemented in a user library that run in user space. A user threads library provides good performance but can be a scalability bottleneck. Xitami makes use of its own user threads library.
Solaris threads – Threads on Solaris OS. Solaris provides a 1x1 threading model, in which every application thread has a corresponding kernel thread.
POSIX or Pthreads – Threads that adhere to the POSIX standard. Solaris OS provides a POSIX and a Solaris-specific thread API.
OSP – Open Settlements Protocol is an international standard for VoIP carriers that provides a secure mechanism for IP communication.
NexSRS – An OSP application from TransNexus.
LRWP Peer – Peers are processes that communicate with web clients. Web clients could be browsers or other types of clients communicating over HTTP. Peers use the LRWP protocol to communicate with the web server, which in turn communicates with the web client using HTTP.
LRWP Agent – A process that can communicate with an LRWP peer using the LRWP protocol. The LRWP agent could be the web container or a component running within the web container.
HTTP – Hypertext Transfer Protocol (HTTP) is a method used to transfer or convey information on the World Wide Web. Its original purpose was to provide a way to publish and retrieve HTML pages.
CPM – Calls Per Minute. A call could be a VoIP, mobile, or fixed line call.
CPS – Calls Per Second.
CPU – A central processing unit (CPU), or sometimes simply processor, is the component in a digital computer that interprets computer program instructions and processes data. CPUs provide the fundamental digital computer trait of programmability, and are one of the necessary components found in computers of any era, along with primary storage and input/output facilities.
Apache Bench (ab) – ApacheBench is a command line computer program for measuring the performance of HTTP web servers, in particular the Apache HTTP Server. It was designed to give an idea of the performance that a given Apache installation can provide. In particular, it shows how many requests per second the server is capable of serving [11].
Grizzly (HTTP Connector) – The HTTP Connector used by GlassFish.
NIO – A collection of Java programming language APIs that offer features for intensive I/O operations. It was introduced with the J2SE 1.4 release of Java by Sun Microsystems to complement the existing standard I/O. NIO was developed under the Java Community Process as JSR 51.
ServletContext – Servlets allow a software developer to add dynamic content to a web server using the Java platform. The generated content is commonly HTML, but may be other data such as XML. Servlets are the Java counterpart to non-Java dynamic Web content technologies such as PHP, CGI, and ASP.NET. Servlets can maintain state across many server transactions by using HTTP cookies, session variables, or URL rewriting. There is only one ServletContext in every application. This object can be used by all the servlets to obtain application-level information or container details.
CMT – Today's traditional single-core processors can only process one thread at a time, spending a majority of time waiting for data from memory. In sharp contrast, chip multithreading (CMT) refers to a processor's ability to process multiple software threads. A CMT processor could implement this multithreaded capability using a variety of methods, such as (i) having multiple cores on a single chip (CMP), (ii) executing multiple threads on a single core (SMT), or (iii) a combination of both CMP and SMT.
Solaris Zones – Solaris Containers (including Solaris Zones) is a virtualization feature first available with Solaris 10. This is an implementation of operating system-level virtualization technology.
Footnote 1: Processor sets were not tried. Processor sets could help improve Xitami's performance.