Technical Session: Best Practices for Large-Scale Web Sites -- Lessons from eBay
There may be no better example of a company that has met the demands of Internet scale than eBay, with 88.3 million active users worldwide, over 2 billion page views, and 48 billion SQL executions every day. And there may be no one more qualified to discuss those demands than Randy Shoup, architect of the eBay web site and primary architect of eBay's search infrastructure. Shoup's presentation at the 2009 JavaOne conference, titled Best Practices for Large-Scale Web Sites: Lessons from eBay, gave attendees a valuable behind-the-scenes look at how eBay approached the requirements of Internet scale.
Design Goals
Architectural design goals at eBay are the following:
Best Practices
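Shoup described the architecture that realized these goals in terms of a set of best practices: partition everything, use asynchrony everywhere, automate everything, remember that everything fails, and embrace inconsistency.
Partition Everything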
A system that can't be split into modules can't be scaled. By partitioning for scale, eBay breaks the problem posed by its huge load and user base into manageable pieces. The smaller pieces can scale independently of one another. At eBay, there is no such thing as "the database" or "the application server." Every part of the problem, across every tier, is partitioned or load-balanced in some way.
One partitioning strategy is functional segmentation. This approach segments processing into pools, services, and stages -- for example, selling, searching, bidding, checkout, and feedback. Another strategy is to segment data by entity and usage pattern -- for example, user, item, transaction, product, and account. A consequence of this segmentation is that no single database runs the entire eBay site. Instead, many different logical databases reflect the different functional parts of the site. For example, separate databases store selling information, item information, user information, recently closed transactions, and so on. As a result, eBay maintains more than 1000 different logical database instances spread over more than 400 servers.
A system that is split into independent pieces is easier to troubleshoot because failures are more clearly isolated. The decoupled pieces are also easier to manage and can be sized to maximize the ratio of performance to cost.
Breaking the problem into smaller pieces involves not only vertical segmentation but horizontal segmentation as well, chiefly in the form of load balancing and data sharding. In load balancing, all servers in a pool -- selling, search, bidding, and so on -- are treated as equals. Similarly, data is split, or sharded, along primary data access paths -- user, item, transaction, and so on.
As a result of this horizontal segmentation, sessions carry no state information. A user session flows through multiple pools -- selling, search, bidding -- but there is no session state or business object caching in the system's application tier. State information is not stored in
Asynchrony Everywhere
When a system is segmented, asynchronous communication yields several benefits:
Asynchrony is used as much as possible at eBay. The goal is to minimize latency in the user experience, so work that doesn't serve that goal is deferred when necessary. eBay uses two strategies to implement asynchrony: event queues and message multicasting.
Asynchrony: Event Queue
When implementing an event queue, a primary application writes data into a transactional database and queues the event using primary insert/update calls such as
Asynchrony: Message Multicasting
Shoup is especially qualified to describe scalability in the search implementation at eBay, because he was its chief architect. The sheer volume of the data that the site handles is a challenge to both availability and performance. Search architecture relies on both modularity and message multicasting to meet these challenges. The search index is divided into slices and stored as a grid structure composed of columns and rows. Each slice of the index is stored on a column of machines. The rows in each column represent redundant copies of the slice. The implementation must scale up when the search index becomes larger or the load increases. When the search index becomes larger, more columns are added. When load increases, more rows are added. So the grid scales both linearly and horizontally.
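
The grid layout described above can be pictured with a small model. This is only a sketch under stated assumptions, not eBay's implementation: the `SearchGrid` class, the hash-based assignment of items to slices, and the round-robin choice of row are all hypothetical, since the talk summary does not say how items or queries are routed.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass, field


@dataclass
class SearchGrid:
    """Toy model of a search grid: columns hold index slices, rows hold copies."""

    columns: int  # number of index slices
    rows: int     # number of redundant copies of each slice
    _next_row: int = field(default=0, init=False, repr=False)

    def column_for(self, item_id: str) -> int:
        # Assumption: an item is assigned to a slice by a stable hash of its id.
        return zlib.crc32(item_id.encode("utf-8")) % self.columns

    def nodes_for_update(self, item_id: str) -> list[tuple[int, int]]:
        # An item update must reach every redundant copy (row) of its slice.
        col = self.column_for(item_id)
        return [(row, col) for row in range(self.rows)]

    def nodes_for_query(self) -> list[tuple[int, int]]:
        # A query needs one copy of every slice. Assumption: rows are used
        # in round-robin order so that load spreads across the copies.
        row = self._next_row
        self._next_row = (self._next_row + 1) % self.rows
        return [(row, col) for col in range(self.columns)]
```

In this model, adding columns keeps each slice small as the index grows, at the price of a wider fan-out per query, while adding rows leaves the fan-out unchanged and spreads queries over more copies.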
Message multicasting enters the search design through a search feeder and search engines. The search feeder reads item updates from a primary database and publishes them to the grid as sequenced multicast messages. Search engines listen for assigned messages and update an in-memory index in real time. To guarantee reliability, search engines request recovery when they see that a message in a sequence has been missed. Implementing message recovery at the application level rather than at the transport level improved scalability.
Automate Everything
Scalability involves more than the ability to add nodes and functionality to a web site easily. The additional capability is not useful if it increases the management effort required to maintain it. To reduce the load on personnel, eBay automates as much management as possible.
Designing systems that adapt to their environments requires creating feedback loops that learn over time. The practices of catching errors in these loops and writing log files offer additional benefits when troubleshooting.
Another advantage of an automated, adaptive system is that it can often do more than a manually configured system can. It can consider more factors and can make decisions more quickly. The resulting system has lower latency and, with less human intervention, is less costly.
Automation: Adaptive Configuration
One automation design pattern is adaptive configuration. With adaptive configuration, the polling frequency, number of threads, and other performance-tuning parameters are determined not by static configuration files but by load conditions. For a given event consumer, eBay defines a service-level agreement (SLA). For example, the SLA may require that 99 percent of events are processed within 15 seconds. Each event consumer then automatically adjusts to meet the SLA as load, processing time, and the number of event consumers change.
Automation: Machine Learning
Another design pattern is machine learning. For example, the user's search experience automatically adapts by determining the most appropriate sale items and assembling the optimal page for a particular user and context. Feedback loops enable the system to learn and improve over time. The system collects user behavior, then aggregates and analyzes it offline. It then deploys updated metadata and determines and serves an experience customized for the user.
Remember, Everything Fails
To maximize availability, eBay has designed all its systems to be tolerant of failure. The design continues to be governed by the assumption that every operation will fail at some point, and that every resource will at some point be unavailable. When such failures occur, the design goal is to detect them rapidly, recover from them promptly, and continue operation as fully as possible in the meantime.
Failure: Detection, Recovery, and Graceful Degradation
To detect failures, application servers log all application activity, database calls, and service calls on the multicast message bus. The system generates over 2 TB of log messages per day. Failure detection and notification are automated by listeners.
Failure recovery is aided by a sweeping rollback capability. No changes are ever made to the site that cannot be undone, and every feature has an on/off state that can be toggled, whether for operational or business reasons. Features can be deployed in a default off state so that they can be turned on quickly across the enterprise, and turned off again if necessary.
Another design pattern that supports availability is graceful degradation. An application "marks down" a resource if it is unavailable or distressed. If a degraded function is noncritical, it is removed or ignored. If it is critical, the application retries, often initiating failover to an alternate resource. If the retry fails, processing is deferred to a guaranteed asynchronous event. When resources recover, they are explicitly "marked up" and brought online in a controlled manner.
Embrace Inconsistency
Finally, the architects at eBay embrace inconsistency. Like any large, scaled Internet site, eBay has found itself bound by the CAP theorem, first proposed by Eric Brewer in July 2000. Briefly stated, the theorem says that any shared-data system can have at most two of the following three properties: consistency, availability, and tolerance to network partitions.
The trade-off among these three properties is fundamental to all distributed systems. Designers at eBay navigated the theorem by choosing availability and partition-tolerance while enforcing consistency policies that guarantee eventual consistency in appropriate time frames. Typically, eBay trades off immediate consistency for availability and partition-tolerance. The rationale is that most real-world systems -- even financial systems -- do not require immediate consistency.
Consistency requirements are seen as a spectrum. At one end of the spectrum is the requirement for immediate consistency, as demanded by bid and purchasing data. At the other end is a very low consistency requirement, such as that demanded by user preferences. In the center are degrees of "eventual consistency," as reflected in data required for the search engine, billing, and so on.
eBay avoids distributed transactions such as JDBC client transactions. Instead, consistency is maximized through state machines and careful ordering of database operations. Eventual consistency is reached through asynchronous event or reconciliation batch processing.
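
The combination of a state machine, ordered operations, and a reconciliation batch can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `State` values, the `NEXT` transition table, and the in-memory records stand in for whatever entities and databases a real system would use, and nothing below is taken from eBay's code.

```python
from enum import Enum


class State(Enum):
    # Hypothetical lifecycle; each state marks one completed local write.
    CREATED = "created"
    PAYMENT_RECORDED = "payment_recorded"
    ITEM_CLOSED = "item_closed"
    DONE = "done"


# Fixed ordering of steps. Each transition stands for a single write against
# a single database, so no distributed transaction is needed.
NEXT = {
    State.CREATED: State.PAYMENT_RECORDED,
    State.PAYMENT_RECORDED: State.ITEM_CLOSED,
    State.ITEM_CLOSED: State.DONE,
}


def advance(record: dict) -> bool:
    """Apply the next step in order; return False if the record is finished."""
    state = record["state"]
    if state is State.DONE:
        return False
    record["state"] = NEXT[state]
    return True


def reconcile(records: list) -> int:
    """Batch pass that drives every unfinished record to DONE.

    Returns the number of steps applied, which is zero when everything
    was already consistent.
    """
    steps = 0
    for record in records:
        while advance(record):
            steps += 1
    return steps
```

Because each record carries its own state, a step that was interrupted leaves the record in a known intermediate state, and a later reconciliation pass can resume from there rather than rolling back work across databases.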
Copyright 1996 - 2009 Sun Microsystems, Inc.