Google Cluster Architecture
"Web Search for a Planet: The Google Cluster Architecture," IEEE Micro, Mar-Apr., 2003, 22-28.
The central idea of this IEEE Micro article is that Google has designed a search engine which is energy efficient, reliable, and so cost effective that it allows them to provide superior service. The article begins by pointing out that every request to a search engine requires complex computations. When Google gets a request, it reads hundreds of megabytes of data and uses tens of billions of CPU cycles. With thousands of such requests happening every second, Google's infrastructure compares in size to a supercomputer installation! Energy efficiency and price-performance ratio are the most important factors to its design. Easy parallelism is the main priority so that different queries can run on different processors (the overall index is partitioned so that a single query can use multiple processors).
Google's architecture provides reliability by using many commodity PCs to build computing clusters. The design is tailored for "best aggregate request throughput" rather than peak server response time -- reponse times are managed by parallelizing individual requests. Thus, a reliable computing infrastructure is fashioned from clusters of unreliable commodity PCs. At the software level reliability is achieved by replicating services across many different machines and automatically detecting failures.
When a user queries Google, the user's browser first identifies the nearest domain. Multiple clusters are distributed worldwide with sufficient capacity to handle query traffic. The system selects the nearest cluster. This minimizes the time required to respond to the user's query. The user's browser sends a hypertext transport protocol (HTTP) request to that cluster which processes the query. Each cluster has its own load-balancer and distributes requests across the available Google Web Servers (GWS). The GWS coordinates query and formats it into Hypertext Markup Language (HTML).
There are two major phases to query execution. In the first phase, each query word is mapped to a matching list of documents. This forms a hit list. Then the hit list is compared to relevant documents, and a relevance score is computed for each document. The relevance score determines the order of results on the output page. This was accomplished when the index server consulted an inverted index comprised of many terabytes of data. The huge amount of data makes the search process very challenging, but the final result of the first phase is an ordered list of In the second phase of the query execution, this list of identifiers is taken and computed to produce a query-specific document summary. Document servers examine each document for the title and keyword. As in the first phase, to do this, the documents are randomly distributed into smaller shards; multiple server replicas handle each shard; and requests are routed through a load balancer. Google stores dozens of copies of the Web across its cluster to insure adequate replication in all the clusters. When both phases are complete, a GWS sends the HTML to the output page and returns it to the user's browser.
By parallelizing the search over many machines, the wait is reduced to answer a query. Most accesses are read-only. When updates are done (infrequently), queries are diverted. The main thing is that the inherent parallelism of the system is aggressively exploited. Big look-ups of matching documents are transformed into smaller indices and then merged afterwards. The query stream is divided too. Machines are added to each pool to increase the capacity. The total computation is divided across CPUs and disks, and the hardware selection process focuses on machines that offer an excellent request throughput.
Google achieves software reliability by avoiding fault-tolerant hardware features. The main focus is on tolerating failures in software. Replication gives better request throughput and availability. Each internal service is replicated across many machines to overcome the inherent unreliability of machines in general: "Because we already replicate services across multiple machines to obtain sufficient capacity, this type of fault tolerance almost comes for free" (p. 24). Furthermore, CPUs are purchased which give the best performance per unit price, rather than CPUs that give absolutely the best performance overall. The use of commodity PCs reduces the cost of computation. Thus, Google can afford to use more resources per query and search a larger index of documents.
Google has found a way to increase performance without increasing the costs. Compared to other more expensive solutions (such as four-processor motherboards or SCSI disks) they get the best of both worlds. Google's servers resemble mid-range desktop PCs, except for larger disk drives. Realistically, a server does not last more than two or three years because the performance, compared to newer machines, will not be as good. Older machines are slower and it is more difficult to achieve proper load distribution and configuration in clusters that contain both old and new. So equipment costs figure prominently in the budget. Google servers are custom made and cost around $278,000 or $7,700 a month for the three years of their lives. The remaining expenses are mostly for personnel and hosting costs. Compared to other solutions, there are significantly more systems administration and repair costs, but the costs are manageable. All the machines have identical configurations, so maintenance of 1000 servers doesn't cost much more than maintenance of 100 servers. The cost of monitoring, similarly, does not increase with the size of the cluster.
Power consumption and cooling issues are the big challenge. Reduced power servers may not save money in the long run. Their reduced power consumption is desirable, but it has to come without limiting performance. What counts are watts per unit of performance, not watts alone. A low-power server may depreciate more quickly which eats up the savings. A Google 10kW rack costs only $1,500 per month. In order to be cost advantageous, a low-power server must not be more expensive than a regular server would be.
You’re 85% through this paper. Sign up to read the full paper.
Sign Up Now — Instant Access Already a member? Log inAlways verify citation format against your institution’s current style guide requirements.