Google Cluster Architecture: "Web Search for a Planet"

"Web Search for a Planet: The Google Cluster Architecture," IEEE Micro, Mar.-Apr. 2003, pp. 22-28.

The central idea of this IEEE Micro article is that Google has designed a search engine that is energy efficient, reliable, and cost effective enough to let the company provide superior service. The article begins by pointing out that every request to a search engine requires complex computation. On average, a single Google query reads hundreds of megabytes of data and consumes tens of billions of CPU cycles. With thousands of such queries arriving every second, Google's infrastructure is comparable in size to the largest supercomputer installations. Energy efficiency and price-performance ratio are the most important factors in its design. Easy parallelism is the main priority: different queries can run on different processors, and because the overall index is partitioned, a single query can also use multiple processors.
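The first kind of parallelism mentioned above, running different queries on different processors, needs no coordination because each query only reads shared data. The Python sketch below illustrates this under stated assumptions. The toy index and the names `TOY_INDEX`, `answer_query`, and `serve` are hypothetical. Threads stand in for separate processors, although in CPython truly CPU-bound work would need a process pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy index: word -> set of document ids (read-only, shared by all workers).
TOY_INDEX = {
    "cluster": {1, 2, 5},
    "google": {1, 3, 5},
    "search": {2, 3, 5},
}


def answer_query(query: str) -> list[int]:
    """Answer one query on its own: intersect the document sets of its words."""
    postings = [TOY_INDEX.get(word, set()) for word in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def serve(queries: list[str], workers: int = 4) -> list[list[int]]:
    """Inter-query parallelism: each query runs on whichever worker is free.

    The index is never modified, so the workers need no locks.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(answer_query, queries))


if __name__ == "__main__":
    print(serve(["google search", "cluster google", "search"]))  # [[3, 5], [1, 5], [2, 3, 5]]
```

Adding workers raises aggregate throughput but does not make any single query faster.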

Google builds its computing clusters from large numbers of commodity PCs. The design is tailored for "best aggregate request throughput" rather than peak server response time; response times are managed by parallelizing individual requests. Thus, a reliable computing infrastructure is fashioned from clusters of unreliable commodity PCs. Reliability is achieved at the software level by replicating services across many different machines and by automatically detecting and handling failures.
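Replication with automatic failure handling can be sketched as a client that sends each request to a randomly chosen live replica and stops using any replica that fails. This is a minimal illustration and not Google's implementation. `Replica`, `ReplicaDown`, and `ReplicatedService` are hypothetical names. The exception stands in for the timeouts or health checks that a real system would use to detect a dead machine.

```python
import random


class ReplicaDown(Exception):
    """Raised when a (hypothetical) replica has failed."""


class Replica:
    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ReplicaDown(self.name)
        return f"{self.name} answered {request}"


class ReplicatedService:
    """Send each request to a random live replica; drop replicas that fail."""

    def __init__(self, replicas: list[Replica], seed: int = 0) -> None:
        self.live = list(replicas)
        self.rng = random.Random(seed)

    def call(self, request: str) -> str:
        while self.live:
            replica = self.rng.choice(self.live)
            try:
                return replica.handle(request)
            except ReplicaDown:
                # Failure detected: stop routing traffic to this machine.
                self.live.remove(replica)
        raise RuntimeError("no live replicas left")


if __name__ == "__main__":
    service = ReplicatedService(
        [Replica("pc-a"), Replica("pc-b", healthy=False), Replica("pc-c")]
    )
    for query in ["q1", "q2", "q3"]:
        print(service.call(query))
```

A production system would also return repaired machines to the pool. This sketch drops them permanently for simplicity.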

When a user queries Google, the user's browser first performs a Domain Name System (DNS) lookup to map www.google.com to a particular IP address. To provide sufficient capacity to handle query traffic, the service consists of multiple clusters distributed worldwide. A DNS-based load-balancing system selects the cluster nearest to the user while also considering the available capacity at each cluster. This minimizes the round-trip time for the user's query. The user's browser then sends a Hypertext Transfer Protocol (HTTP) request to that cluster, which processes the query. Each cluster has its...
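The cluster-selection step can be modeled as choosing the lowest-latency cluster that still has spare capacity. In this sketch, the `Cluster` fields, the `pick_cluster` function, and the fallback rule are assumptions made for illustration. In the real system the decision happens inside DNS-based load balancing, not in a single function call.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str       # hypothetical cluster label
    rtt_ms: float   # round-trip time from the user to this cluster
    spare_qps: int  # extra queries per second the cluster can still absorb


def pick_cluster(clusters: list[Cluster], min_spare_qps: int = 1) -> Cluster:
    """Return the nearest (lowest-RTT) cluster that still has spare capacity.

    If every cluster is saturated, fall back to the one with the most spare capacity.
    """
    if not clusters:
        raise ValueError("no clusters available")
    candidates = [c for c in clusters if c.spare_qps >= min_spare_qps]
    if candidates:
        return min(candidates, key=lambda c: c.rtt_ms)
    return max(clusters, key=lambda c: c.spare_qps)


if __name__ == "__main__":
    clusters = [
        Cluster("us-east", rtt_ms=20.0, spare_qps=0),
        Cluster("us-west", rtt_ms=70.0, spare_qps=500),
        Cluster("europe", rtt_ms=110.0, spare_qps=2000),
    ]
    print(pick_cluster(clusters).name)  # us-west
```

In this example the closest cluster, `us-east`, is skipped because it has no spare capacity. The query therefore goes to the next-nearest cluster, `us-west`.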

The Google Web Server (GWS) coordinates query execution and formats the results into Hypertext Markup Language (HTML).

Query execution has two major phases. In the first phase, index servers map each query word to a matching list of documents, called a hit list. The hit lists of the individual query words are then intersected to determine the set of relevant documents, and a relevance score is computed for each document. The relevance score determines the order of results on the output page. The index servers do this by consulting an inverted index that holds many terabytes of data. The sheer amount of data makes the search process very challenging. The final result of the first phase is an ordered list of document identifiers (docids).

In the second phase, this ordered list of docids is used to compute each document's title, its URL, and a query-specific document summary. Document servers handle this job. They fetch each document to extract its title and the keywords in context. As in the first phase, the documents are randomly distributed into smaller shards, multiple server replicas handle each shard, and requests are routed through a load balancer. Because this replication is needed for both performance and availability, Google stores dozens of copies of the Web across its clusters. When both phases are complete, the GWS generates the HTML output page and returns it to the user's browser.
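The two phases can be sketched end to end on a toy corpus. This is an illustration under stated assumptions, not Google's code:

- The corpus and the names `build_inverted_index`, `phase_one`, `phase_two`, `NUM_SHARDS`, and `REPLICAS_PER_SHARD` are hypothetical.
- The relevance score is a simple count of query-word occurrences, swapped in for Google's actual ranking.
- Docid modulo the shard count stands in for random placement of documents.
- A random replica choice stands in for the load balancer.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus: docid -> (title, body text).
DOCS = {
    1: ("Cluster notes", "google runs search on a cluster of commodity pcs"),
    2: ("Index basics", "an inverted index maps each word to documents"),
    3: ("Planet scale", "web search for a planet needs a large cluster and search replicas"),
}

NUM_SHARDS = 2          # hypothetical number of docserver shards
REPLICAS_PER_SHARD = 3  # hypothetical replicas serving each shard


def build_inverted_index(docs):
    """Map each word to {docid: occurrence count}."""
    index = defaultdict(dict)
    for docid, (_, body) in docs.items():
        for word in body.split():
            index[word][docid] = index[word].get(docid, 0) + 1
    return index


def phase_one(query, index):
    """Index servers: intersect the words' hit lists, score, return ordered docids."""
    words = query.lower().split()
    hit_lists = [set(index.get(word, {})) for word in words]
    if not hit_lists:
        return []
    matches = set.intersection(*hit_lists)
    # Illustrative score: total occurrences of the query words in the document.
    score = {d: sum(index[word][d] for word in words) for d in matches}
    return sorted(matches, key=lambda d: (-score[d], d))


def phase_two(docids, query, docs, rng):
    """Docservers: build a title and keyword-in-context snippet per docid."""
    if not docids:
        return []
    keyword = query.lower().split()[0]
    results = []
    for docid in docids:
        shard = docid % NUM_SHARDS                   # stand-in for random placement
        replica = rng.randrange(REPLICAS_PER_SHARD)  # stand-in for the load balancer
        title, body = docs[docid]
        words = body.split()
        pos = words.index(keyword)
        snippet = " ".join(words[max(0, pos - 2):pos + 3])
        results.append((title, f"...{snippet}...", (shard, replica)))
    return results


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    ordered = phase_one("search cluster", index)
    for row in phase_two(ordered, "search cluster", DOCS, random.Random(0)):
        print(row)
```

For the query "search cluster", both hit lists are {1, 3}. Document 3 scores higher because "search" appears in it twice, so `phase_one` returns `[3, 1]`. `phase_two` then produces the snippets `...web search for a...` and `...google runs search on a...`.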

Parallelizing the search over many machines reduces the average latency of answering a query. Most accesses are read-only. Updates are infrequent, and when they happen, queries are diverted away from the affected replicas. The key point is that the system aggressively exploits the parallelism inherent in the application. A lookup for matching documents in one large index is transformed into many lookups in a set of smaller indices, followed by a relatively inexpensive merging step. The query stream is divided as well. Capacity is increased by adding machines to each pool. The total computation is divided across CPUs and disks, and the hardware…
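Turning one large lookup into many smaller lookups followed by a merge is a scatter-gather pattern. The sketch below assumes that each index shard holds a disjoint slice of the documents and that scores are already computed. `SHARDS`, `search_shard`, and `search_all` are hypothetical names, and threads stand in for separate index servers. Each shard only needs to return its own top k, because every document in the global top k must also be in the top k of its own shard.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical precomputed scores for one query; each dict is one index shard
# holding a disjoint slice of the documents (docid -> relevance score).
SHARDS = [
    {1: 0.90, 4: 0.20},  # shard 0
    {2: 0.50, 5: 0.70},  # shard 1
    {3: 0.95, 6: 0.10},  # shard 2
]


def search_shard(shard: dict[int, float], k: int) -> list[tuple[float, int]]:
    """Lookup in one small index: its top k as (negated score, docid), best first."""
    return sorted((-score, docid) for docid, score in shard.items())[:k]


def search_all(shards: list[dict[int, float]], k: int) -> list[int]:
    """Scatter the lookup to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, k), shards))
    # Each partial list is already sorted, so heapq.merge streams the global
    # order lazily and islice stops after the first k entries.
    merged = heapq.merge(*partials)
    return [docid for _, docid in islice(merged, k)]


if __name__ == "__main__":
    print(search_all(SHARDS, 3))  # [3, 1, 5]
```

The merge is cheap because it only compares the heads of already-sorted lists. It never re-sorts the whole collection.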

Bibliography

Barroso, L. A., Dean, J., & Hölzle, U. (2003). "Web Search for a Planet: The Google Cluster Architecture." IEEE Micro, 23(2), Mar.-Apr., 22-28.