Google data centers are the large data center facilities Google uses to provide its services, which combine large drives, computer nodes organized in aisles of racks, internal and external networking, environmental controls (mainly cooling and humidification control), and operations software (especially as concerns load balancing and fault tolerance).
There is no official data on how many servers are in Google data centers, but Gartner estimated in a July 2016 report that Google at the time had 2.5 million servers. This number is changing as the company expands capacity and refreshes its hardware.
Locations
The locations of Google's various data centers by continent are as follows:
Hardware
Original hardware

The original hardware (circa 1998) that was used by Google when it was located at Stanford University included:
* Sun Microsystems Ultra II with dual 200 MHz processors and 256 MB of RAM. This was the main machine for the original Backrub system.
* 2 × 300 MHz dual-processor Pentium II servers donated by Intel, with 512 MB of RAM and 10 × 9 GB hard drives between the two. The main search ran on these machines.
* An F50 IBM RS/6000 donated by IBM, with 4 processors, 512 MB of memory and 8 × 9 GB hard disk drives.
* Two additional boxes included 3 × 9 GB hard drives and 6 × 4 GB hard disk drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
* SSD disk expansion box with another 8 × 9 GB hard disk drives donated by IBM.
* Homemade disk box which contained 10 × 9 GB SCSI hard disk drives.
Google Cluster
The state of Google infrastructure in 2003 was described in a report by Luiz André Barroso, Jeff Dean, and Urs Hölzle as a "reliable computing infrastructure from clusters of unreliable commodity PCs".
At the time, on average, a single search query read ~100 MB of data and consumed tens of billions of CPU cycles. During peak times, Google served ~1000 queries per second. To handle this peak load, they built a compute cluster with ~15,000 commodity-class PCs instead of expensive supercomputer hardware, to save money. To make up for the lower hardware reliability, they wrote fault-tolerant software.
The cluster consists of five kinds of components. Central Google Web servers (GWS) face the public Internet. Upon receiving a user request, a Google Web server communicates with a spell checker, an advertisement server, many index servers, and many document servers. Each of these four kinds of servers handles part of the request, and the GWS assembles their responses and serves the final response to the user.
The raw documents were ~100 TB, and the index files were ~10 TB. The index files were sharded, and each shard was served by a "pool" of index servers. Similarly, the raw documents were also sharded. Each query to the index returned a list of document IDs, which were then sent to the document servers to retrieve the title and the keyword-in-context snippets.
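The request flow above can be sketched as a two-phase scatter-gather: the GWS queries every index shard, the spell checker and the ad server concurrently, then asks document servers for the titles and snippets of the matching document IDs. This is a minimal in-process sketch, assuming in-memory stand-ins for each service; the functions `index_lookup`, `fetch_snippet`, `spell_check` and `fetch_ads` and the toy data are hypothetical, not Google's actual interfaces, and ranking is omitted (results are simply ordered by document ID).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: two index shards (word -> doc IDs) and a document store.
INDEX_SHARDS = [
    {"data": [1, 4], "center": [4]},
    {"data": [7], "center": [7, 9]},
]
DOCUMENTS = {
    1: ("Doc one", "all about data"),
    4: ("Doc four", "data center design"),
    7: ("Doc seven", "more data center notes"),
    9: ("Doc nine", "the center of town"),
}


def index_lookup(shard, terms):
    """Index-server stand-in: doc IDs in one shard that contain every term."""
    postings = [set(shard.get(t, [])) for t in terms]
    return set.intersection(*postings) if postings else set()


def fetch_snippet(doc_id):
    """Document-server stand-in: return (title, snippet) for one doc ID."""
    return DOCUMENTS[doc_id]


def spell_check(query):
    return None  # a real spelling server might return a suggestion


def fetch_ads(query):
    return []  # a real ad server would return matching advertisements


def handle_query(query):
    """GWS stand-in: fan out, gather doc IDs, then fetch snippets."""
    terms = query.lower().split()
    with ThreadPoolExecutor() as pool:
        # Phase 1: all index shards, the spell checker and the ad server at once.
        shard_futures = [pool.submit(index_lookup, s, terms) for s in INDEX_SHARDS]
        spell_future = pool.submit(spell_check, query)
        ads_future = pool.submit(fetch_ads, query)
        doc_ids = sorted(set().union(*(f.result() for f in shard_futures)))
        # Phase 2: ask document servers for titles and snippets of the hits.
        results = list(pool.map(fetch_snippet, doc_ids))
    return {"results": results, "suggestion": spell_future.result(), "ads": ads_future.result()}


if __name__ == "__main__":
    print(handle_query("data center"))
```

Because each document's postings live in a single shard in this toy layout, every shard can intersect its own lists independently and the GWS only needs the union of the per-shard answers.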
There were several CPU generations in use, ranging from single-processor 533 MHz Intel Celeron-based servers to dual 1.4 GHz Intel Pentium III servers. Each server contained one or more hard drives of 80 GB each. Index servers had less disk space than document servers. Each rack had two Ethernet switches, one per side. The servers on each side were interconnected via 100-Mbps Ethernet, and each switch had a ~250 MB/sec uplink to a central switch that connected to all racks.
The design objectives include:
* Use low-reliability consumer hardware and make up for it with fault-tolerant software.
* Maximize parallelism, such as by splitting a single document match lookup in a large index into a MapReduce over many small indices.
* Partition index data and computation to minimize communication and evenly balance the load across servers, because the cluster is a shared-nothing collection of machines rather than a shared-memory machine.
* Minimize system management overheads by developing all software in-house.
* Pick hardware that maximizes performance/price, not absolute performance.
* Pick hardware that favors high throughput over low latency. Queries are served with massive parallelism, with very few dependent steps and minimal communication between servers, so the latency of an individual server matters less than aggregate throughput.
Due to the massive parallelism, adding hardware scales throughput linearly; for example, doubling the compute cluster doubles the number of queries that can be served per second.
The cluster is made of server racks in two configurations: 40 × 1U servers per side with two sides, or 20 × 2U servers per side with two sides. The power consumption is 10 kW per rack, at a density of 400 W/ft², consuming 10 MWh per month and costing $1,500 per month.
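The rack figures above can be cross-checked with a few lines of arithmetic. This sketch assumes a constant 10 kW draw and a 30-day month; the overhead ratio and electricity price it derives are inferred from the quoted numbers, not stated in the source.

```python
# Figures quoted in the text for one rack.
RACK_POWER_KW = 10.0
POWER_DENSITY_W_PER_FT2 = 400.0
REPORTED_MONTHLY_MWH = 10.0
REPORTED_MONTHLY_COST_USD = 1500.0

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def rack_power_summary():
    """Derive floor area, server-only energy, implied overhead and price."""
    floor_area_ft2 = RACK_POWER_KW * 1000 / POWER_DENSITY_W_PER_FT2
    server_energy_mwh = RACK_POWER_KW * HOURS_PER_MONTH / 1000
    # Inferred: ratio of reported energy to server-only energy (e.g. cooling overhead).
    overhead_ratio = REPORTED_MONTHLY_MWH / server_energy_mwh
    # Inferred: price per kWh that makes ~10 MWh cost $1,500.
    price_per_kwh = REPORTED_MONTHLY_COST_USD / (REPORTED_MONTHLY_MWH * 1000)
    return floor_area_ft2, server_energy_mwh, overhead_ratio, price_per_kwh


if __name__ == "__main__":
    area, energy, overhead, price = rack_power_summary()
    print(f"floor area per rack: {area:.0f} ft^2")        # 25 ft^2
    print(f"server-only energy: {energy:.1f} MWh/month")  # 7.2 MWh/month
    print(f"implied overhead ratio: {overhead:.2f}")      # 1.39
    print(f"implied price: ${price:.2f}/kWh")             # $0.15/kWh
```

Under these assumptions a 10 kW rack running all month draws about 7.2 MWh, so the reported ~10 MWh suggests roughly 40% extra for overhead such as cooling.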
Production hardware
As of 2014, Google has used a heavily customized version of Debian Linux, having migrated incrementally from a Red Hat-based system in 2013.
The customization goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. How this is measured is unclear, but it likely incorporates the running costs of the entire server, in which CPU power consumption could be a significant factor. Servers as of 2009–2010 consisted of custom-made open-top systems containing two processors (each with several cores), a considerable amount of RAM spread over 8 DIMM slots housing double-height DIMMs, and at least two SATA hard disk drives connected through a non-standard ATX-sized power supply unit. The servers were open-top so that more servers could fit into a rack. According to CNET and a book by John Hennessy, each server had a novel 12-volt battery to reduce costs and improve power efficiency.
(Computer Architecture: A Quantitative Approach, Fifth Edition, Chapter 6, Section 6.7 "A Google Warehouse-Scale Computer", page 471: "Designing motherboards that only need a single 12-volt supply so that the UPS function could be supplied by standard batteries associated with each server".)
According to Google, the electrical power of its global data center operation ranges between 500 and 681 megawatts.
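To put the 500 to 681 MW range in energy terms, the following sketch converts a sustained power draw into annual energy. It assumes the draw stays constant for a whole 365-day year, so the results are an illustration rather than reported consumption figures.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h (assumption: non-leap year)


def annual_energy_twh(power_mw: float) -> float:
    """Energy in TWh for a constant draw of `power_mw` megawatts over one year."""
    return power_mw * HOURS_PER_YEAR / 1_000_000  # MWh -> TWh


if __name__ == "__main__":
    for mw in (500, 681):
        print(f"{mw} MW sustained for a year ~ {annual_energy_twh(mw):.2f} TWh")
```

At a constant draw, the quoted range corresponds to roughly 4.4 to 6.0 TWh per year.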
The combined processing power of these servers might have reached 20 to 100 petaflops in 2008.
Network topology
Details of Google's worldwide private networks are not publicly available, but Google publications refer to the "Atlas Top 10" report, which ranks Google as the third largest ISP behind Level 3.
To run such a large network, with direct connections to as many ISPs as possible at the lowest possible cost, Google has a very open peering policy.
Google's public peering records show that its network can be accessed from 67 public exchange points and 69 different locations across the world. As of May 2012, Google had 882 Gbit/s of public connectivity (not counting private peering agreements that Google has with the largest ISPs). This public network is used to distribute content to Google users as well as to crawl the Internet to build its search indexes.
The private side of the network is a secret, but a disclosure from Google indicates that it uses custom-built high-radix switch-routers (with a capacity of 128 × 10 Gigabit Ethernet ports) for the wide area network. Assuming no fewer than two routers per data center (for redundancy), it can be inferred that the Google network scales in the terabit per second range (with two fully loaded routers, the bisection bandwidth amounts to 1,280 Gbit/s).
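The terabit-per-second estimate follows from simple arithmetic on the port counts. In this sketch the per-router figure comes straight from the text; treating the reported 1,280 Gbit/s bisection bandwidth as half of the two routers' combined port capacity is an interpretation, not something the source spells out.

```python
PORTS_PER_ROUTER = 128
PORT_SPEED_GBPS = 10
ROUTERS_PER_DATACENTER = 2  # the minimum assumed in the text, for redundancy


def wan_capacity_gbps(routers=ROUTERS_PER_DATACENTER):
    """Return (per-router, combined, half-of-combined) port capacity in Gbit/s."""
    per_router = PORTS_PER_ROUTER * PORT_SPEED_GBPS
    combined = per_router * routers
    # Interpretation: bisection taken as half of the combined port capacity.
    return per_router, combined, combined // 2


if __name__ == "__main__":
    per_router, combined, bisection = wan_capacity_gbps()
    print(per_router, combined, bisection)  # 1280 2560 1280
```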
These custom switch-routers are connected to DWDM devices to interconnect data centers and points of presence (PoP) via dark fiber.
From a data center view, the network starts at the rack level, where 19-inch racks are custom-made and contain 40 to 80 servers (20 to 40 1U servers on either side, while new servers are 2U rackmount systems) ("Web Search for a Planet: The Google Cluster Architecture", Luiz André Barroso, Jeffrey Dean, Urs Hölzle). Each rack has an Ethernet switch. Servers are connected via a 1 Gbit/s Ethernet link to the top-of-rack (TOR) switch. TOR switches are then connected to a gigabit cluster switch using multiple gigabit or ten-gigabit uplinks. The cluster switches themselves are interconnected and form the data center interconnect fabric (most likely using a dragonfly design rather than a classic butterfly or flattened butterfly layout).
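A common way to reason about a rack-level tree like this is the oversubscription ratio: total server-facing bandwidth at the top-of-rack switch divided by its uplink bandwidth. The text gives the 1 Gbit/s server links and 40 to 80 servers per rack but not the exact number of uplinks, so the uplink counts in this sketch are hypothetical.

```python
def oversubscription_ratio(servers, server_link_gbps, uplinks, uplink_gbps):
    """Server-facing bandwidth divided by uplink bandwidth at a top-of-rack switch.

    A ratio of 1.0 means every server can send at full line rate out of the rack
    at once; a ratio of 10.0 means only a tenth of that traffic fits in the uplinks.
    """
    return (servers * server_link_gbps) / (uplinks * uplink_gbps)


if __name__ == "__main__":
    # Hypothetical 40-server rack with 1 Gbit/s server links.
    print(oversubscription_ratio(40, 1, 4, 1))   # 10.0 with four gigabit uplinks
    print(oversubscription_ratio(40, 1, 4, 10))  # 1.0 with four ten-gigabit uplinks
    # Hypothetical 80-server rack with two ten-gigabit uplinks.
    print(oversubscription_ratio(80, 1, 2, 10))  # 4.0
```

Moving from gigabit to ten-gigabit uplinks, as the text mentions, is what lets a rack approach a 1:1 ratio without adding more uplink ports.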
From an operational standpoint, when a client computer attempts to connect to Google, several DNS servers resolve www.google.com into multiple IP addresses using a round-robin policy. This acts as the first level of load balancing and directs the client to different Google clusters. A Google cluster has thousands of servers, and once the client has connected to a server, additional load balancing sends its queries to the least loaded web server. This makes Google one of the largest and most complex content delivery networks.
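The two levels of load balancing described above can be sketched as a round-robin resolver that picks a cluster, plus a per-cluster dispatcher that picks the least loaded web server. This is a simplification under stated assumptions: real round-robin DNS rotates the order of several returned addresses rather than returning just one, and the addresses (from the 192.0.2.0/24 documentation range) and server names are hypothetical.

```python
from itertools import cycle


class RoundRobinDNS:
    """First level: hand out cluster addresses in rotation."""

    def __init__(self, cluster_ips):
        self._next_ip = cycle(cluster_ips)

    def resolve(self, hostname):
        # The hostname is ignored; every lookup simply gets the next address.
        return next(self._next_ip)


class Cluster:
    """Second level: send each query to the web server with the fewest in-flight queries."""

    def __init__(self, server_names):
        self.in_flight = {name: 0 for name in server_names}

    def dispatch(self):
        server = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[server] += 1
        return server

    def finish(self, server):
        self.in_flight[server] -= 1


if __name__ == "__main__":
    dns = RoundRobinDNS(["192.0.2.10", "192.0.2.20"])
    clusters = {"192.0.2.10": Cluster(["web-a", "web-b"]),
                "192.0.2.20": Cluster(["web-c"])}
    for _ in range(3):
        ip = dns.resolve("www.google.com")
        print(ip, clusters[ip].dispatch())
```

Successive lookups alternate between the two clusters, and within the first cluster the second query goes to `web-b` because `web-a` is still busy with the first.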
Google has numerous data centers scattered around the world. At least 12 significant Google data center installations are located in the United States. The largest known centers are located in The Dalles, Oregon; Atlanta, Georgia; Reston, Virginia; Lenoir, North Carolina; and Moncks Corner, South Carolina.
In Europe, the largest known centers are in Eemshaven and Groningen in the Netherlands and Mons, Belgium.
Google's Oceania data center is located in Sydney, Australia.
Data center network topology
To support fault tolerance, increase the scale of data centers and accommodate low-radix switches, Google has adopted various modified Clos topologies in the past.
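One well-known way to build a large, fault-tolerant fabric from identical low-radix switches is the three-tier k-ary fat tree, a folded Clos topology. The sketch below computes its size; it illustrates the general Clos idea and is not a description of Google's specific modified topologies, which the text does not detail.

```python
def fat_tree_size(k: int) -> dict:
    """Sizes of a three-tier k-ary fat tree (a folded Clos) built from k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even number of ports >= 2")
    half = k // 2
    edge = k * half           # k pods, each with k/2 edge switches
    aggregation = k * half    # k pods, each with k/2 aggregation switches
    core = half * half        # (k/2)^2 core switches
    hosts = k * half * half   # each edge switch serves k/2 hosts -> k^3/4 hosts
    return {
        "hosts": hosts,
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,
        # Hosts in different pods are joined by one shortest path per core switch.
        "inter_pod_paths": core,
    }


if __name__ == "__main__":
    for ports in (4, 48):
        print(ports, fat_tree_size(ports))
```

With 48-port switches, for instance, the fabric reaches 27,648 hosts using 2,880 switches, and any two hosts in different pods are connected by 576 equal-cost paths, so losing one switch removes only a small fraction of the paths.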
Project 02

One of the largest Google data centers is located in the town of The Dalles, Oregon, on the Columbia River, approximately 80 miles (129 km) from Portland. Codenamed "Project 02", the complex was built in 2006 and is approximately the size of two American football fields, with cooling towers four stories high (Markoff, John; Hansell, Saul. "Hiding in Plain Sight, Google Seeks More Power." New York Times, June 14, 2006. Retrieved on October 15, 2008). The site was chosen to take advantage of inexpensive hydroelectric power and to tap into the region's large surplus of fiber-optic cable, a remnant of the dot-com boom. A blueprint of the site appeared in 2008.
Summa papermill
In February 2009, Stora Enso announced that they had sold the Summa paper mill in Hamina, Finland to Google for 40 million euros. Google invested 200 million euros in the site to build a data center and announced an additional 150 million euro investment in 2012. Google chose this location due to the availability and proximity of renewable energy sources.
Floating data centers
In 2013, the press revealed the existence of Google's floating data centers along the coasts of California (Treasure Island's Building 3) and Maine. The development project was kept under tight secrecy. The barges are 250 feet long, 72 feet wide and 16 feet deep. The patent for an in-ocean data center cooling technology was bought by Google in 2009 (along with a wave-powered ship-based data center patent in 2008). Shortly after the press reports, Google declared that the two massive and secretly built infrastructures were merely "interactive learning centers, [...] a space where people can learn about new technology."
Google halted work on the barges in late 2013 and began selling off the barges in 2014.
Software
Most of the software stack that Google uses on its servers was developed in-house. According to a well-known former Google employee in 2006, C++, Java, Python and (more recently) Go are favored over other programming languages. For example, the back end of Gmail is written in Java and the back end of Google Search is written in C++. Google has acknowledged that Python has played an important role from the beginning, and that it continues to do so as the system grows and evolves.
The software that runs the Google infrastructure includes:
* Google Web Server (GWS): a custom Linux-based web server that Google uses for its online services.
* Storage systems:
** Google File System and its successor, Colossus
** Bigtable: structured storage built upon GFS/Colossus
** Spanner: a planet-scale database supporting externally consistent distributed transactions
** Google F1: a distributed, quasi-SQL DBMS based on Spanner, substituting for a custom version of MySQL
* Chubby lock service
* MapReduce and the Sawzall programming language
* Indexing/search systems:
** TeraGoogle: Google's large search index (launched in early 2006)
** Caffeine (Percolator): continuous indexing system (launched in 2010)
** Hummingbird: major search index update, including complex search and voice search
* Borg: declarative process scheduling software
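MapReduce, listed above, expresses a computation as a map function that emits key/value pairs and a reduce function that combines all values for one key. The following is a minimal single-process sketch of that programming model using the classic word-count example; it omits everything that makes the real system useful at scale (distribution across machines, fault tolerance, a partitioned shuffle), and the function names are illustrative.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Run mapper over every input, group values by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):  # map phase
            grouped[key].append(value)     # shuffle: collect values per key
    # Reduce phase, processing keys in sorted order.
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}


def word_count_map(document):
    for word in document.lower().split():
        yield word, 1


def word_count_reduce(word, counts):
    return sum(counts)


if __name__ == "__main__":
    docs = ["the data center", "the search index"]
    print(map_reduce(docs, word_count_map, word_count_reduce))
    # {'center': 1, 'data': 1, 'index': 1, 'search': 1, 'the': 2}
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both phases can be split across many machines without changing the user-written functions.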
Google has developed several abstractions which it uses for storing most of its data:
* Protocol Buffers: "Google's lingua franca for data", a binary serialization format which is widely used within the company.
* SSTable (Sorted Strings Table): a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. It is also used as one of the building blocks of Bigtable.
* RecordIO: a sequence of variable-sized records.
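The SSTable idea (sorted, immutable key/value pairs of arbitrary bytes) and a RecordIO-style sequence of variable-sized records fit together naturally, as in this minimal in-memory sketch. The 4-byte big-endian length prefix and the alternating key/value layout are assumptions for illustration; the source does not describe the actual on-disk formats, and real implementations typically add a block index and other features omitted here.

```python
import bisect
import io
import struct


def write_records(stream, records):
    """Write byte strings as length-prefixed records (a RecordIO-like layout, assumed)."""
    for record in records:
        stream.write(struct.pack(">I", len(record)))  # 4-byte big-endian length
        stream.write(record)


def read_records(stream):
    """Yield the records written by write_records, in order."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack(">I", header)
        yield stream.read(length)


class SSTable:
    """Immutable map from byte-string keys to byte-string values, kept sorted by key."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())  # sort once at build time
        self._keys = [key for key, _ in pairs]
        self._values = [value for _, value in pairs]

    def get(self, key):
        """Binary-search the sorted keys; return None if the key is absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def serialize(self):
        """Encode as alternating key and value records in key order."""
        buffer = io.BytesIO()
        write_records(buffer, [part for pair in zip(self._keys, self._values) for part in pair])
        return buffer.getvalue()

    @classmethod
    def deserialize(cls, data):
        records = list(read_records(io.BytesIO(data)))
        return cls(zip(records[0::2], records[1::2]))


if __name__ == "__main__":
    table = SSTable([(b"row2", b"beta"), (b"row1", b"alpha")])
    restored = SSTable.deserialize(table.serialize())
    print(restored.get(b"row1"), restored.get(b"row3"))  # b'alpha' None
```

Keeping the keys sorted is what makes a lookup a binary search rather than a scan, and immutability means a table can be shared by many readers without locking.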
Software development practices
Most operations are read-only. When an update is required, queries are redirected to other servers, to simplify consistency issues. Queries are divided into sub-queries, which may be sent to different servers in parallel, thus reducing latency.
To lessen the effects of unavoidable hardware failure, software is designed to be fault tolerant. Thus, when a system goes down, data is still available on other servers, which increases reliability.
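The fault-tolerance idea above, keeping data on several servers so that a failed machine does not make it unavailable, can be sketched as a client that tries each replica of a shard in turn. This is a minimal sequential sketch; the `ServerDown` exception and the replica functions are hypothetical stand-ins for real RPC stubs, and a production client would typically also use timeouts and spread load across healthy replicas.

```python
class ServerDown(Exception):
    """Raised by a (hypothetical) replica that cannot serve the request."""


def query_with_failover(replicas, request):
    """Try each replica in order and return the first successful answer."""
    failures = []
    for replica in replicas:
        try:
            return replica(request)
        except ServerDown as exc:
            failures.append(exc)  # remember the failure, fall through to the next copy
    raise ServerDown(f"all {len(replicas)} replicas failed: {failures}")


def dead_replica(request):
    raise ServerDown("machine offline")


def healthy_replica(request):
    return f"result for {request!r}"


if __name__ == "__main__":
    print(query_with_failover([dead_replica, healthy_replica], "data center"))
```

The request only fails when every copy of the data is unavailable, which is why adding replicas raises reliability even when each individual machine is unreliable.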
Search infrastructure
Index
Like most search engines, Google indexes documents by building a data structure known as an inverted index. Such an index maps each query word to the list of documents that contain it. The index is very large due to the number of documents stored in the servers.
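A minimal sketch of an inverted index: each word maps to the sorted list of documents that contain it, and a multi-word query is answered by intersecting those lists. Tokenization here is naive whitespace splitting, an assumption made for brevity; a real search engine also stores positions, handles stemming and ranks the results.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return IDs of documents that contain every word of the query."""
    postings = [set(index.get(word, ())) for word in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = {1: "Google data centers", 2: "data center cooling", 3: "Google search"}
    index = build_inverted_index(docs)
    print(index["data"])                 # [1, 2]
    print(search(index, "google data"))  # [1]
```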
The index is partitioned by document IDs into many pieces called shards. Each shard is replicated onto multiple servers. Initially, the index was served from hard disk drives, as is done in traditional information retrieval (IR) systems. Google dealt with the increasing query volume by increasing the number of replicas of each shard and thus the number of servers. Soon they found that they had enough servers to keep a copy of the whole index in main memory (although with low replication or no replication at all), and in early 2001 Google switched to an in-memory index system. This switch "radically changed many design parameters" of their search system, and allowed for a significant increase in throughput and a large decrease in query latency.
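Partitioning by document ID can be sketched as follows: each document's postings live in exactly one shard, chosen here by `doc_id % NUM_SHARDS`, and a query is answered by intersecting posting lists within every shard and merging the per-shard hits. The modulo placement, the shard and replica counts, and the `idx-<shard>-<replica>` server names are hypothetical choices for illustration.

```python
from collections import defaultdict

NUM_SHARDS = 4          # hypothetical
REPLICAS_PER_SHARD = 2  # hypothetical


def shard_for(doc_id):
    """Place each document in exactly one shard."""
    return doc_id % NUM_SHARDS


def replica_servers(shard_id):
    """Hypothetical names of the servers holding copies of one shard."""
    return [f"idx-{shard_id}-{r}" for r in range(REPLICAS_PER_SHARD)]


def build_shards(docs):
    """Build one in-memory inverted index per shard."""
    shards = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        for word in text.lower().split():
            shards[shard_for(doc_id)][word].add(doc_id)
    return shards


def search_sharded(shards, query):
    """Intersect postings inside each shard, then merge the per-shard hits."""
    words = query.lower().split()
    hits = set()
    for shard in shards:  # in production, shards would be queried in parallel
        if words:
            hits |= set.intersection(*(shard.get(w, set()) for w in words))
    return sorted(hits)


if __name__ == "__main__":
    docs = {1: "google data", 2: "data center", 5: "google data center"}
    shards = build_shards(docs)
    print(search_sharded(shards, "google data"))  # [1, 5]
    print(replica_servers(shard_for(5)))          # ['idx-1-0', 'idx-1-1']
```

Since all of a document's words land in the same shard, each shard can answer a multi-word query on its own; adding replicas per shard raises query capacity, while adding shards lets the whole index fit in memory across more machines.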
In June 2010, Google rolled out a next-generation indexing and serving system called "Caffeine", which can continuously crawl and update the search index. Previously, Google updated its search index in batches using a series of MapReduce jobs. The index was separated into several layers, some of which were updated faster than others, and the main layer could go as long as two weeks without an update. With Caffeine, the entire index is updated incrementally on a continuous basis. Google later revealed a distributed data processing system called "Percolator", which is said to be the basis of the Caffeine indexing system (The Register, "Google Caffeine jolts worldwide search machine"; The Register, "Google Percolator – global search jolt sans MapReduce comedown").
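The difference between batch rebuilding and Caffeine-style continuous updating can be illustrated with an inverted index that absorbs one changed document at a time instead of being regenerated by a full batch job. This is only a sketch of the incremental idea under simplifying assumptions (a single process, whitespace tokenization); it does not model Percolator itself, which the cited reports describe as a distributed processing system.

```python
from collections import defaultdict


class IncrementalIndex:
    """Inverted index updated one document at a time, with no full rebuild."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> IDs of documents containing it
        self.doc_words = {}               # doc ID -> words currently indexed for it

    def upsert(self, doc_id, text):
        """Add a new document or re-index a changed one, touching only its words."""
        new_words = set(text.lower().split())
        old_words = self.doc_words.get(doc_id, set())
        for word in old_words - new_words:
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]  # drop empty posting lists
        for word in new_words - old_words:
            self.postings[word].add(doc_id)
        self.doc_words[doc_id] = new_words

    def lookup(self, word):
        return sorted(self.postings.get(word, ()))


if __name__ == "__main__":
    index = IncrementalIndex()
    index.upsert(1, "old news")
    index.upsert(2, "news today")
    index.upsert(1, "fresh story")  # page 1 changed: only its own entries move
    print(index.lookup("news"), index.lookup("fresh"))  # [2] [1]
```

The cost of an update is proportional to the size of the changed document rather than the whole corpus, which is what allows newly crawled pages to become searchable without waiting for the next batch run.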
Server types
Google's server infrastructure is divided into several types, each assigned to a different purpose:
* Web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
* Data-gathering servers are permanently dedicated to spidering the Web. Google's web crawler is known as GoogleBot. They update the index and document databases and apply Google's algorithms to assign ranks to pages.
* Each index server contains a set of index shards and returns a list of document IDs ("docids") of documents that contain the query word. These servers need less disk space, but bear the greatest CPU workload.
* Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
* Ad servers manage advertisements offered by services like AdWords and AdSense.
* Spelling servers make suggestions about the spelling of queries.
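There are also "canary requests", whereby a request is first sent to one or two leaf servers to check that the response time is reasonable. If it is not, the request fails instead of being sent to all servers. This protects the serving system from requests that could otherwise slow down or crash many servers at once.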
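A canary step like the one described can be sketched as follows: the request goes to a couple of leaf servers first, and is sent to the remaining leaves only if those canaries answer within a latency budget. The two-canary default and the 50 ms budget are hypothetical values, calls are made sequentially for simplicity, and treating a canary that raises an error the same as a slow one is an added assumption.

```python
import time


class CanaryRejected(Exception):
    """Raised when a canary leaf is too slow or fails, so the request is not fanned out."""


def fan_out_with_canary(leaves, request, canary_count=2, max_latency_s=0.05):
    """Send `request` to a few canary leaves first; contact the rest only if they pass."""
    canaries, remaining = leaves[:canary_count], leaves[canary_count:]
    results = []
    for leaf in canaries:
        start = time.monotonic()
        try:
            results.append(leaf(request))
        except Exception as exc:  # assumption: a crashing canary also rejects the request
            raise CanaryRejected(f"canary raised {exc!r}") from exc
        if time.monotonic() - start > max_latency_s:
            raise CanaryRejected("canary response time exceeded the budget")
    # Canaries looked healthy: send the request to every remaining leaf.
    results.extend(leaf(request) for leaf in remaining)
    return results


if __name__ == "__main__":
    def fast_leaf(request):
        return request.upper()

    def slow_leaf(request):
        time.sleep(0.1)
        return request

    print(fan_out_with_canary([fast_leaf] * 4, "q"))
    try:
        fan_out_with_canary([slow_leaf, fast_leaf, fast_leaf], "q")
    except CanaryRejected as err:
        print("rejected:", err)
```

A bad request therefore costs at most a couple of leaf servers' time instead of reaching the whole fleet.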
Security
In October 2013, The Washington Post reported that the U.S. National Security Agency intercepted communications between Google's data centers as part of a program named MUSCULAR. This wiretapping was possible because, at the time, Google did not encrypt data passed inside its own network. This was rectified when Google began encrypting data sent between data centers in 2013.
Environmental impact
Google's most efficient data center uses only fresh-air cooling, requiring no electrically powered air conditioning.
In December 2016, Google announced that—starting in 2017—it would purchase enough renewable energy to match 100% of the energy usage of its data centers and offices. The commitment will make Google "the world's largest corporate buyer of renewable power, with commitments reaching 2.6 gigawatts (2,600 megawatts) of wind and solar energy".
References
Further reading
* Shankland, Stephen. "Google uncloaks once-secret server." CNET News, April 1, 2009.
External links
Web Search for a Planet: The Google Cluster Architecture (Luiz André Barroso, Jeffrey Dean, Urs Hölzle)