
PetaBox, also stylized Petabox, is a storage unit from Capricorn Technologies and the Internet Archive. It was designed by the staff of the Internet Archive and C. R. Saikley to store and process one petabyte (a million gigabytes) of information.
Specifications
* Density: 1.4 petabytes/rack
* Power consumption: 3 kW/petabyte
* No air conditioning; excess heat is instead used to help heat the building
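Multiplying the two specification figures gives a rough per-rack power estimate. The following Python sketch is a back-of-the-envelope calculation. It assumes that the density and power figures describe the same fully populated rack. The constant and function names are illustrative and do not come from any Petabox software.

```python
# Figures taken from the specification list above.
DENSITY_PB_PER_RACK = 1.4   # petabytes per rack
POWER_KW_PER_PB = 3.0       # kilowatts per petabyte

def power_per_rack_kw(density_pb: float, power_kw_per_pb: float) -> float:
    """Estimated draw of one full rack: (PB/rack) * (kW/PB) = kW/rack.

    Assumes both figures apply to the same fully populated rack.
    """
    return density_pb * power_kw_per_pb

print(f"{power_per_rack_kw(DENSITY_PB_PER_RACK, POWER_KW_PER_PB):.1f} kW per rack")
```

Under that assumption, a full rack would draw about 4.2 kW. That is below the 6 kW per rack listed among the design goals.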
Design
Design goals of the Petabox included:
* Low power: 6 kW per rack, 60 kW for the entire storage cluster
* High density: 100+ TB/rack
* Local computing to process the data (800 low-end PCs)
* Multi-OS possible, with Linux as the standard
* Colocation friendly
* Shipping container friendly: able to run in a 20' by 8' by 8' shipping container
* Easy maintenance: one system administrator per petabyte
* Software to automate full mirroring
* Easy to scale
* Inexpensive design and storage
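The power and density goals can be cross-checked against the one-petabyte target. This Python sketch makes two assumptions for illustration only: every rack draws the full 6 kW, and every rack holds exactly the 100 TB lower bound. The variable names are hypothetical.

```python
# Design-goal figures from the list above.
RACK_POWER_KW = 6.0        # "6 kW per rack"
CLUSTER_POWER_KW = 60.0    # "60 kW for the entire storage cluster"
RACK_CAPACITY_TB = 100.0   # lower bound of "100+ TB/rack"
TB_PER_PB = 1000           # SI units: 1 PB = 1,000 TB

# Assumption: every rack draws exactly the per-rack power goal.
racks = CLUSTER_POWER_KW / RACK_POWER_KW                # implied number of racks
min_capacity_pb = racks * RACK_CAPACITY_TB / TB_PER_PB  # capacity lower bound

print(f"{racks:.0f} racks, at least {min_capacity_pb:.1f} PB")
```

Under these assumptions, the 60 kW cluster corresponds to ten racks and at least 1 PB of storage. This is consistent with the stated goal of storing one petabyte.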
History
The first 100-terabyte rack became operational in Amsterdam at the Internet Archive's European arm, the Stichting Internet Archive (SIA), in June 2004. A second, 80-terabyte rack became operational at the Archive's main San Francisco location that same year. The Internet Archive then spun off its Petabox production to the newly formed company Capricorn Technologies.
Between 2004 and 2007, Capricorn replicated the Internet Archive's deployment of the Petabox for a range of customers:

* major academic institutions
* digital preservationists
* government agencies
* high-performance computing (HPC) and major research sites
* medical imaging providers
* digital image repositories
* storage outsourcing sites
* other enterprises

Its largest product used 750-gigabyte disks. In 2007, the Internet Archive data center housed approximately three petabytes of Petabox storage technology.
In 2010, the fourth version of the Petabox began operation. Each Petabox provided 480 TB of raw storage and ran on Linux. That capacity came from 240 disks of 2 TB each, arranged as 24 disks per 4U rack unit and 10 units per rack.
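The 2010 raw-capacity figure follows directly from the disk layout. Below is a minimal Python check using the numbers given above. The names are illustrative.

```python
# 2010 (fourth-version) layout, as described above.
DISKS_PER_UNIT = 24   # disks in each 4U rack unit
UNITS_PER_RACK = 10   # 4U units per rack
DISK_TB = 2           # capacity of each disk, in TB
UNIT_HEIGHT_U = 4     # each unit is 4U high

def raw_capacity_tb(disks_per_unit: int, units_per_rack: int, disk_tb: int) -> int:
    """Raw capacity of one rack in TB, before any redundancy."""
    return disks_per_unit * units_per_rack * disk_tb

disks = DISKS_PER_UNIT * UNITS_PER_RACK           # 240 disks
rack_space_u = UNIT_HEIGHT_U * UNITS_PER_RACK     # 40U of rack space
capacity = raw_capacity_tb(DISKS_PER_UNIT, UNITS_PER_RACK, DISK_TB)
print(disks, "disks,", capacity, "TB raw,", rack_space_u, "U")
```

This is raw capacity only. Usable capacity would be lower under any redundancy scheme, and no scheme is stated for this version.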
As of December 2021, the Internet Archive's Petabox storage system consists of four data centers, 745 nodes, and 28,000 spinning disks. The Wayback Machine contains 57 petabytes of information, and the book, music, and video collections contain an additional 42 petabytes. Together they total 99 petabytes of "unique data". Because content is backed up in multiple locations, a total of 212 petabytes of storage is used.
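The 2021 figures also imply an average storage overhead from backups. This Python sketch divides total used storage by unique data. The result is an average only: the source does not say how many copies each item has, or whether the 212 petabytes include other overheads. The names are illustrative.

```python
# Figures reported for December 2021.
WAYBACK_PB = 57        # Wayback Machine
COLLECTIONS_PB = 42    # book, music and video collections
TOTAL_USED_PB = 212    # total storage used, including backup copies
NODES = 745
DISKS = 28_000

unique_pb = WAYBACK_PB + COLLECTIONS_PB      # 99 PB of "unique data"
storage_ratio = TOTAL_USED_PB / unique_pb    # average PB stored per unique PB
disks_per_node = DISKS / NODES               # average disks per node

print(f"unique: {unique_pb} PB")
print(f"stored/unique: {storage_ratio:.2f}")
print(f"disks per node: {disks_per_node:.1f}")
```

The ratio works out to roughly 2.14, or a little over two stored petabytes per unique petabyte on average. The disk count averages about 37.6 disks per node.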
External links
Petabox overview on the Internet Archive