Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks." It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of books or individual stories in the public domain. All files can be accessed for free in open formats that can be read on almost any computer. Project Gutenberg has reached over 75,999 items in its collection of free eBooks.
The releases are available in plain text as well as other formats, such as HTML, PDF, EPUB, MOBI, and Plucker wherever possible. Most releases are in the English language, but many non-English works are also available. There are multiple affiliated projects that provide additional content, including region- and language-specific works. Project Gutenberg is closely affiliated with Distributed Proofreaders, an Internet-based community for proofreading scanned texts.
Project Gutenberg is named after the inventor Johannes Gutenberg, whose work in developing printing technology led to an increase in the mass availability of books and other text.
History
Michael S. Hart began Project Gutenberg in 1971 with the digitization of the United States Declaration of Independence. Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. Through friendly operators, he received an account with a virtually unlimited amount of computer time; its value at that time has since been variously estimated at $100,000 or $100,000,000. Hart explained he wanted to "give back" this gift by doing something that could be considered to be of great value. His initial goal was to make the 10,000 most consulted books available to the public at little or no charge by the end of the 20th century.
This particular computer was one of the 15 nodes on ARPANET, the computer network that would become the Internet. Hart believed that the general public would one day be able to access computers and decided to make works of literature available in electronic form for free. He used a copy of the United States Declaration of Independence that he had in his backpack, and this became the first Project Gutenberg e-text. He named the project for Johannes Gutenberg, the fifteenth-century German printer who propelled the movable type printing press revolution.
By the mid-1990s, Hart was running Project Gutenberg from Illinois Benedictine College. More volunteers had joined the effort. He manually entered all of the text until 1989, when image scanners and optical character recognition software improved and became more available, making book scanning more feasible. Hart later came to an arrangement with Carnegie Mellon University, which agreed to administer Project Gutenberg's finances. As the volume of e-texts increased, volunteers began to take over the day-to-day operations that Hart had run.
Italian volunteer Pietro Di Miceli developed and administered the first Project Gutenberg website and started the development of the project's online catalog. During his ten years in this role (1994–2004), the project's web pages won a number of awards and were often featured in "best of the Web" listings, contributing to the project's popularity.
Starting in 2004, an improved online catalog made Project Gutenberg content easier to browse, access and hyperlink. Project Gutenberg is now hosted by ibiblio at the University of North Carolina at Chapel Hill.
Hart died on 6 September 2011 at his home in Urbana, Illinois, at the age of 64.
CD and DVD project
In August 2003, Project Gutenberg created a CD containing approximately 600 of the "best" e-books from the collection. The CD was available for download as an ISO image. Users who were unable to download the CD could request to have a copy sent to them, free of charge.
In December 2003, a DVD was created containing nearly 10,000 items. At the time, this represented almost the entire collection. In early 2004, the DVD also became available by mail.
In July 2007, a new edition of the DVD was released containing over 17,000 books, and in April 2010, a dual-layer DVD was released, containing nearly 30,000 items.
The majority of the DVDs, and all of the CDs mailed by the project, were recorded on recordable media by volunteers. However, the new dual-layer DVDs were manufactured, as this proved more economical than having volunteers burn them. The project has mailed approximately 40,000 discs. As of 2017, the delivery of free CDs had been discontinued, though the ISO image was still available for download.
Scope of collection

Project Gutenberg claims over 75,999 items in its collection, with an average of over 30 new e-books being added each week. These are primarily works of literature from the Western cultural tradition. In addition to literature such as novels, poetry, short stories and drama, Project Gutenberg also has cookbooks, reference works and issues of periodicals. The Project Gutenberg collection also has a few non-text items such as audio files, movies, and music-notation files. Most releases are in English, but there are also significant numbers in many other languages.
Whenever possible, Gutenberg releases are available in plain text, mainly using US-ASCII character encoding but frequently extended to ISO-8859-1 (needed to represent accented characters in French and the Scharfes s (ß) in German, for example). Besides freedom from copyright, a Latin character set text version of each release had been a requirement of Michael Hart's since the founding of Project Gutenberg, as he believed it was the format most likely to be readable in the extended future. Out of necessity, this criterion has had to be extended further for the sizable collection of texts in East Asian languages such as Chinese and Japanese now in the collection, where UTF-8 is used instead.
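The fallback from US-ASCII to ISO-8859-1 to UTF-8 described above can be illustrated with a short Python sketch. The function name `narrowest_encoding` and the sample strings are hypothetical and chosen only for illustration; this is not Project Gutenberg's own tooling, just one way to check which of the three encodings is the most restrictive one that can still represent a given text.

```python
def narrowest_encoding(text: str) -> str:
    """Return the first of ASCII, ISO-8859-1, UTF-8 that can encode text.

    The order mirrors the fallback described in the article; the function
    itself is an illustrative assumption, not an official tool.
    """
    for name in ("ascii", "iso-8859-1", "utf-8"):
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # This encoding lacks at least one character; try the next one.
            continue
        return name
    raise ValueError("text cannot be encoded even as UTF-8")


print(narrowest_encoding("Declaration of Independence"))  # ascii
print(narrowest_encoding("Straße, café"))                 # iso-8859-1
print(narrowest_encoding("青空文庫"))                      # utf-8
```

Plain English text fits in ASCII, the German ß and French é need ISO-8859-1, and the Japanese sample can only be represented by UTF-8 among the three.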
Other formats may be released as well when submitted by volunteers. The most common non-ASCII format is HTML, which allows markup and illustrations to be included. Some project members and users have requested more advanced formats, believing them to be easier to read. But some formats that are not easily editable, such as PDF, are generally not considered to fit with the goals of Project Gutenberg. Project Gutenberg also has two options for master formats that can be submitted (from which all other files are generated): customized versions of the Text Encoding Initiative standard (since 2005) and reStructuredText (since 2011).
In 2009, the Project Gutenberg catalog began offering auto-generated alternate file formats, including HTML (when not already provided), EPUB and Plucker.
Ideals
Michael Hart said in 2004, "The mission of Project Gutenberg is simple: 'To encourage the creation and distribution of ebooks.'"
His goal was "to provide as many e-books in as many formats as possible for the entire world to read in as many languages as possible".
Likewise, a project slogan is to "break down the bars of ignorance and illiteracy", because its volunteers aim to continue spreading public literacy and appreciation for the literary heritage just as public libraries began to do in the late 19th century.
Project Gutenberg is intentionally decentralized; there is no selection policy dictating what texts to add. Instead, individual volunteers work on what they are interested in or have available. The Project Gutenberg collection is intended to preserve items for the long term, so they cannot be lost by any one localized accident. In an effort to ensure this, the entire collection is backed up regularly and mirrored on servers in many different locations.
Copyright
Project Gutenberg is careful to verify the status of its eBooks according to United States copyright law. Material is added to the Project Gutenberg archive only after it has received a copyright clearance, and records of these clearances are saved for future reference. Project Gutenberg does not claim new copyright on titles it publishes. Instead, it encourages their free reproduction and distribution.
Most books in the Project Gutenberg collection are distributed as public domain under United States copyright law. There are also a few copyrighted texts, such as those of science fiction author Cory Doctorow, that Project Gutenberg distributes with permission. These are subject to further restrictions as specified by the copyright holder, although they generally tend to be licensed under Creative Commons.
"Project Gutenberg" is a trademark
of the organization, and the mark cannot be used in commercial or modified redistributions of public domain texts from the project. There is no legal impediment to the reselling of works in the public domain if all references to Project Gutenberg are removed, but Gutenberg contributors have questioned the appropriateness of directly and commercially reusing content that has been formatted by volunteers. There have been instances of books being stripped of attribution to the project and sold for profit in the Kindle Store and by other booksellers, one being the 1906 book ''Fox Trapping''.
From 2018 to 2021, the Project Gutenberg website was not accessible within Germany, as a result of a court order obtained by S. Fischer Verlag regarding the works of Heinrich Mann, Thomas Mann and Alfred Döblin. Although the works were in the public domain in the United States, the German court (Frankfurt am Main Regional Court) recognized the infringement of copyrights still active in Germany, and asserted that the Project Gutenberg website was under German jurisdiction because it hosts content in the German language and is accessible in Germany. This judgment was confirmed by the Frankfurt Court of Appeal on 30 April 2019 (11 U 27/18). The Frankfurt Court of Appeal did not give permission for a further appeal to the Federal Court of Justice (Bundesgerichtshof); however, an application for permission to appeal was filed with the Federal Court of Justice. As of 4 October 2020 that application was still pending (Federal Court of Justice I ZR 97/19). According to the Project Gutenberg Literary Archive Foundation, "In October 2021, the parties reached a settlement agreement. Under the terms of the agreement, Project Gutenberg eBooks by the three authors will be blocked from Germany until their German copyright expires. Under the terms of the settlement, the all-Germany block is no longer in place. Other terms of the settlement are confidential."
The Project Gutenberg website has been blocked in Italy since May 2020, as part of a larger effort to block websites that publish newspapers and journals that are protected by copyright in Italy.
Criticism
The text files are plain text encoded in UTF-8 and are typically wrapped at 65–70 characters, with paragraphs separated by a double line break. In recent decades, the resulting bland appearance and the lack of markup have often been perceived as drawbacks of this format. Project Gutenberg attempts to address this by making many texts available in HTML, ePub, and PDF versions as well. HTML versions of older texts are autogenerated. Another not-for-profit project, Standard Ebooks, aims to address these issues with its collection of public domain titles that are formatted and styled, correcting issues related to design and typography.
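As a rough illustration of the layout described above, the following Python sketch turns hard-wrapped text back into one string per paragraph by splitting on blank lines and rejoining the wrapped lines. The function name `unwrap_paragraphs` and the sample text are assumptions made for this example; real files may contain additional material, such as header text, that this sketch does not handle.

```python
import re


def unwrap_paragraphs(text: str) -> list[str]:
    """Split hard-wrapped plain text into paragraphs, one string each."""
    # Normalize Windows line endings so blank lines are detected reliably.
    text = text.replace("\r\n", "\n")
    # A blank line (possibly containing spaces or tabs) ends a paragraph.
    blocks = re.split(r"\n[ \t]*\n", text.strip())
    # Collapse each block's internal line breaks and runs of whitespace.
    return [" ".join(block.split()) for block in blocks if block.strip()]


sample = (
    "It is a truth universally acknowledged, that a single man in\n"
    "possession of a good fortune, must be in want of a wife.\n"
    "\n"
    "However little known the feelings or views of such a man may be\n"
    "on his first entering a neighbourhood, this truth is so well fixed."
)

for paragraph in unwrap_paragraphs(sample):
    print(paragraph)
```

Once the text is held as whole paragraphs, a reader application can re-wrap it to any display width instead of the fixed 65–70 columns.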
In December 1994, Project Gutenberg was criticized by the Text Encoding Initiative for failing to include documentation or discussion of the decisions unavoidable in preparing a text, or in some cases, not documenting which of several (conflicting) versions of a text has been the one digitized.
The selection of works (and editions) available has been determined by popularity, ease of scanning, being out of copyright, and other factors; this would be difficult to avoid in any crowd-sourced project.
In March 2004, an initiative was begun by Michael Hart and John S. Guagliardo to provide low-cost intellectual properties. The initial name for this project was ''Project Gutenberg 2'' (PG II), which created controversy among PG volunteers because of the re-use of the project's trademarked name for a commercial venture.
Project Gutenberg Literary Archive Foundation
In 2000, a non-profit corporation, the Project Gutenberg Literary Archive Foundation, Inc. (a 501(c)(3) organization, EIN: 64-6221541), was chartered in Mississippi, United States, to handle the project's legal needs. Donations to it are tax-deductible.
Gregory B. Newby, a long-time Project Gutenberg volunteer who was then an assistant professor at the UNC School of Information and Library Science, became the foundation's first CEO in 2001. He later served as director of the Arctic Region Supercomputing Center and then as Chief Technology Officer of Compute Canada.
Partners
* Project Gutenberg Consortia Center specializes in collections of collections. These do not have the editorial oversight or consistent formatting of the main Project Gutenberg. Thematic collections, as well as numerous languages, are featured. This is sponsored by ''worldlibrary.net'', which hosts ''self.gutenberg.org'', a self-publishing portal.
* ibiblio, at the University of North Carolina at Chapel Hill, now hosts ''Project Gutenberg''
* Distributed Proofreaders: In 2000, Charles Franks founded Distributed Proofreaders (DP), which allowed the proofreading of scanned texts to be distributed among many volunteers over the Internet. This effort increased the number and variety of texts being added to Project Gutenberg, as well as making it easier for new volunteers to start contributing. DP became officially affiliated with Project Gutenberg in 2002. The 36,000+ DP-contributed books comprised almost two-thirds of the books in Project Gutenberg.
Sister projects
All sister projects are independent organizations that share the same ideals and have been given permission to use the ''Project Gutenberg'' trademark. They often have a particular national or linguistic focus.
List of sister projects
* Project Gutenberg Australia hosts many texts that are public domain according to copyright law of Australia, but still under copyright (or of uncertain status) in the United States, with a focus on Australian writers and books about Australia.
* Project Gutenberg Canada, a digital library for Canadian public domain texts.
* Projekt Gutenberg-DE claims copyright for its product and limits access to browsable web-versions of its texts.
* Project Gutenberg Europe is run by Project Rastko in Serbia. It aims at being a Project Gutenberg for all of Europe, and began posting projects in 2005. It uses the Distributed Proofreaders software to quickly produce etexts.
* Project Gutenberg Luxembourg publishes mostly, but not exclusively, books that are written in Luxembourgish.
* A project started by Finnish Project Gutenberg volunteers derives its name from the Finnish philologist Elias Lönnrot (1802–1884).
* Project Gutenberg of the Philippines aims to "make as many books available to as many people as possible, with a special focus on the Philippines and Philippine languages".
* Project Gutenberg Russia (Rutenberg) aims to collect public domain books in Slavic languages, particularly in Russian. The discussion of the project and its legal side began in April 2012. The word Rutenberg is a combination of words "Russia" and "Gutenberg".
* ''Project Gutenberg Self Publishing Portal'', also known as ''Project Gutenberg Self-Publishing Press'', by the ''Project Gutenberg Consortia Center''. Unlike Project Gutenberg itself, Project Gutenberg Self-Publishing allows submission of texts never published before, including self-published ebooks. Launched in 2012, it also owns the "gutenberg.us" domain.
* Project Gutenberg of Taiwan seeks to archive copyright-free books with a special focus on Taiwan in English, Mandarin and Taiwan-based languages. It is a special project of Forumosa.com.
* Projekt Runeberg, Nordic literature
* ''ReadingRoo.ms'', the home of the ''Project Gutenberg PrePrints''
* Distributed Proofreaders Canada, a separate entity, launched in December 2007, by David Jones and Michael Shepard.
* Faded Page, the public domain book archive of ''Distributed Proofreaders Canada''
Affiliates
* The Internet Archive, previous long-time backup distribution site, and previous main host site.
* Librivox.org, new audiobooks main partner
See also
* Aozora Bunko
* Miguel de Cervantes Virtual Library
* Chinese Text Project
* Google Books
* HathiTrust
* Internet Archive
* LibriVox—free online audiobook library, with many texts used from Project Gutenberg
* List of digital library projects
* On-line Guitar Archive
* Open Content Alliance
* Project Runeberg, for books significant to the culture and history of the Nordic countries.
* Runivers, for Russian historical documents
* Sefaria, for Jewish texts
* Standard Ebooks
* Virtual volunteering
* Wikisource
Further reading
* Marie Lebert, ''Project Gutenberg (1971-2009)'', 2009 (Net des Etudes Francaises, University of Toronto) via: ''Project Gutenberg''
* Marie Lebert, ''A History of EBooks'', 26 August 2009 (Net des Etudes Francaises, http://www.etudes-francaises.net/, University of Toronto) via: ''Project Gutenberg''
External links
* Distributed Proofreaders – a worldwide group of volunteer editors that is now the main source of eBooks for Project Gutenberg
* Project Gutenberg News – Official News for Gutenberg.org. Includes the Newsletter Archives, 1989–present.
** Project Gutenberg Monthly Newsletter