End Of Term Web Archive

The End of Term Web Archive is an archival project that preserves U.S. federal government websites during administration changes.


Background

The End of Term Web Archive was set up following a 2008 announcement from the National Archives and Records Administration (NARA) that it would not archive government websites during the presidential transition, after having carried out such crawls in 2000 and 2004. The 2004 federal web harvest can be accessed through the National Archives, alongside congressional web harvests beginning with the 109th United States Congress.
The first project partners were the Library of Congress, George Washington University Libraries, Stanford University Libraries, University of North Texas Libraries, the US Government Publishing Office, the California Digital Library and the Internet Archive, all members of the International Internet Preservation Consortium (IIPC). The project was initially sketched out after a General Assembly of the IIPC in 2008. NARA and the Environmental Data & Governance Initiative (EDGI) joined the 2020/21 project.


The project

The project archives websites and documents for public access and research use. Data from the 2008, 2012, 2016, and 2020 End of Term crawls can be downloaded in bulk. As of February 2025, the 2024 datasets were still being inventoried, and there was a plan to move a copy of all datasets into Amazon Web Services. A UNT study of the risk to document files found that 83% of PDFs on the .gov domain in 2008 were missing four years later. Such removals are consistent with routine website management, but because these are government sites, changes to them may be of interest to the public and to watchdog groups. Demand for continued access to historical web material is illustrated by a 2017 announcement from the EPA. Responding to concerns about changes to its website, the agency stated that pages from the previous administration would be carefully archived. These snapshot pages were clearly marked to distinguish them from current content. The archive prioritizes sites covering areas considered likely to be updated or removed during the transition. Members of the public are encouraged to nominate important sites, and these nominations are combined with broad crawls of government domains to build the collection. The collection is extensive (the 2016 crawl preserved 11,382 sites), but it is not comprehensive. Researchers have used these collections to examine the history of climate change policy and the reuse of suspended U.S. government Twitter accounts. The 2024 crawl began in January 2024, with sites nominated through the URL Nomination Tool developed by the University of North Texas.


See also

* International Internet Preservation Consortium
* List of Web archiving initiatives


