archive.today (formerly archive.is) is a web archiving website that saves snapshots on demand. It supports JavaScript-heavy sites such as Google Maps and Twitter. Archive.today records two snapshots: one replicates the original webpage, including any functional live links; the other is a screenshot of the page.


History

Archive.today was founded in 2012. The site originally branded itself as archive.today, but changed the primary mirror to archive.is in May 2015. In January 2019 it began to deprecate the archive.is domain in favor of other mirrors. By 2021, archive.today had saved about 500 million pages.


Features

Archive.today can capture individual pages in response to explicit user requests. Since its beginning, it has supported crawling pages with URLs containing the now-deprecated hash-bang fragment (#!). Archive.today records only text and images, excluding XML, RTF, spreadsheets (xls or ods), and other non-static content. However, videos from certain sites, such as X (formerly Twitter), are saved. It keeps a history of saved snapshots and asks for confirmation before adding a new snapshot of a page that has already been saved.

Pages are captured at a browser width of 1,024 pixels. CSS is converted to inline CSS, which removes responsive web design and selectors such as :hover and :active. Content generated by JavaScript during the crawling process appears in a frozen state; for example, a JavaScript-generated loading animation on a Dailymotion video appears frozen in the snapshot. HTML class names are preserved in the old-class attribute. When text is selected, a JavaScript applet generates a URL fragment, shown in the browser's address bar, that automatically highlights that portion of the text when the link is visited again.

Web pages can be duplicated from archive.today to web.archive.org as a second-level backup, but archive.today does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.today, is also possible, but the copy usually takes longer than a direct capture. Historically, website owners could opt out of the Wayback Machine by using the robots exclusion standard (robots.txt), and these exclusions were also applied retroactively. Archive.today does not obey robots.txt because it acts "as a direct agent of the human user." As of 2019, the Wayback Machine also no longer obeys robots.txt.

The search toolbar supports advanced keyword operators, using * as the wildcard character. Enclosing keywords in quotation marks restricts the search to that exact sequence of words in the title or body of a webpage, whereas the insite operator restricts it to a specific Internet domain. When a dynamic list is saved, the archive.today search box shows only a result that links the previous and following sections of the list (e.g. 20 links per page); the other saved pages are filtered out and can sometimes be found through one of their occurrences. The search feature is backed by Google Custom Search. If it returns no results, archive.today falls back to Yandex Search.

Once a web page is archived, it cannot be deleted directly by any Internet user. Removing advertisements or popups from archived pages, or expanding links in them, is possible by asking the owner on his blog. While a page is being saved, a list of the URLs of its individual elements is shown, together with their content sizes, HTTP statuses, and MIME types. This list can only be viewed during the crawling process. Users can download archived pages as a ZIP file, except pages archived when archive.today changed its browser engine from PhantomJS to Chromium (non-headless). In July 2013, archive.today began supporting the API of the Memento Project.
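Hash-bang URLs keep application state after #!, and a browser never sends the fragment to the server, so a plain HTTP fetch cannot see that state. Google's now-deprecated AJAX crawling scheme, which gave #! its conventional meaning, handled this by rewriting the fragment into an _escaped_fragment_ query parameter. The following is a minimal Python sketch of that mapping only; the article does not say how archive.today itself crawls such pages, and the function name is hypothetical.

```python
from urllib.parse import quote

# Printable ASCII the (deprecated) AJAX crawling scheme leaves unescaped:
# everything from 0x21 to 0x7E except '#', '%', '&' and '+'.
_SAFE = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in "#%&+")


def escaped_fragment_url(url: str) -> str:
    """Hypothetical helper: map a '#!' URL to its '_escaped_fragment_' form.

    URLs without a hash-bang are returned unchanged.
    """
    base, sep, fragment = url.partition("#!")
    if not sep:
        return url
    # Append to an existing query string if there is one.
    joiner = "&" if "?" in base else "?"
    return f"{base}{joiner}_escaped_fragment_={quote(fragment, safe=_SAFE)}"
```

For example, `escaped_fragment_url("https://example.com/app#!/profile/42")` yields `https://example.com/app?_escaped_fragment_=/profile/42`, a URL whose state a server can actually receive.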
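A crawler that obeys the robots exclusion standard checks every URL against the site's robots.txt before fetching it; archive.today, by its own account, skips this check for user-requested captures. The sketch below shows what that check looks like using Python's standard urllib.robotparser module. The robots.txt contents, the user-agent names (including "ExampleArchiver"), and the helper name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named archiver entirely and keep
# every other crawler out of /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = EXAMPLE_ROBOTS_TXT) -> bool:
    """Return True if a robots.txt-obeying crawler may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines and also marks the rules as loaded,
    # so can_fetch() does not default to refusing everything.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under these rules, `allowed("ExampleArchiver", "https://example.com/page")` is False, while any other agent may fetch everything outside /private/. A retroactive exclusion, as the Wayback Machine once applied, amounts to re-running such a check against pages already archived.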
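The Memento Project API mentioned above (specified in RFC 7089) lets clients ask an archive for a TimeMap: a list of an original URL's archived copies ("mementos"), usually in application/link-format, each with a datetime attribute. The endpoint paths archive.today uses are not given in this article, so the following minimal Python sketch only parses a TimeMap body that has already been fetched. It is a simplified regex parser, not a full link-format implementation (it assumes attribute values contain no "<"), and the names and sample data are illustrative.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import NamedTuple

# One link: <target> followed by its ;-separated attributes (up to the next '<').
_LINK = re.compile(r"<([^>]*)>([^<]*)")
# One attribute: ;name="quoted value" or ;name=token
_PARAM = re.compile(r';\s*([A-Za-z0-9_-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


class Memento(NamedTuple):
    uri: str
    captured: datetime


def parse_timemap(body: str) -> list[Memento]:
    """Extract mementos from a link-format TimeMap, oldest first."""
    mementos = []
    for target, params in _LINK.findall(body):
        attrs = {}
        for m in _PARAM.finditer(params):
            quoted, token = m.group(2), m.group(3)
            attrs[m.group(1).lower()] = quoted if quoted is not None else token
        # rel may hold several space-separated values, e.g. "first memento".
        if "memento" in attrs.get("rel", "").lower().split() and "datetime" in attrs:
            # Memento datetimes use the HTTP date format, e.g. "Tue, 01 Jan 2019 00:00:00 GMT".
            mementos.append(Memento(target, parsedate_to_datetime(attrs["datetime"])))
    return sorted(mementos, key=lambda m: m.captured)
```

Entries with rel="original", "self", or "timegate" are skipped, so the result contains only the archived copies, sorted by capture time regardless of their order in the TimeMap.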


Worldwide availability


Australia and New Zealand

In March 2019, the site was blocked for six months by several internet providers in Australia and New Zealand in the aftermath of the Christchurch mosque shootings, in an attempt to limit distribution of footage of the attack.


China

According to GreatFire.org, archive.today has been blocked in mainland China, as have the mirror domains archive.li, archive.fo, and archive.ph.


Finland

On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did this to avoid escalating a dispute they allegedly had with the Finnish government.


Russia

In 2016, the Russian communications agency Roskomnadzor began blocking access to archive.is from Russia.


Cloudflare DNS availability

Beginning in May 2018, Cloudflare's 1.1.1.1 DNS service would not resolve archive.today's web addresses, making the site inaccessible to users of that service. Each organization claimed the other was responsible for the issue. Cloudflare staff stated that the problem lay in archive.today's DNS infrastructure, because its authoritative nameservers returned invalid records when Cloudflare's network systems made requests to them. Archive.today countered that the issue was caused by Cloudflare's requests not complying with DNS standards, as Cloudflare does not send EDNS Client Subnet information in its DNS requests.
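The disputed EDNS Client Subnet (ECS) information is an EDNS option, defined in RFC 7871, through which a recursive resolver passes a truncated prefix of the client's IP address to authoritative nameservers. To show what that information looks like on the wire, the following Python sketch encodes only the option itself, not a full DNS message or the OPT record that carries it. The function name and example subnets are illustrative, and nothing here reflects how Cloudflare's or archive.today's servers are actually implemented.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS Client Subnet option code (RFC 7871)


def ecs_option(subnet: str) -> bytes:
    """Illustrative: encode an ECS option as sent in a query (scope prefix 0)."""
    net = ipaddress.ip_network(subnet, strict=False)  # zero the host bits
    family = 1 if net.version == 4 else 2  # address family: 1 = IPv4, 2 = IPv6
    prefix = net.prefixlen
    # ADDRESS is truncated to the minimum number of octets covering the prefix.
    addr = net.network_address.packed[: (prefix + 7) // 8]
    # FAMILY (16 bits), SOURCE PREFIX-LENGTH (8), SCOPE PREFIX-LENGTH (8, 0 in queries)
    payload = struct.pack("!HBB", family, prefix, 0) + addr
    # OPTION-CODE and OPTION-LENGTH precede the option data.
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload
```

For instance, `ecs_option("198.51.100.77/24")` produces the 11 bytes `00 08 00 07 00 01 18 00 c6 33 64`: the client's last octet never leaves the resolver. The /24 and /56 prefixes used in these examples match the source prefix lengths RFC 7871 recommends by default.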


See also

* Digital preservation
* List of Web archiving initiatives
* Link rot
* Perma.cc
* Wayback Machine
* Web archiving
* WebCite


External links

* FAQ at Archive.today
* archive.today at Archive Team wiki
* "archive.today: On the trail of the mysterious guerrilla archivist of the Internet", Gyrovague, 5 August 2023