
archive.today (or archive.is) is a web archiving site, founded in 2012, that saves snapshots on demand and supports JavaScript-heavy sites such as Google Maps and progressive web apps such as Twitter. archive.today records two snapshots: one replicates the original webpage, including any functional live links; the other is a screenshot of the page.


Features


Functionality

archive.today can capture individual pages in response to explicit user requests. Since its beginning, archive.today has supported crawling pages with URLs containing the now-deprecated hash-bang fragment (#!). archive.today records only text and images, excluding XML, RTF, spreadsheets (xls or ods) and other non-static content. However, videos from certain sites, such as Twitter, are saved.

It keeps track of the history of saved snapshots, requesting confirmation before adding a new snapshot of an already saved page. Pages are captured at a browser width of 1,024 pixels. CSS is converted to inline CSS, which removes responsive web design and selectors such as :hover and :active. Content generated using JavaScript during the crawling process appears in a frozen state.
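Hash-bang URLs such as https://twitter.com/#!/jack keep the application's state in the fragment after "#!", and browsers do not send the fragment to the server, so a crawler has to run the page's JavaScript, or rewrite the URL into a form the server can see, to reach that state. The Python sketch below splits off the hash-bang route and shows the rewrite defined by Google's since-deprecated AJAX crawling scheme, which moved the route into an _escaped_fragment_ query parameter. It illustrates the URL convention only; the text does not say how archive.today itself crawls such pages, and the function names are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def split_hashbang(url: str) -> Tuple[str, Optional[str]]:
    """Split a URL into (URL without fragment, hash-bang route).

    The route is the fragment text after "#!", or None when the URL
    has no hash-bang fragment.
    """
    parts = urlsplit(url)
    base = urlunsplit(parts._replace(fragment=""))
    if parts.fragment.startswith("!"):
        return base, parts.fragment[1:]
    return base, None


def _escape_fragment(route: str) -> str:
    # Percent-encode only the bytes the AJAX crawling scheme escaped:
    # controls and space (0x00-0x20), '#', '%', '&', '+', and 0x7F-0xFF.
    out = []
    for byte in route.encode("utf-8"):
        if byte <= 0x20 or byte in (0x23, 0x25, 0x26, 0x2B) or byte >= 0x7F:
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


def escaped_fragment_url(url: str) -> str:
    """Map example.com/page#!state to example.com/page?_escaped_fragment_=state."""
    base, route = split_hashbang(url)
    if route is None:
        return url  # ordinary fragment or none: nothing to rewrite
    separator = "&" if urlsplit(base).query else "?"
    return f"{base}{separator}_escaped_fragment_={_escape_fragment(route)}"
```

For example, escaped_fragment_url("https://twitter.com/#!/jack") returns "https://twitter.com/?_escaped_fragment_=/jack", while a URL with an ordinary fragment such as "#intro" is returned unchanged.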
HTML class names are preserved inside the old-class attribute. When text is selected, a JavaScript applet generates a URL fragment, shown in the browser's address bar, that automatically highlights that portion of the text when the URL is visited again.

Web pages cannot be duplicated from archive.today to web.archive.org as a second-level backup, because archive.today places an exclusion for the Wayback Machine and does not save its snapshots in WARC format. The reverse, copying from web.archive.org to archive.today, is possible, but usually takes more time than a direct capture. Some websites are retroactively removed from the Internet Archive's listings, or blocked from being saved, because of their robots.txt file; archive.today does not honor robots.txt.

The search toolbar supports advanced keyword operators, using * as the wildcard character. A pair of quotation marks restricts the search to an exact sequence of keywords in the title or body of the webpage, whereas the insite operator restricts it to a specific Internet domain.

Once a web page is archived, it cannot be deleted directly by any Internet user. Advertisements, popups or expanding links can be removed from archived pages by asking archive.today's owner to do so on his blog.

When a dynamic list is saved, the archive.today search box shows only one result, which links to the previous and following sections of the list (e.g. 20 links per page). The other saved pages are filtered out, and can sometimes be found through one of their occurrences. The search feature is backed by Google Custom Search; if it delivers no results, archive.today falls back to Yandex Search.

While a page is being saved, a list of URLs for its individual elements is shown, together with their content sizes, HTTP statuses and MIME types. This list can only be viewed during the crawling process. Archived pages can be downloaded as a ZIP file, except for pages archived since archive.today changed its browser engine from PhantomJS to Chromium. archive.today supports the API of the Memento Project.
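The Memento protocol (RFC 7089) lets a client ask an archive for a TimeMap, a list of all captures of a given URL, commonly serialized as application/link-format. As an illustration, the Python sketch below parses such a TimeMap body into (memento URI, capture time) pairs. It fetches nothing, so it assumes no particular archive.today endpoint; it also assumes every link parameter value is double-quoted, as in the RFC's examples, and parse_timemap is a hypothetical name.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Tuple

# A link-value is <URI> followed by zero or more ; name="value" parameters.
# Assumption: every parameter value is double-quoted, as in RFC 7089's examples.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
_PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_timemap(body: str) -> List[Tuple[str, datetime]]:
    """Return (memento URI, capture time) pairs from a link-format TimeMap, oldest first."""
    mementos = []
    for match in _LINK_RE.finditer(body):
        uri, raw_params = match.group(1), match.group(2)
        params = {name.lower(): value for name, value in _PARAM_RE.findall(raw_params)}
        # rel may hold several space-separated types, e.g. "first memento".
        if "memento" in params.get("rel", "").split() and "datetime" in params:
            # Memento datetimes use the HTTP date format, e.g. "Tue, 01 Jan 2019 00:00:00 GMT".
            mementos.append((uri, parsedate_to_datetime(params["datetime"])))
    return sorted(mementos, key=lambda item: item[1])
```

Entries whose rel is "original" or "self" are skipped, so only the captures themselves are returned, ordered by capture time.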


History

archive.today was founded in 2012. The site originally branded itself as archive.today, but in May 2015 it changed the primary mirror to archive.is. In January 2019, it began to deprecate the archive.is domain in favor of the archive.today mirror.


Worldwide availability


Australia

In March 2019, the site was blocked for six months by several Australian internet providers in the aftermath of the Christchurch mosque shootings, in an attempt to limit distribution of the footage of the attack. It has since been unblocked.


China

According to GreatFire.org, archive.today has been blocked in China, as have the archive.li, archive.fo and archive.ph mirrors.


Finland

On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did this to avoid escalating a dispute they allegedly had with the Finnish government. It has since been unblocked.


Russia

In Russia, only HTTP access is possible; HTTPS connections are blocked.


Cloudflare DNS availability

It has not been possible to reach the site when using Cloudflare's 1.1.1.1 DNS service. Cloudflare staff have stated that the problem lies with archive.today, whose authoritative nameservers return invalid records to DNS queries arriving from Cloudflare's network. archive.today's stated reason is that Cloudflare does not send EDNS Client Subnet information in its DNS requests.
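EDNS Client Subnet, defined in RFC 7871, travels as an option inside the OPT pseudo-record of a DNS query. The Python sketch below encodes only that option: option code 8, the address family (1 for IPv4, 2 for IPv6), the source prefix length, a scope prefix length of 0 as required in queries, and the client address truncated to the prefix. It does not build a full DNS message and does not model how either Cloudflare or archive.today handles the option; encode_ecs_option is a hypothetical name.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS0 option code assigned to Client Subnet (RFC 7871)


def encode_ecs_option(subnet: str) -> bytes:
    """Encode an EDNS Client Subnet option (code, length, data) for a query."""
    network = ipaddress.ip_network(subnet, strict=False)  # zeroes the host bits
    family = 1 if network.version == 4 else 2  # IANA address family numbers
    source_prefix = network.prefixlen
    # Only ceil(prefix / 8) address octets are sent.
    address = network.network_address.packed[: (source_prefix + 7) // 8]
    # SCOPE PREFIX-LENGTH must be 0 in queries; the authoritative server sets it in responses.
    option_data = struct.pack("!HBB", family, source_prefix, 0) + address
    return struct.pack("!HH", ECS_OPTION_CODE, len(option_data)) + option_data
```

For example, encode_ecs_option("192.0.2.123/24").hex() yields "0008000700011800c00002": only the three octets of the /24 network are sent, not the full client address.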


See also

* Digital preservation
* List of Web archiving initiatives
* Link rot
* Perma.cc
* Wayback Machine
* Web archiving
* WebCite

