archive.today (formerly archive.is) is a web archiving website that saves snapshots on demand. It has support for JavaScript-heavy sites such as Google Maps and Twitter.
Archive.today records two snapshots: one replicates the original webpage including any functional live links; the other is a screenshot of the page.
History
Archive.today was founded in 2012. The site originally branded itself as archive.today, but changed the primary mirror to archive.is in May 2015. It began to deprecate the archive.is domain in favor of other mirrors in January 2019.
In 2021, archive.today had saved about 500 million pages.
Features
Archive.today can capture individual pages in response to explicit user requests.
Since its beginning, it has supported crawling pages with URLs containing the now-deprecated hash-bang fragment (#!).
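A hash-bang URL keeps its page state in the fragment, which browsers do not send to the server in the HTTP request, so the state has to be read on the client side. The following is a minimal sketch of how such a fragment can be detected; the helper name `hashbang_path` and the `example.com` URLs are illustrative assumptions, not part of archive.today.

```python
from typing import Optional
from urllib.parse import urlsplit


def hashbang_path(url: str) -> Optional[str]:
    """Return the text after '#!' in the URL, or None if there is no hash-bang."""
    # urlsplit() strips the leading '#', so a hash-bang fragment starts with '!'.
    fragment = urlsplit(url).fragment
    if fragment.startswith("!"):
        return fragment[1:]
    return None


# example.com is a placeholder domain used only for illustration.
print(hashbang_path("https://example.com/#!/user/status/1"))
print(hashbang_path("https://example.com/page#section"))
```

The first call yields the client-side route carried by the fragment; the second has an ordinary fragment and yields `None`.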
Archive.today records only text and images, excluding XML, RTF, spreadsheet (xls or ods) and other non-static content. However, videos for certain sites, like X (formerly Twitter), are saved. It keeps track of the history of snapshots saved, requesting confirmation before adding a new snapshot of an already saved page.
Pages are captured at a browser width of 1,024 pixels. CSS is converted to inline CSS, removing responsive web design and selectors such as :hover and :active. Content generated using JavaScript during the crawling process appears in a frozen state.
(Example: the JavaScript-generated loading animation of a Dailymotion video appears in a frozen state.)
HTML class names are preserved inside the old-class attribute.
When text is selected, a JavaScript applet generates a URL fragment seen in the browser's address bar that automatically highlights that portion of the text when visited again.
Web pages can be duplicated from archive.today to web.archive.org as a second-level backup, but archive.today does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.today, is also possible, but the copy usually takes more time than a direct capture. Historically, website owners had the option to opt out of the Wayback Machine through the use of the robots exclusion standard (robots.txt), and these exclusions were also applied retroactively. Archive.today does not obey robots.txt because it acts "as a direct agent of the human user." As of 2019, the Wayback Machine also no longer obeys robots.txt.
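For context, this is roughly the check that a crawler honoring the robots exclusion standard performs before fetching a page, and that archive.today skips. It is a minimal sketch using Python's standard `urllib.robotparser`; the rules, the user agent name `ExampleBot` and the `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that excludes every crawler from /private/.
RULES = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(RULES, "ExampleBot", "https://example.com/private/page.html"))
print(is_allowed(RULES, "ExampleBot", "https://example.com/public/page.html"))
```

A crawler that obeys the standard would skip the first URL and fetch the second.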
The search toolbar supports advanced keyword operators, using * as the wildcard character. A pair of quotation marks restricts the search to an exact sequence of keywords present in the title or in the body of the webpage, whereas the insite operator restricts it to a specific Internet domain.
Once a web page is archived, it cannot be deleted directly by any Internet user.
Advertisements and popups can be removed from archived pages, and links expanded, by asking the site's owner to do so through the owner's blog.
When a dynamic list is saved, the archive.today search box shows only a single result that links to the previous and following sections of the list (e.g. 20 links per page). The other saved pages are filtered out, and can sometimes be found through one of their occurrences.
The search feature is backed by Google CustomSearch. If it delivers no results, archive.today attempts to utilize Yandex Search.
While saving a page, a list of URLs for individual page elements and their content sizes, HTTP statuses and MIME types is shown. This list can only be viewed during the crawling process.
Users can download archived pages as a ZIP file, except for pages archived since archive.today changed its browser engine from PhantomJS to Chromium (non-headless).
In July 2013, Archive.today began supporting the API of the Memento Project.
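In the Memento protocol (RFC 7089), a client asks a TimeGate for the capture closest to a given moment by sending an `Accept-Datetime` request header in RFC 1123 format. The sketch below only builds such a request without sending it. The `TIMEGATE_PREFIX` value is a hypothetical placeholder, not archive.today's documented endpoint, and `build_timegate_request` is an illustrative helper name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Tuple

# Hypothetical TimeGate prefix; substitute the endpoint documented by the archive.
TIMEGATE_PREFIX = "https://timegate.example/timegate/"


def build_timegate_request(target_url: str, when: datetime) -> Tuple[str, Dict[str, str]]:
    """Return the TimeGate URL and headers for the capture nearest to `when`.

    `when` should be a timezone-aware datetime; it is converted to UTC because
    Accept-Datetime values are expressed in GMT.
    """
    when_utc = when.astimezone(timezone.utc)
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return TIMEGATE_PREFIX + target_url, headers


url, headers = build_timegate_request(
    "https://example.com/", datetime(2019, 1, 1, tzinfo=timezone.utc)
)
print(url)
print(headers["Accept-Datetime"])  # Tue, 01 Jan 2019 00:00:00 GMT
```

The returned URL and headers can be passed to any HTTP client; a Memento-compliant TimeGate typically answers with a redirect to the selected capture.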
Worldwide availability
Australia and New Zealand
In March 2019, the site was blocked for six months by several internet providers in Australia and New Zealand in the aftermath of the Christchurch mosque shootings in an attempt to limit distribution of the footage of the attack.
China
According to GreatFire.org, archive.today has been blocked in mainland China, as have the mirrors archive.li, archive.fo and archive.ph.
Finland
On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did this in order to avoid escalating a dispute they allegedly had with the Finnish government.
Russia
In 2016, the Russian communications agency Roskomnadzor began blocking access to archive.is from Russia.
Cloudflare DNS availability
From May 2018, Cloudflare's 1.1.1.1 DNS service would not resolve archive.today's web addresses, making the site inaccessible to users of the Cloudflare DNS service. Both organizations claimed the other was responsible for the issue. Cloudflare staff stated that the problem lay in archive.today's DNS infrastructure, as its authoritative nameservers returned invalid records when Cloudflare's network systems made requests to archive.today. Archive.today countered that the issue was due to Cloudflare's requests not being compliant with DNS standards, as Cloudflare does not send EDNS Client Subnet information in its DNS requests.
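To make the disputed field concrete, the sketch below encodes an EDNS Client Subnet option as laid out in RFC 7871: a resolver that sends it reveals a truncated prefix of the client's address to the authoritative nameserver. This is an illustration of the wire format only; the helper name `build_ecs_option` is an assumption, and nothing here describes how archive.today's nameservers actually use the option.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code assigned to Client Subnet (RFC 7871)


def build_ecs_option(address: str, prefix_len: int) -> bytes:
    """Encode an EDNS Client Subnet option for the given client address prefix."""
    # strict=False masks the host bits, so only the prefix is disclosed.
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    family = 1 if network.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    # Only as many address bytes as the prefix needs are sent.
    addr_bytes = network.network_address.packed[: (prefix_len + 7) // 8]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries), ADDRESS
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    # OPTION-CODE and OPTION-LENGTH precede the payload.
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


# 192.0.2.55 is from a documentation-only address range.
print(build_ecs_option("192.0.2.55", 24).hex())  # 0008000700011800c00002
```

With a /24 prefix, only the first three bytes of the address (`c0 00 02`) appear in the option; the final byte identifying the individual client is dropped.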
See also
* Digital preservation
* List of Web archiving initiatives
* Link rot
* Perma.cc
* Wayback Machine
* Web archiving
* WebCite
External links
* FAQ at Archive.today
* archive.today at Archive Team wiki
* "archive.today: On the trail of the mysterious guerrilla archivist of the Internet", Gyrovague, 5 August 2023