reCAPTCHA Inc. is a CAPTCHA system owned by Google. It enables web hosts to distinguish between human and automated access to websites. The original version asked users to decipher hard-to-read text or match images. Version 2 also asked users to decipher text or match images if the analysis of cookies and canvas rendering suggested the page was being downloaded automatically. Since version 3, reCAPTCHA will never interrupt users and is intended to run automatically when users load pages or click buttons.

The original iteration of the service was a mass collaboration platform designed for the digitization of books, particularly those that were too illegible to be scanned by computers. The verification prompts utilized pairs of words from scanned pages, with one known word used as a control for verification, and the second used to crowdsource the reading of an uncertain word. reCAPTCHA was originally developed by Luis von Ahn, David Abraham, Manuel Blum, Michael Crawford, Ben Maurer, Colin McMillen, and Edison Tan at Carnegie Mellon University's main Pittsburgh campus. It was acquired by Google in September 2009.

The system helped to digitize the archives of The New York Times, and was subsequently used by Google Books for similar purposes. The system was reported as displaying over 100 million CAPTCHAs every day, on sites such as Facebook, TicketMaster, Twitter, 4chan, CNN.com, StumbleUpon, Craigslist (since June 2008), and the U.S. National Telecommunications and Information Administration's digital TV converter box coupon program website (as part of the US DTV transition).

In 2014, Google pivoted the service away from its original concept, with a focus on reducing the amount of user interaction needed to verify a user, and only presenting human recognition challenges (such as identifying images in a set that satisfy a specific prompt) if behavioral analysis suspects that the user may be a bot. In October 2023, it was found that OpenAI's GPT-4 chatbot could solve CAPTCHAs. The service has been criticized for lack of security and accessibility while collecting user data, with a 2023 study estimating the collective cost of human time spent solving CAPTCHAs as $6.1 billion in wages.


Origin

Distributed Proofreaders was the first project to volunteer its time to decipher scanned text that could not be read by optical character recognition (OCR) programs. It works with Project Gutenberg to digitize public domain material and uses methods quite different from reCAPTCHA.

The reCAPTCHA program originated with Guatemalan computer scientist Luis von Ahn, and was aided by a MacArthur Fellowship. An early CAPTCHA developer, he realized "he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles".


Operation


reCAPTCHA v1 (human-assisted OCR)

Scanned text is subjected to analysis by two different OCR programs. Any word that is deciphered differently by the two programs, or that is not in an English dictionary, is marked as "suspicious" and converted into a CAPTCHA. The suspicious word is displayed, out of context, sometimes along with a control word already known. If the human types the control word correctly, then the response to the questionable word is accepted as probably valid. If enough users were to correctly type the control word, but incorrectly type the second word which OCR had failed to recognize, then the digital version of documents could end up containing the incorrect word.

The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point. Once a given identification hits 2.5 points, the word is considered valid. Those words that are consistently given a single identity by human judges are later recycled as control words. If the first three guesses match each other but do not match either of the OCRs, they are considered a correct answer, and the word becomes a control word. When six users reject a word before any correct spelling is chosen, the word is discarded as unreadable.

The original reCAPTCHA method was designed to show the questionable words separately, as out-of-context correction, rather than in use, such as within a phrase of five words from the original document. Also, the control word might mislead the context for the second word, such as a request of "/metal/ /fife/" being entered as "metal file" due to the logical connection of filing with a metal tool being considered more common than the musical instrument "fife".

In 2012, reCAPTCHA began using photographs taken from the Google Street View project, in addition to scanned words. It asks the user to identify images of crosswalks, street lights, and other objects. It has been hypothesized that the data is used by Waymo (a Google subsidiary) to train autonomous vehicles, though an unnamed representative has denied this, claiming the data was only being used to improve Google Maps as of mid-2021.

Google charges for the use of reCAPTCHA on websites that make over a million reCAPTCHA queries a month. reCAPTCHA v1 was declared end-of-life and shut down on March 31, 2018.
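The point-based voting rule for reCAPTCHA v1 described in this section can be illustrated with a small sketch. This is not the project's actual code: the function name `resolve_word`, the status strings, and the convention that a rejection is passed as `None` are assumptions made for illustration. Only the weights (0.5 per OCR reading, 1 per human answer), the 2.5-point threshold, and the six-rejection cutoff come from the description above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Sequence, Tuple

OCR_WEIGHT = 0.5        # each OCR program's reading counts half a point
HUMAN_WEIGHT = 1.0      # each human answer counts a full point
ACCEPT_THRESHOLD = 2.5  # a spelling is accepted once it reaches this score
MAX_REJECTIONS = 6      # six rejections with no accepted spelling: unreadable


def resolve_word(
    ocr_readings: Sequence[str],
    human_answers: Iterable[Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Tally votes for one suspicious word.

    A human answer of None stands for a user rejecting the word
    (an assumed convention, not taken from the source).
    Returns ("accepted", spelling), ("unreadable", None) or ("pending", None).
    """
    scores: Dict[str, float] = defaultdict(float)
    for reading in ocr_readings:
        scores[reading] += OCR_WEIGHT

    rejections = 0
    for answer in human_answers:
        if answer is None:
            rejections += 1
            if rejections >= MAX_REJECTIONS:
                return ("unreadable", None)
            continue
        scores[answer] += HUMAN_WEIGHT
        if scores[answer] >= ACCEPT_THRESHOLD:
            return ("accepted", answer)

    return ("pending", None)
```

Under these assumptions, two humans agreeing with one OCR reading reach 2.5 points, and three humans agreeing with each other reach 3 points even when both OCR programs disagree with them, which matches the "first three guesses" rule in the text.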


reCAPTCHA v2 (checkbox)

In 2013, reCAPTCHA began implementing behavioral analysis of the browser's interactions to predict whether the user was a human or a bot. The following year, Google began to deploy a new reCAPTCHA API, featuring the "no CAPTCHA reCAPTCHA", where users deemed to be of low risk only need to click a single checkbox to verify their identity. A CAPTCHA may still be presented if the system is uncertain of the user's risk; Google also introduced a new type of CAPTCHA challenge designed to be more accessible to mobile users, where the user must select images matching a specific prompt from a grid.


reCAPTCHA v3 and reCAPTCHA Enterprise (invisible)

In 2017, Google introduced a new "invisible" reCAPTCHA, where verification occurs in the background, and no challenges are displayed at all if the user is deemed to be of low risk. According to former Google "click fraud czar" Shuman Ghosemajumder, this capability "creates a new sort of challenge that very advanced bots can still get around, but introduces a lot less friction to the legitimate human."


Implementation

The reCAPTCHA tests are displayed from the central site of the reCAPTCHA project, which supplies the words to be deciphered. This is done through a JavaScript API, with the server making a callback to reCAPTCHA after the request has been submitted. The reCAPTCHA project provides libraries for various programming languages and applications to make this process easier. reCAPTCHA is a free-of-charge service provided to websites for assistance with the decipherment, but the reCAPTCHA software is not open-source.

Also, reCAPTCHA offers plugins for several web-application platforms, including ASP.NET, Ruby, and PHP, to ease the implementation of the service.
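The server-side callback mentioned above can be sketched as follows for the current (v2/v3) API rather than the retired v1 service. The `siteverify` endpoint and the `secret`, `response`, and `remoteip` form fields are those of Google's public verification API, not details given in this article. The helper names `verify_token` and `_post_form`, and the injectable `post` parameter, are assumptions made so the example can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def _post_form(url: str, fields: Dict[str, str]) -> dict:
    """POST form-encoded fields and decode the JSON reply (hypothetical helper)."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def verify_token(
    secret: str,
    token: str,
    remote_ip: Optional[str] = None,
    post: Callable[[str, Dict[str, str]], dict] = _post_form,
) -> bool:
    """Ask the verification endpoint whether a client token is valid.

    `token` is the value the browser widget submitted with the form;
    `secret` is the site's private key. Returns True only on success.
    """
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional: the end user's IP address
    result = post(VERIFY_URL, fields)
    return bool(result.get("success"))
```

A web handler would typically call `verify_token` with the submitted token before processing the rest of the request, and reject the request when it returns `False`.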


Security

The main purpose of a CAPTCHA system is to block spambots while allowing human users. On December 14, 2009, Jonathan Wilkins released a paper describing weaknesses in reCAPTCHA that allowed bots to achieve a solve rate of 18%.

On August 1, 2010, Chad Houck gave a presentation at the DEF CON 18 Hacking Conference detailing a method to reverse the distortion added to images, which allowed a computer program to determine a valid response 10% of the time. The reCAPTCHA system was modified on July 21, 2010, before Houck was to speak on his method. Houck modified his method to what he described as an "easier" CAPTCHA to determine a valid response 31.8% of the time. Houck also mentioned security defenses in the system, including a high-security lockout if an invalid response is given 32 times in a row.

On May 26, 2012, Adam, C-P, and Jeffball of DC949 gave a presentation at the LayerOne hacker conference detailing how they were able to achieve an automated solution with an accuracy rate of 99.1%. Their tactic was to use techniques from machine learning, a subfield of artificial intelligence, to analyze the audio version of reCAPTCHA which is available for the visually impaired. Google released a new version of reCAPTCHA just hours before their talk, making major changes to both the audio and visual versions of their service. In this release, the audio version was increased in length from 8 seconds to 30 seconds and became much more difficult to understand, for humans as well as bots. In response to this update and the following one, the members of DC949 released two more versions of their tool, Stiltwalker, which beat reCAPTCHA with an accuracy of 60.95% and 59.4% respectively. After each successive break, Google updated reCAPTCHA within a few days. According to DC949, they often reverted to features that had been previously hacked.

On June 27, 2012, Claudia Cruz, Fernando Uceda, and Leobardo Reyes published a paper showing a system running on reCAPTCHA images with an accuracy of 82%. The authors have not said if their system can solve recent reCAPTCHA images, although they claim their work to be intelligent OCR and robust to some, if not all, changes in the image database.

In an August 2012 presentation given at BsidesLV 2012, DC949 called the latest version "unfathomably impossible for humans"; they were not able to solve them manually either. The web accessibility organization WebAIM reported in May 2012, "Over 90% of respondents [screen reader users] find CAPTCHA to be very or somewhat difficult".


Criticism

The original iteration of reCAPTCHA was criticized as being a source of unpaid work to assist in transcribing efforts. Google also profits from reCAPTCHA users, who act as free workers improving its AI research. A 13-month study published in 2023, "Dazed & Confused: A Large-Scale Real-World User Study of reCAPTCHAv2," found that reCAPTCHA provides little security against bots and is primarily a tool to track user data, and that it has cost society an estimated 819 million hours of unpaid human labor.


Privacy

The current iteration of the system has been criticized for its reliance on tracking cookies and promotion of vendor lock-in with Google services; administrators are encouraged to include reCAPTCHA tracking code on all pages of their website to analyze the behavior and "risk" of users, which determines the level of friction presented when a reCAPTCHA prompt is used. Google stated in its privacy policy that user data collected in this manner is not used for personalized advertising. It was also discovered that the system favors those who have an active Google account login, and assigns a higher risk to those using anonymizing proxies and VPN services.

Concerns were raised regarding privacy when Google announced reCAPTCHA v3.0, as it allows Google to track users on non-Google websites.

In April 2020, Cloudflare switched from reCAPTCHA to hCaptcha, citing privacy concerns over Google's potential use of the data it collects through reCAPTCHA for targeted advertising, and a wish to cut down on operating costs, since a considerable portion of Cloudflare's customers are non-paying customers. In response, Google told PC Magazine that the data from reCAPTCHA is never used for personalized advertising purposes.


Accessibility

Google's help center states that reCAPTCHA is not supported for the deafblind community, effectively locking such users out of all pages that use the service. However, reCAPTCHA does currently have the longest list of accessibility considerations of any CAPTCHA service.


Interface

In one of the variants of CAPTCHA challenges, images are not incrementally highlighted, but fade out when clicked and are replaced with a new image fading in, resembling whack-a-mole. Criticism has been aimed at the long duration taken for the images to fade out and in.


Derivative projects

reCAPTCHA also created the Mailhide project, which protects email addresses on web pages from being harvested by spammers. By default, the email address was converted into a format that did not allow a crawler to see the full email address: part of the address was replaced with "...". The visitor would then click on the "..." and solve the CAPTCHA to obtain the full email address. One could also edit the pop-up code so that none of the addresses were visible. Mailhide was discontinued in 2018 because it relied on reCAPTCHA v1.
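The display side of this masking can be sketched in a few lines. This is only an illustration of hiding part of an address behind "...", not Mailhide's actual implementation: the function name `mask_email` and the number of characters left visible are assumptions, and the CAPTCHA-gated step that reveals the full address is not shown.

```python
def mask_email(address: str, visible: int = 4) -> str:
    """Return a display form of `address` with most of the local part hidden.

    `visible` (an assumed parameter) is how many leading characters of the
    local part stay readable; the rest is replaced by "...".
    """
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain:
        raise ValueError("not a simple local@domain address")
    return f"{local[:visible]}...@{domain}"
```

Note that a local part shorter than `visible` is shown in full by this sketch, so a real deployment would want a stricter rule for short addresses.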

