Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. Given an initial text as a prompt, it produces text that continues the prompt. The architecture is a standard transformer network (with a few engineering tweaks) of unprecedented size: a 2048-token-long context window and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning the model is trained to predict the next token. GPT-3 demonstrated strong few-shot learning on many text-based tasks. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory.

GPT-3, which was introduced in May 2020 and was in beta testing as of July 2020, is part of a trend in natural language processing (NLP) systems toward pre-trained language representations. The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human, which has both benefits and risks. Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk. David Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced." An April 2022 review in ''The New York Times'' described GPT-3's capabilities as being able to write original prose with fluency equivalent to that of a human.

Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3's underlying model.
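The generation loop implied by "autoregressive" can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not OpenAI's implementation; `model` and `tokenizer` are hypothetical stand-ins for the real (non-public) components:

```python
# Sketch of autoregressive text generation: the model repeatedly predicts
# the next token and appends it to the running sequence.
import numpy as np

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50,
             context_window: int = 2048) -> str:
    tokens = tokenizer.encode(prompt)          # hypothetical tokenizer
    for _ in range(max_new_tokens):
        # GPT-3 conditions on at most 2048 tokens of prior context.
        context = tokens[-context_window:]
        # Hypothetical call returning a probability for every vocabulary item.
        probs = model.next_token_probabilities(context)  # shape: (vocab_size,)
        next_token = int(np.random.choice(len(probs), p=probs))
        tokens.append(next_token)
    return tokenizer.decode(tokens)
```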


Background

According to ''The Economist'', improved algorithms, powerful computers, and an increase in digitized data have fueled a revolution in machine learning, with new techniques in the 2010s resulting in "rapid improvements in tasks" including manipulating language. Software models are trained to learn by using thousands or millions of examples in a "structure... loosely based on the neural architecture of the brain". One architecture used in natural language processing (NLP) is a neural network based on a deep learning model first introduced in 2017: the Transformer. GPT-n models are based on this Transformer-based deep learning neural network architecture. There are a number of NLP systems capable of processing, mining, organizing, connecting, and contrasting textual input, as well as correctly answering questions.

On June 11, 2018, OpenAI researchers and engineers posted their original paper on generative language models, artificial intelligence systems that could be pre-trained with an enormous and diverse corpus of text, in a process they called generative pre-training (GP). The authors described how language understanding performance in NLP was improved in GPT-n through a process of "generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task." This eliminated the need for human supervision and for time-intensive hand-labeling.

In February 2020, Microsoft introduced its Turing Natural Language Generation (T-NLG), which was claimed to be the "largest language model ever published at 17 billion parameters." It performed better than any other language model at a variety of tasks, including summarizing texts and answering questions.
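The "generative pre-training" objective quoted above reduces to ordinary next-token prediction on unlabeled text. As a hedged sketch (illustrative PyTorch, not OpenAI's code, with a hypothetical `model` that maps token IDs to vocabulary logits):

```python
# Illustrative sketch only: train the model to predict each token of
# unlabeled text from the tokens preceding it, via cross-entropy loss.
import torch
import torch.nn.functional as F

def pretraining_loss(model: torch.nn.Module, token_ids: torch.Tensor) -> torch.Tensor:
    """token_ids: (batch, seq_len) tensor of token IDs from unlabeled text.
    `model` is a hypothetical Transformer returning logits of shape
    (batch, seq_len - 1, vocab_size)."""
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]  # shift by one position
    logits = model(inputs)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * (seq_len - 1), vocab)
        targets.reshape(-1),                  # (batch * (seq_len - 1),)
    )
```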


Training and capabilities

On May 28, 2020, an arXiv preprint by a group of 31 engineers and researchers at OpenAI described the development of GPT-3, a third-generation "state-of-the-art language model". The team increased the capacity of GPT-3 by over two orders of magnitude from that of its predecessor, GPT-2, making GPT-3 the largest non-sparse language model to date. (In a sparse model, many parameters are set to a constant value, so even if there are more total parameters, there is less meaningful information.) Four preprints were released between May 28 and July 22, 2020. Because GPT-3 is structurally similar to its predecessors, its greater accuracy is attributed to its increased capacity and greater number of parameters. GPT-3's capacity is ten times larger than that of Microsoft's Turing NLG, the next-largest NLP model known at the time.

Sixty percent of the weighted pre-training dataset for GPT-3 comes from a filtered version of Common Crawl consisting of 410 billion byte-pair-encoded tokens. Other sources are 19 billion tokens from WebText2, representing 22% of the weighted total; 12 billion tokens from Books1, representing 8%; 55 billion tokens from Books2, representing 8%; and 3 billion tokens from Wikipedia, representing 3%. GPT-3 was thus trained on hundreds of billions of words and is also capable of coding in CSS, JSX, and Python, among other languages. Because its training data was all-encompassing, GPT-3 does not require further training for distinct language tasks.
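The percentages above are sampling proportions in the training mix rather than raw corpus sizes, so smaller, higher-quality corpora such as Wikipedia are seen more often per token. A minimal sketch of such weighted sampling, assuming hypothetical loader callables for each (non-public) corpus:

```python
# Sketch of the weighted dataset mixture described above. The weights are the
# sampling proportions reported for GPT-3 (they need not sum exactly to 1;
# random.choices normalizes them). The loaders are hypothetical placeholders.
import random

DATASET_WEIGHTS = {
    "common_crawl": 0.60,  # 410B byte-pair-encoded tokens
    "webtext2":     0.22,  # 19B tokens
    "books1":       0.08,  # 12B tokens
    "books2":       0.08,  # 55B tokens
    "wikipedia":    0.03,  # 3B tokens
}

def sample_training_document(loaders: dict) -> str:
    """Pick a source in proportion to its mixture weight, then draw one
    document from it. `loaders` maps source name to a zero-argument
    callable returning one document's text (hypothetical)."""
    names, weights = zip(*DATASET_WEIGHTS.items())
    source = random.choices(names, weights=weights, k=1)[0]
    return loaders[source]()
```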
The training data contains occasional toxic language, and GPT-3 occasionally generates toxic language as a result of mimicking it. A study from the University of Washington found that GPT-3 produced toxic language at a level comparable to the similar natural language processing models GPT-2 and CTRL. GPT-3 produced less toxic language than its predecessor model, GPT-1, although it produced more toxic generations, and of higher toxicity, than CTRL Wiki, a language model trained entirely on Wikipedia data.

On June 11, 2020, OpenAI announced that users could request access to its user-friendly GPT-3 API, a "machine learning toolset", to help OpenAI "explore the strengths and limits" of this new technology. The invitation described how this API had a general-purpose "text in, text out" interface that can complete almost "any English language task", instead of the usual single use case. According to one user, who had access to a private early release of the OpenAI GPT-3 API, GPT-3 was "eerily good" at writing "amazingly coherent text" with only a few simple prompts. In an initial experiment, 80 US subjects were asked to judge whether short articles of roughly 200 words were written by humans or by GPT-3. The participants judged correctly 52% of the time, only slightly better than random guessing.

On November 18, 2021, OpenAI announced that enough safeguards had been implemented that access to its API would be unrestricted. OpenAI provided developers with a content moderation tool to help them abide by OpenAI's content policy. On January 27, 2022, OpenAI announced that its newest GPT-3 language models, collectively referred to as InstructGPT, were now the default language models used on its API. According to OpenAI, InstructGPT produced content that was better aligned to user intentions by following instructions better, generating fewer made-up facts, and producing somewhat less toxic content.

Because GPT-3 can "generate news articles which human evaluators have difficulty distinguishing from articles written by humans," it has the "potential to advance both the beneficial and harmful applications of language models." In their May 28, 2020 paper, the researchers described in detail the potential "harmful effects of GPT-3", which include "misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting". The authors draw attention to these dangers to call for research on risk mitigation.

GPT-3 is capable of performing zero-shot, one-shot, and few-shot learning, in which a task is specified in the prompt with zero, one, or a few worked examples, respectively. In June 2022, Almira Osmanovic Thunström wrote that GPT-3 was the primary author of an article about itself, that the team had submitted it for publication, and that it had been pre-published while awaiting completion of its review.


Reception


Applications

* GPT-3, specifically the Codex model, is the basis for GitHub Copilot, a code completion and generation software that can be used in various code editors and IDEs.
* GPT-3 is used in certain Microsoft products to translate conventional language into formal computer code (a hedged sketch of this kind of use follows this list).
* GPT-3 has been used in CodexDB to generate query-specific code for SQL processing.
* GPT-3 has been used by Jason Rohrer in a retro-themed chatbot project named "Project December", which is accessible online and allows users to converse with several AIs using GPT-3 technology.
* GPT-3 was used by ''The Guardian'' to write an article about AI being harmless to human beings. It was fed some ideas and produced eight different essays, which were ultimately merged into one article.
* GPT-3 was used in ''AI Dungeon'', which generates text-based adventure games. It was later replaced by a competing model after OpenAI changed its policy regarding generated content.
* GPT-3 is used in Copy.ai, an AI copywriting app for marketers and business owners.
* GPT-3 is used in Jasper.ai, a content generator designed to assist marketers and copyeditors.
* GPT-3 is used in jamie, an AI meeting assistant that takes notes in meetings and summarizes them.
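As an illustration of the natural-language-to-code uses listed above (hedged: `api.complete` is a hypothetical placeholder, not the real OpenAI or CodexDB interface, and the expected output is a sketch rather than a guaranteed model response):

```python
# A plain-English task description is sent as the prompt; the model is
# expected to continue it with code, here a SQL query.
PROMPT = '''"""
Return the names of all customers who placed an order in 2021,
as a SQL query against tables customers(id, name) and
orders(id, customer_id, order_date).
"""
SELECT'''

# completion = api.complete(prompt=PROMPT, stop=";")  # hypothetical call
# Plausible shape of a completion (model-generated, not guaranteed):
#   SELECT DISTINCT c.name
#   FROM customers c JOIN orders o ON o.customer_id = c.id
#   WHERE o.order_date >= '2021-01-01' AND o.order_date < '2022-01-01'
```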


Reviews

* In a July 2020 review in ''The New York Times'', Farhad Manjoo said that GPT-3's ability to generate computer code, poetry, and prose is not just "amazing", "spooky", and "humbling", but also "more than a little terrifying".
* ''Daily Nous'' presented a series of articles by nine philosophers on GPT-3. Australian philosopher David Chalmers described GPT-3 as "one of the most interesting and important AI systems ever produced".
* A review in ''Wired'' said that GPT-3 was "provoking chills across Silicon Valley".
* The ''National Law Review'' said that GPT-3 is an "impressive step in the larger process", with OpenAI and others finding "useful applications for all of this power" while continuing to "work toward a more general intelligence".
* An article in the ''MIT Technology Review'', co-written by deep learning critic Gary Marcus, stated that GPT-3's "comprehension of the world is often seriously off, which means you can never really trust what it says." According to the authors, GPT-3 models relationships between words without having an understanding of the meaning behind each word.
* Jerome Pesenti, head of the Facebook AI lab, said GPT-3 is "unsafe", pointing to the sexist, racist, and other biased and negative language generated by the system when it was asked to discuss Jews, women, black people, and the Holocaust.
* Nabla, a French start-up specializing in healthcare technology, tested GPT-3 as a medical chatbot, though OpenAI itself warned against such use. As expected, GPT-3 showed several limitations. For example, while testing GPT-3 responses about mental health issues, the AI advised a simulated patient to commit suicide.
* Noam Chomsky expressed his skepticism about GPT-3's scientific value: "It's not a language model. It works just as well for impossible languages as for actual languages. It is therefore refuted, if intended as a language model, by normal scientific criteria. ... Perhaps it's useful for some purpose, but it seems to tell us nothing about language or cognition generally."
* Luciano Floridi and Massimo Chiriatti highlighted the risk of "cheap production of good, semantic artefacts".
* OpenAI's Sam Altman himself criticized what he called "GPT-3 hype", acknowledging that GPT-3 "has serious weaknesses and sometimes makes very silly mistakes... AI is going to change the world, but GPT-3 is just a very early glimpse."


Criticism

GPT-3's builder, OpenAI, was initially founded as a non-profit in 2015. In 2019, OpenAI broke from its previous open-source practices by declining to publicly release GPT-3's precursor model, citing concerns that the model could be used to perpetuate fake news. OpenAI eventually released a version of GPT-2 that was 8% of the original model's size. In the same year, OpenAI restructured to be a for-profit company. In 2020, Microsoft announced that it had exclusive licensing of GPT-3 for Microsoft's products and services following a multi-billion-dollar investment in OpenAI. The agreement permits OpenAI to offer a public-facing API such that users can send text to GPT-3 to receive the model's output, but only Microsoft has access to GPT-3's source code.

Large language models such as GPT-3 have come under criticism from Google's AI ethics researchers for the environmental impact of training and storing the models, detailed in a paper co-authored by Timnit Gebru and Emily M. Bender in 2021.

The growing use of automated writing technologies based on GPT-3 and other language generators has raised concerns regarding academic integrity and raised the stakes of how universities and schools will gauge what constitutes academic misconduct such as plagiarism.

GPT was built with data from the Common Crawl dataset, a conglomerate of copyrighted articles, internet posts, web pages, and books scraped from 60 million domains over a period of 12 years. ''TechCrunch'' reports this training data includes copyrighted material from the BBC, ''The New York Times'', Reddit, the full text of online books, and more. In its response to a 2019 Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation from the United States Patent and Trademark Office (USPTO), OpenAI argued that "Under current law, training AI systems [such as its GPT models] constitutes fair use," but that "given the lack of case law on point, OpenAI and other AI developers like us face substantial legal uncertainty and compliance costs."


See also

* BERT (language model)
* LaMDA
* Natural language processing
* Wu Dao
* ChatGPT

