Generative Pre-trained Transformer 4 (GPT-4) is a
multimodal large language model
A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabelled text using self-supervised learning. LLMs emerged around 2018 an ...
created by
OpenAI
OpenAI is an artificial intelligence (AI) research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. The company conducts research in the field of AI with the stated goal of promo ...
and the fourth in
its GPT series.
It was released on March 14, 2023, and has been made publicly available in a limited form via
ChatGPT Plus, with access to its commercial
API being provided via a waitlist.
As a
transformer
A transformer is a passive component that transfers electrical energy from one electrical circuit to another circuit, or multiple circuits. A varying current in any coil of the transformer produces a varying magnetic flux in the transformer' ...
, GPT-4 was pretrained to predict the next
token
Token may refer to:
Arts, entertainment, and media
* Token, a game piece or counter, used in some games
* The Tokens, a vocal music group
* Tolkien Black, a recurring character on the animated television series ''South Park,'' formerly known as ...
(using both public data and "data licensed from third-party providers"), and was then fine-tuned with
reinforcement learning from human and AI feedback for
human alignment and policy compliance.
Observers reported the GPT-4 based version of ChatGPT to be an improvement on the previous (GPT-3.5 based) ChatGPT, with the caveat that GPT-4 retains some of the same problems.
Unlike the predecessors, GPT-4 can take images as well as text as input.
OpenAI has declined to reveal technical information such as the size of the GPT-4 model.
Background
OpenAI published their first paper on GPT in 2018, called "Improving Language Understanding by Generative Pre-Training." They also released GPT-1, a model based on the Transformer architecture that was trained on a large corpus of books. The next year, they introduced GPT-2, a larger model that could generate coherent text. In 2020, they introduced GPT-3, a model with 100 times the number of parameters as GPT-2, that could perform various tasks with few examples. GPT-3 was further improved into GPT-3.5, which was used to create
ChatGPT.
Capabilities
OpenAI stated that GPT-4 is "more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5." They produced two versions of GPT-4, with context windows of 8,192 and 32,768 tokens, a significant improvement over GPT-3.5 and GPT-3, which were limited to 4,096 and 2,049 tokens respectively. Unlike its predecessors, GPT-4 is a multimodal model: it can take images as well as text as input;
this gives it the ability to describe the humor in unusual images, summarize screen-shot text, and answer exam questions that contain diagrams.
To gain further control over GPT-4, OpenAI introduced the "system message", a directive in natural language given to GPT-4 in order to specify its tone of voice and task. For example, the system message can instruct the model to "be a Shakespearean pirate", in which case it will respond in rhyming, Shakespearean prose, or request it to "always write the output of
tsresponse in
JSON
JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other s ...
", in which case the model will do so, adding keys and values as it sees fit to match the structure of its reply. In the examples provided by OpenAI, GPT-4 refused to deviate from its system message despite requests to do otherwise by the user during the conversation.
Aptitude on standardized tests
GPT-4 demonstrates aptitude on several standardized tests. OpenAI claims that in their own testing the model received a score of 1410 on the
SAT (94th
percentile), 163 on the
LSAT (88th percentile), and 298 on the
Uniform Bar Exam (90th percentile). In contrast, OpenAI claims that GPT-3.5 received scores for the same exams in the 82nd,
40th, and 10th percentiles respectively.
Medical knowledge
Researchers from Microsoft tested GPT-4 on medical problems and found "that GPT-4, without any specialized prompt crafting, exceeds the passing score on
USMLE
The United States Medical Licensing Examination (USMLE) is a three-step examination program for medical licensure in the United States sponsored by the Federation of State Medical Boards (FSMB) and the National Board of Medical Examiners (NBME). ...
by over 20 points and outperforms earlier general-purpose models (GPT-3.5) as well as models specifically fine-tuned on medical knowledge (
Med-PaLM
Palm most commonly refers to:
* Palm of the hand, the central region of the front of the hand
* Palm plants, of family Arecaceae
** List of Arecaceae genera
* Several other plants known as "palm"
Palm or Palms may also refer to:
Music
* Palm (ba ...
, a prompt-tuned version of Flan-PaLM 540B)".
Training
OpenAI did not release the technical details of GPT-4; the technical report explicitly refrained from specifying the model size, architecture, or hardware used during either training or
inference
Inferences are steps in reasoning, moving from premises to logical consequences; etymologically, the word '' infer'' means to "carry forward". Inference is theoretically traditionally divided into deduction and induction, a distinction that ...
. While the report described that the model was trained using a combination of first
supervised learning
Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...
on a large
dataset A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the d ...
, then
reinforcement learning using both human and AI feedback, it did not provide details of the training, including the process by which the training dataset was constructed, the computing power required, or any
hyperparameters
In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish them from parameters of the model for the underlying system under analysis.
For example, if one is using a beta distribution to m ...
such as the
learning rate
In machine learning and statistics, the learning rate is a Hyperparameter (machine learning), tuning parameter in an Mathematical optimization, optimization algorithm that determines the step size at each iteration while moving toward a minimum of ...
, epoch count, or
optimizer(s) used. The report claimed that "the competitive landscape and the safety implications of large-scale models" were factors that influenced this decision.
Alignment
According to their report, OpenAI conducted internal adversarial testing on and before GPT-4's launch date with dedicated
red teams composed of researchers and industry professionals to mitigate potential vulnerabilities. As part of these efforts, they granted the Alignment Research Center (ARC) early access to the models to assess
power-seeking risks. As an example, ARC found that the model was able to encourage a
TaskRabbit crowdsource worker to solve a
CAPTCHA
A CAPTCHA ( , a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge–response test used in computing to determine whether the user is human.
The term was coined in 2003 b ...
for it, but was not able to autonomously replicate or acquire resources.
In order to properly refuse harmful prompts, outputs from GPT-4 were tweaked using the model itself as a tool. A GPT-4 classifier serving as a rule-based reward model (RBRM) would take prompts, the corresponding output from the GPT-4 policy model, and a human-written set of rules to classify the output according to the rubric. GPT-4 was then rewarded for refusing to respond to harmful prompts as classified by the RBRM.
Reception
U.S. Representatives
Don Beyer
Donald Sternoff Beyer Jr. (; born June 20, 1950) is an American businessman, diplomat, and politician serving as the U.S. representative for since 2015. A member of the Democratic Party, his district is in the heart of Northern Virginia and in ...
and
Ted Lieu
Ted W. Lieu (; born March 29, 1969) is an American politician and Air Force Reserve Command colonel who has represented California's 33rd congressional district in the U.S. House of Representatives since 2015. The district includes much of we ...
confirmed to the
New York Times
''The New York Times'' (''the Times'', ''NYT'', or the Gray Lady) is a daily newspaper based in New York City with a worldwide readership reported in 2020 to comprise a declining 840,000 paid print subscribers, and a growing 6 million paid ...
that
Sam Altman
Samuel H. Altman ( ; born April 22, 1985) is an American entrepreneur, investor, programmer, and blogger. He is the CEO of OpenAI and the former president of Y Combinator.
Early life and education
Altman grew up in St. Louis, Missouri; his mo ...
, CEO of OpenAI, visited
Congress
A congress is a formal meeting of the representatives of different countries, constituent states, organizations, trade unions, political parties, or other groups. The term originated in Late Middle English to denote an encounter (meeting of ...
in January 2023 to demonstrate GPT-4 and its improved "security controls" compared to other AI models.
According to ''
Vox'', GPT-4 "impressed observers with its markedly improved performance across reasoning, retention, and coding."
''
Mashable
Mashable is a digital media platform, news website and entertainment company founded by Pete Cashmore in 2005.
History
Mashable was founded by Pete Cashmore while living in Aberdeen, Scotland, in July 2005. Early iterations of the site were ...
'' agreed that GPT-4 was usually a significant improvement, but also judged that GPT-3 would occasionally give better answers in a side-by-side comparison.
Microsoft Research tested the model behind GPT-4 and concluded that "it could reasonably be viewed as an early (yet still incomplete) version of an
artificial general intelligence
Artificial general intelligence (AGI) is the ability of an intelligent agent to understand or learn any intellectual task that a human being can.
It is a primary goal of some artificial intelligence research and a common topic in science fict ...
(AGI) system".
AI safety concerns
In late March 2023, an open letter from the
Future of Life Institute signed by various AI researchers and tech executives called for the pausing of all training of AIs stronger than GPT-4 for 6 months, citing
AI safety
AI is artificial intelligence, intellectual ability in machines and robots.
Ai, AI or A.I. may also refer to:
Animals
* Ai (chimpanzee), an individual experimental subject in Japan
* Ai (sloth) or the pale-throated sloth, northern Amazonian mam ...
concerns amid a race of progress in the field. The signatories, which included figures such as AI pioneer
Yoshua Bengio,
Apple
An apple is an edible fruit produced by an apple tree (''Malus domestica''). Apple trees are cultivated worldwide and are the most widely grown species in the genus '' Malus''. The tree originated in Central Asia, where its wild ances ...
co-founder
Steve Wozniak
Stephen Gary Wozniak (; born August 11, 1950), also known by his nickname "Woz", is an American electronics engineer, computer programmer, philanthropist, inventor, and technology entrepreneur. In 1976, with business partner Steve Jobs, he c ...
, and
Tesla CEO
Elon Musk
Elon Reeve Musk ( ; born June 28, 1971) is a business magnate and investor. He is the founder, CEO and chief engineer of SpaceX; angel investor, CEO and product architect of Tesla, Inc.; owner and CEO of Twitter, Inc.; founder of The ...
, expressed concern about both near-term and
existential risks of AI development such as a potential
AI singularity
The technological singularity—or simply the singularity—is a hypothetical future point in time at which technological growth becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilization. According to the ...
. OpenAI CEO Sam Altman did not sign the letter, arguing that OpenAI already prioritizes safety.
Criticisms
While OpenAI released both the weights of the neural network and the technical details of GPT-2, and, although not releasing the weights, did release the technical details of GPT-3, OpenAI did not reveal either the weights or the technical details of GPT-4. This decision has been criticized by other AI researchers, who argue that it hinders open research into GPT-4's biases and safety.
Sasha Luccioni, a research scientist at
HuggingFace
Hugging Face, Inc. is an American company that develops tools for building applications using machine learning. It is most notable for its Transformers library built for natural language processing applications and its platform that allows users ...
, argued that the model was a "dead end" for the scientific community due to its closed nature, which prevents others from building upon GPT-4's improvements. HuggingFace co-founder Thomas Wolf argued that with GPT-4, "OpenAI is now a fully closed company with scientific communication akin to press releases for products".
Like its predecessor, GPT-4 has been known to
"hallucinate". The model has also been criticized for generating hateful, biased, and racist information.
Usage
ChatGPT Plus
ChatGPT Plus is a GPT-4 backed version of ChatGPT
available for a 20 USD per month subscription fee (the original version is backed by GPT-3.5). OpenAI also makes GPT-4 available to a select group of applicants through their GPT-4 API waitlist; after being accepted, an additional fee of 0.03 USD per 1000 tokens in the initial text provided to the model ("prompt"), and 0.06 USD per 1000 tokens that the model generates ("completion"), is required to use the version of the model with a 8192-token context window; for the 32768-token version, those prices are doubled.
Duolingo
Duolingo
Duolingo ( ) is an American educational technology company which produces learning apps and provides language certification.
On its main app, users can practice vocabulary, grammar, pronunciation and listening skills using spaced repetition. ...
integrated GPT-4 in their application through two new features, "Roleplay" and "Explain My Answer". The first version of this update is aimed only at English speakers who are learning French or Spanish, with plans to extend the features to other languages in the future.
Miðeind ehf
Iceland
Iceland ( is, Ísland; ) is a Nordic island country in the North Atlantic Ocean and in the Arctic Ocean. Iceland is the most sparsely populated country in Europe. Iceland's capital and largest city is Reykjavík, which (along with its ...
ic start-up Miðeind ehf, which works on
language preservation
Language preservation is the preservation of endangered or dead languages. With language death, studies in linguistics, anthropology, prehistory and psychology lose diversity. As history is remembered with the help of historic preservation, ...
, was selected by OpenAI as one of six companies to participate in an early beta test program of the new model.
Khan Academy
Khan Academy
Khan Academy is an American non-profit educational organization created in 2008 by Sal Khan. Its goal is creating a set of online tools that help educate students. The organization produces short lessons in the form of videos. Its website also i ...
uses GPT-4 to create a tutoring chatbot, which the organization names "Khanmigo". While it is in the "research phase", access to the chatbot is provided free to the students and teachers of 500 school districts who have "partnered" with Khan Academy. Public access is only offered to a limited number of users selected from a waitlist; after acceptance, a 20 USD per month fee is required to use the technology.
Be My Eyes
Be My Eyes
Be My Eyes is a Danish mobile app that aims to help blind and visually impaired people to recognize objects and cope with everyday situations. An online community of sighted volunteers receive photos or videos from randomly assigned affected in ...
, which helps visually impaired people to identify objects and navigate their surroundings, was the first app to incorporate GPT-4's image recognition capabilities, through a new "Virtual Volunteer" feature. The feature is an alternative to relying on human volunteers for the same tasks. The ''Be My Eyes'' "Virtual Volunteer" is in beta testing.
GitHub Copilot
GitHub Copilot announced a GPT-4 powered assistant named "Copilot X". The product provides another chat-style interface to GPT-4, allowing the programmer to receive answers to questions like "how do I vertically center a
div?". A feature termed "context-aware conversations" allows the user to highlight a portion of code within
Visual Studio Code
Visual Studio Code, also commonly referred to as VS Code, is a source-code editor made by Microsoft with the Electron Framework, for Windows, Linux and macOS. Features include support for debugging, syntax highlighting, intelligent code compl ...
and direct GPT-4 to perform actions on it, such as the writing of unit tests. Another feature allows summaries, or "code walkthroughs", to be autogenerated by GPT-4 for
pull requests submitted to GitHub. Copilot X also provides terminal integration, which allows the user to ask GPT-4 to generate shell commands based on natural language requests. , while GitHub provides access to a limited number of people selected through a waitlist, the release date as well as the cost of the product are still to be announced.
Microsoft Bing
Microsoft 365 Copilot
On 17 March 2023, Microsoft announced further integration of GPT-4 into its products, revealing
Microsoft 365 Copilot
Microsoft Copilot is a chatbot developed by Microsoft and launched on February 7, 2023. Based on a large language model, it is able to cite sources, create poems, and write both lyrics and music for songs generated by its Suno AI plugin. It i ...
, "embedded in the apps millions of people use everyday:
Word
A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no consen ...
,
Excel
ExCeL London (an abbreviation for Exhibition Centre London) is an exhibition centre, international convention centre and former hospital in the Custom House area of Newham, East London. It is situated on a site on the northern quay of the ...
,
PowerPoint
Microsoft PowerPoint is a presentation program, created by Robert Gaskins and Dennis Austin at a software company named Forethought, Inc. It was released on April 20, 1987, initially for Macintosh computers only. Microsoft acquired PowerPoi ...
,
Outlook
Outlook or The Outlook may refer to:
Computing
* Microsoft Outlook, an e-mail and personal information management software product from Microsoft
* Outlook.com, a web mail service from Microsoft
* Outlook on the web, a suite of web applications ...
,
Teams, and more".
Stripe
Stripe
Stripe, striped, or stripes may refer to:
Decorations
* Stripe (pattern), a line or band that differs in colour or tone from an adjacent surface
* Racing stripe, a vehicle decoration
* Service stripe, a decoration of the U.S. military
Entertainme ...
utilizes GPT-4 to help with fraud detection, and to try to improve other aspects of the user experience.
Potential applications
Multimodal AI models such as GPT-4 may offer benefits for
personalized medicine
Personalized medicine, also referred to as precision medicine, is a medical model that separates people into different groups—with medical decisions, practices, interventions and/or products being tailored to the individual patient based on the ...
(tailoring treatments and interventions to individual patients based on their unique genetic and environmental factors) as well as remote healthcare by being able to act as virtual health assistants, or by helping to identify the most effective approaches. They may also help in the areas of digital clinical trials, pandemic surveillance, and digital twin technology.
References
{{Differentiable computing
OpenAI
Large language models
2023 software