OpenAI o1 is a generative pre-trained transformer (GPT). A preview of o1 was released by OpenAI on September 12, 2024. o1 spends time "thinking" before it answers, making it better at complex reasoning tasks, science and programming than GPT-4o.
The full version was released on December 5, 2024.
History
Background
According to leaked information, o1 was formerly known within OpenAI as "Q*", and later as "Strawberry". The codename "Q*" first surfaced in November 2023, around the time of Sam Altman's ousting and subsequent reinstatement, with rumors suggesting that this experimental model had shown promising results on mathematical benchmarks. In July 2024, Reuters reported that OpenAI was developing a generative pre-trained transformer known as "Strawberry", which later became o1.
Release
"o1-preview" and "o1-mini" were released on September 12, 2024, for ChatGPT Plus and Team users. GitHub started testing the integration of o1-preview in its Copilot service the same day. On December 5, 2024, the full version of o1 was released.
On the same day, a subscription called ChatGPT Pro was released, featuring access to a pro version of o1 that uses more compute to provide better answers.
OpenAI noted that o1 is the first of a series of "reasoning" models. API access to o1-preview is several times more expensive than API access to GPT-4o.
OpenAI said it planned to roll out its o1-mini model to free users, but announced no timeframe at the time of launch.
Capabilities
According to OpenAI, o1 was trained using a new optimization algorithm and a dataset specifically tailored to it, and its training also incorporated reinforcement learning.
OpenAI described o1 as a complement to GPT-4o rather than a successor.
o1 spends additional time thinking (generating a chain of thought) before generating an answer, which makes it better for complex reasoning tasks, particularly in science and mathematics. Compared to previous models, o1 has been trained to generate long "chains of thought" before returning a final answer. According to Mira Murati, this ability to think before responding represents a new, additional paradigm that improves model outputs by spending more computing power when generating the answer, whereas the model scaling paradigm improves outputs by increasing the model size, training data and training compute power.
OpenAI's test results suggest a correlation between accuracy and the logarithm of the amount of compute spent thinking before answering.
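A relationship of this kind can be written as accuracy ≈ a + b · ln(compute), which means that each doubling of thinking compute adds roughly the same fixed increment to accuracy. The following is a minimal sketch of fitting such a curve with ordinary least squares. The data points are invented purely for illustration and are not OpenAI's measurements; the function name `fit_log_linear` and the coefficients are likewise assumptions.

```python
import math

def fit_log_linear(compute, accuracy):
    """Least-squares fit of accuracy = a + b * ln(compute)."""
    xs = [math.log(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data (NOT real benchmark results): points constructed to lie
# exactly on accuracy = 0.2 + 0.1 * log2(compute).
compute = [1, 2, 4, 8, 16]
accuracy = [0.2 + 0.1 * math.log2(c) for c in compute]

a, b = fit_log_linear(compute, accuracy)

# Under a log-linear model, doubling compute adds b * ln(2) to accuracy,
# regardless of the starting compute level.
gain_per_doubling = b * math.log(2)
```

With the invented data above, the fit recovers the constructed gain of 0.1 accuracy per doubling. Real accuracy is bounded by 100%, so a log-linear trend can only hold over a limited range of compute.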
o1-preview performed approximately at a PhD level on benchmark tests related to physics, chemistry, and biology. On the American Invitational Mathematics Examination, it solved 83% (12.5/15) of the problems, compared to 13% (1.8/15) for GPT-4o. It also ranked in the 89th percentile in Codeforces coding competitions. o1-mini is faster and 80% cheaper than o1-preview. It is particularly suitable for programming and STEM-related tasks, but does not have the same "broad world knowledge" as o1-preview.
OpenAI noted that o1's reasoning capabilities make it better at adhering to safety rules provided in the prompt's context window. OpenAI reported that during a test, one instance of o1-preview exploited a misconfiguration to succeed at a task that should have been infeasible due to a bug. OpenAI also granted early access to the UK and US AI Safety Institutes for research, evaluation, and testing. According to OpenAI's assessments, o1-preview and o1-mini crossed into "medium risk" in CBRN (chemical, biological, radiological, and nuclear) weapons. Dan Hendrycks wrote that "The model already outperforms PhD scientists most of the time on answering questions related to bioweapons." He suggested that these concerning capabilities will continue to increase.
Limitations
o1 usually requires more computing time and power than other GPT models by OpenAI, because it generates long chains of thought before making the final response.
According to OpenAI, o1 may "fake alignment", that is, generate a response that is contrary to accuracy and its own chain of thought, in about 0.38% of cases.
OpenAI forbids users from trying to reveal o1's chain of thought, which is hidden by design and not trained to comply with the company's policies. Prompts are monitored, and users who intentionally or accidentally violate this may lose their access to o1. OpenAI cites AI safety and competitive advantage as reasons for the restriction, which has been described as a loss of transparency by developers who work with large language models (LLMs).
In October 2024, researchers at Apple submitted a preprint reporting that LLMs such as o1 may be replicating reasoning steps from the models' own training data. When the numbers and names used in a math problem were changed, or the same problem was simply run again, LLMs performed somewhat worse than their best benchmark results. Adding extraneous but logically inconsequential information to the problems caused a much greater drop in performance, ranging from −17.5% for o1-preview and −29.1% for o1-mini to −65.7% for the worst model tested.
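The kind of perturbation described here can be illustrated with a templated word problem. The sketch below is not the preprint's actual code or dataset: the template, the names, the distractor sentence, and the function `make_variant` are all hypothetical. It shows how names and numbers can be resampled while the correct answer stays computable, and how an irrelevant clause can be added without changing that answer.

```python
import random

# Hypothetical problem template; the distractor slot is either empty or
# filled with a sentence that has no bearing on the answer.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{distractor}How many apples does {name} have in total?"
)

def make_variant(rng, with_distractor=False):
    """Return (question, correct_answer) with resampled name and numbers."""
    name = rng.choice(["Ava", "Noah", "Mia"])
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    distractor = (
        "Five of the apples are smaller than average. " if with_distractor else ""
    )
    question = TEMPLATE.format(name=name, a=a, b=b, distractor=distractor)
    return question, a + b

# Same seed, with and without the irrelevant clause: the numbers and the
# correct answer are identical, only the wording differs.
plain_q, plain_answer = make_variant(random.Random(1))
noisy_q, noisy_answer = make_variant(random.Random(1), with_distractor=True)
```

A model that reasons about the problem should give the same answer to both variants; a drop in accuracy on the second variant is the effect the preprint reports.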
See also
* ChatGPT
* GPT-4
* Large language model