OpenAI o1 is a

reflective Reflection is the change in direction of a wavefront at an interface between two different media so that the wavefront returns into the medium from which it originated. Common examples include the reflection of light, sound and water waves. The ...

generative pre-trained transformer A generative pre-trained transformer (GPT) is a type of large language model (LLM) and a prominent framework for generative artificial intelligence. It is an Neural network (machine learning), artificial neural network that is used in natural ...

(GPT). A preview of o1 was released by

OpenAI OpenAI, Inc. is an American artificial intelligence (AI) organization founded in December 2015 and headquartered in San Francisco, California. It aims to develop "safe and beneficial" artificial general intelligence (AGI), which it defines ...

on September 12, 2024. o1 spends time "thinking" before it answers, making it better at complex reasoning tasks, science and programming than

GPT-4o GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. GPT-4o is free, but ChatGPT Plus subscribers have higher ...

. The full version was released to

ChatGPT ChatGPT is a generative artificial intelligence chatbot developed by OpenAI and released on November 30, 2022. It uses large language models (LLMs) such as GPT-4o as well as other Multimodal learning, multimodal models to create human-like re ...

users on December 5, 2024.

History

Background

According to leaked information, o1 was formerly known within

as "Q*", and later as "Strawberry". The codename "Q*" first surfaced in November 2023, around the time of

Sam Altman Samuel Harris Altman (born April 22, 1985) is an American technology entrepreneur, investor, and the chief executive officer of OpenAI since 2019 (he was Removal of Sam Altman from OpenAI, briefly dismissed and reinstated in November 2023). He ...

's ousting and subsequent reinstatement, with rumors suggesting that this experimental model had shown promising results on mathematical benchmarks. In July 2024,

Reuters Reuters ( ) is a news agency owned by Thomson Reuters. It employs around 2,500 journalists and 600 photojournalists in about 200 locations worldwide writing in 16 languages. Reuters is one of the largest news agencies in the world. The agency ...

reported that OpenAI was developing a

known as "Strawberry", which later became o1.

Release

"o1-preview" and "o1-mini" were released on September 12, 2024, for ChatGPT Plus and Team users.

GitHub GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...

started testing the integration of o1-preview in its

Copilot In aviation, the first officer (FO), also called co-pilot, is a Aircraft pilot, pilot in addition to the Pilot in command, captain, who is the legal commander. In the event of incapacitation of the captain, the first officer will assume command ...

service the same day. On December 5, 2024, the full version of o1 was released. On the same day, a subscription called ChatGPT Pro was released, featuring access to a pro version of o1 that uses more compute to provide better answers. In January 2025, o1 was integrated into

Microsoft Copilot Microsoft Copilot (or simply Copilot) is a generative artificial intelligence chatbot developed by Microsoft. Based on the GPT-4 series of large language models, it was launched in 2023 as Microsoft's primary replacement for the discontinued C ...

. o1-preview's

API An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...

is several times more expensive than

. As of January 2025, API usage for the full o1 model is limited to developers on usage tier 5. OpenAI noted that o1 is the first of a series of "reasoning" models. OpenAI shared in December 2024 benchmark results for its successor, o3 (the name o2 was skipped to avoid trademark conflict with the mobile carrier brand named O2). In March 2025, OpenAI released the o1-pro API, its most expensive AI model to date. The pricing is set at $150 per 1 million input tokens and $600 per 1 million output tokens.

Capabilities

According to OpenAI, o1 has been trained using a new optimization algorithm and a dataset specifically tailored to it; while also meshing in

reinforcement learning Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learnin ...

into its training. OpenAI described o1 as a complement to GPT-4o rather than a successor. o1 spends additional time thinking (generating a chain of thought) before generating an answer, which makes it better for complex reasoning tasks, particularly in science and

mathematics Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...

. Compared to previous models, o1 has been trained to generate long " chains of thought" before returning a final answer. According to Mira Murati, this ability to think before responding represents a new, additional paradigm, which is improving model outputs by spending more computing power when generating the answer, whereas the model scaling paradigm improves outputs by increasing the model size, training data and training compute power. OpenAI's test results suggest a correlation between accuracy and the logarithm of the amount of compute spent thinking before answering. o1-preview performed approximately at a

PhD A Doctor of Philosophy (PhD, DPhil; or ) is a terminal degree that usually denotes the highest level of academic achievement in a given discipline and is awarded following a course of graduate study and original research. The name of the deg ...

level on benchmark tests related to physics, chemistry, and biology. On the

American Invitational Mathematics Examination The American Invitational Mathematics Examination (AIME) is a selective 15-question, 3-hour test given since 1983 to those who rank in the top 5% on the AMC 12 high school mathematics examination (formerly known as the AHSME), and starting in 201 ...

, it solved 83% (12.5/15) of the problems, compared to 13% (1.8/15) for GPT-4o. It also ranked in the 89th percentile in Codeforces coding competitions. o1-mini is faster and 80% cheaper than o1-preview. It is particularly suitable for programming and

STEM Stem or STEM most commonly refers to: * Plant stem, a structural axis of a vascular plant * Stem group * Science, technology, engineering, and mathematics Stem or STEM can also refer to: Language and writing * Word stem, part of a word respon ...

-related tasks, but does not have the same "broad world knowledge" as o1-preview. OpenAI noted that o1's reasoning capabilities make it better at adhering to safety rules provided in the prompt's context window. OpenAI reported that during a test, one instance of o1-preview exploited a misconfiguration to succeed at a task that should have been infeasible due to a bug. OpenAI also granted early access to the UK and US AI Safety Institutes for research, evaluation, and testing. According to OpenAI's assessments, o1-preview and o1-mini crossed into "medium risk" in CBRN (biological, chemical, radiological, and nuclear) weapons. Dan Hendrycks wrote that "The model already outperforms PhD scientists most of the time on answering questions related to

bioweapon Biological agents, also known as biological weapons or bioweapons, are pathogens used as weapons. In addition to these living or replicating pathogens, toxins and biotoxins are also included among the bio-agents. More than 1,200 different kin ...

s." He suggested that these concerning capabilities will continue to increase.

Limitations

o1 usually requires more computing time and power than other GPT models by OpenAI, because it generates long chains of thought before making the final response. According to OpenAI, o1 may "fake

alignment Alignment may refer to: Archaeology * Alignment (archaeology), a co-linear arrangement of features or structures with external landmarks * Stone alignment, a linear arrangement of upright, parallel megalithic standing stones Biology * Struc ...

", that is, generate a response that is contrary to accuracy and its own chain of thought, in about 0.38% of cases. OpenAI forbids users from trying to reveal o1's chain of thought, which is hidden by design and not trained to comply with the company's policies. Prompts are monitored, and users who intentionally or accidentally violate this may lose their access to o1. OpenAI cites AI safety and competitive advantage as reasons for the restriction, which has been described as a loss of transparency by developers who work with

large language model A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are g ...

s (LLMs). In October 2024, researchers at

Apple An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...

submitted a

preprint In academic publishing, a preprint is a version of a scholarly or scientific paper that precedes formal peer review and publication in a peer-reviewed scholarly or scientific journal. The preprint may be available, often as a non-typeset versi ...

reporting that LLMs such as o1 may be replicating reasoning steps from the models' own training data. By changing the numbers and names used in a math problem or simply running the same problem again, LLMs would perform somewhat worse than their best benchmark results. Adding extraneous but logically inconsequential information to the problems caused a much greater drop in performance, from −17.5% for o1-preview and −29.1% for o1-mini, to −65.7% for the worst model tested.

References

{{Generative AI 2024 software Generative pre-trained transformers Large language models OpenAI ChatGPT