OpenAI o3 is a

reflective Reflection is the change in direction of a wavefront at an interface between two different media so that the wavefront returns into the medium from which it originated. Common examples include the reflection of light, sound and water waves. The ...

generative pre-trained transformer A generative pre-trained transformer (GPT) is a type of large language model (LLM) and a prominent framework for generative artificial intelligence. It is an Neural network (machine learning), artificial neural network that is used in natural ...

(GPT) model developed by

OpenAI OpenAI, Inc. is an American artificial intelligence (AI) organization founded in December 2015 and headquartered in San Francisco, California. It aims to develop "safe and beneficial" artificial general intelligence (AGI), which it defines ...

as a successor to

OpenAI o1 OpenAI o1 is a reflective generative pre-trained transformer (GPT). A preview of o1 was released by OpenAI on September 12, 2024. o1 spends time "thinking" before it answers, making it better at complex reasoning tasks, science and programming th ...

for

ChatGPT ChatGPT is a generative artificial intelligence chatbot developed by OpenAI and released on November 30, 2022. It uses large language models (LLMs) such as GPT-4o as well as other Multimodal learning, multimodal models to create human-like re ...

. It is designed to devote additional deliberation time when addressing questions that require step-by-step logical reasoning. On January 31, 2025, OpenAI released a smaller model, o3-mini, followed on April 16 by o3 and o4-mini.

History

The OpenAI o3 model was announced on December 20, 2024. It was called "o3" rather than "o2" to avoid

trademark A trademark (also written trade mark or trade-mark) is a form of intellectual property that consists of a word, phrase, symbol, design, or a combination that identifies a Good (economics and accounting), product or Service (economics), service f ...

conflict with the mobile carrier brand named O2. OpenAI invited safety and security researchers to apply for early access of these models until January 10, 2025. Similarly to o1, there are two different models: o3 and o3-mini. On January 31, 2025, OpenAI released o3-mini to all

users (including free-tier) and some

API An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...

users. OpenAI describes o3-mini as a "specialized alternative" to o1 for "technical domains requiring precision and speed". o3-mini features three reasoning effort levels: low, medium and high. The free version uses medium. The variant using more compute is called o3-mini-high, and is available to paid subscribers. Subscribers to ChatGPT's Pro tier have unlimited access to both o3-mini and o3-mini-high. On February 2, OpenAI launched OpenAI Deep Research, a ChatGPT service using a version of o3 that makes comprehensive reports within 5 to 30 minutes, based on web searches. On February 6, in response to pressure from rivals like

DeepSeek Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., Trade name, doing business as DeepSeek, is a Chinese artificial intelligence company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, Deepse ...

, OpenAI announced an update aimed at enhancing the transparency of the thought process in its o3-mini model. On February 12, OpenAI further increased rate limits for o3-mini-high to 50 requests per day (from 50 requests per week) for ChatGPT Plus subscribers, and implemented file/image upload support. On April 16, 2025, OpenAI released o3 and o4-mini, a successor of o3-mini. On June 10, OpenAI released o3-pro, which the company claims is its most capable model yet. OpenAI stated: "We recommend using it for challenging questions where reliability matters more than speed, and waiting a few minutes is worth the tradeoff".

Capabilities

Reinforcement learning Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learnin ...

was used to teach o3 to "think" before generating answers, using what OpenAI refers to as a "private chain of thought". This approach enables the model to plan ahead and reason through tasks, performing a series of intermediate reasoning steps to assist in solving the problem, at the cost of additional computing power and increased latency of responses. o3 demonstrates significantly better performance than o1 on complex tasks, including coding,

mathematics Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...

, and

science Science is a systematic discipline that builds and organises knowledge in the form of testable hypotheses and predictions about the universe. Modern science is typically divided into twoor threemajor branches: the natural sciences, which stu ...

. OpenAI reported that o3 achieved a score of 87.7% on the GPQA Diamond benchmark, which contains expert-level science questions not publicly available online. On SWE-bench Verified, a

software engineering Software engineering is a branch of both computer science and engineering focused on designing, developing, testing, and maintaining Application software, software applications. It involves applying engineering design process, engineering principl ...

benchmark Benchmark may refer to: Business and economics * Benchmarking, evaluating performance within organizations * Benchmark price * Benchmark (crude oil), oil-specific practices Science and technology * Experimental benchmarking, the act of defining a ...

assessing the ability to solve real

GitHub GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...

issues, o3 scored 71.7%, compared to 48.9% for o1. On

Codeforces Codeforces () is a website that hosts competitive programming contests. It is maintained by a group of competitive programmers from ITMO University led by Mikhail Mirzayanov. Since 2013, Codeforces claims to surpass TopCoder in terms of active co ...

, o3 reached an Elo score of 2727, whereas o1 scored 1891. On the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) benchmark, which evaluates an AI's ability to handle new logical and skill acquisition problems, o3 attained three times the accuracy of o1.

References

External links

''Introducing OpenAI o3 and o4-mini''''O3 is 80% cheaper and introducing o3-pro''
{{Generative AI Large language models 2024 software Generative pre-trained transformers OpenAI ChatGPT