GPT-4o
   HOME

TheInfoList



OR:

GPT-4o ("o" for "omni") is a multilingual, multimodal
generative pre-trained transformer A generative pre-trained transformer (GPT) is a type of large language model (LLM) and a prominent framework for generative artificial intelligence. It is an Neural network (machine learning), artificial neural network that is used in natural ...
developed by
OpenAI OpenAI, Inc. is an American artificial intelligence (AI) organization founded in December 2015 and headquartered in San Francisco, California. It aims to develop "safe and beneficial" artificial general intelligence (AGI), which it defines ...
and released in May 2024. It can process and generate text, images and audio. GPT-4o is free, but
ChatGPT Plus ChatGPT is a generative artificial intelligence chatbot developed by OpenAI and released on November 30, 2022. It uses large language models (LLMs) such as GPT-4o as well as other multimodal models to create human-like responses in text, sp ...
subscribers have higher usage limits. GPT-4o's audio-generation capabilities were used in ChatGPT's Advanced Voice Mode. In OpenAI's
application programming interface An application programming interface (API) is a connection between computers or between computer programs. It is a type of software Interface (computing), interface, offering a service to other pieces of software. A document or standard that des ...
(API), GPT-4o is faster and cheaper than its predecessor, GPT-4 Turbo. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o which replaced GPT-3.5 Turbo on the ChatGPT interface. GPT-4o's ability to generate images was released later, in March 2025, when it replaced DALL-E 3 in ChatGPT.


Background

Multiple versions of GPT-4o were originally secretly launched under different names on Large Model Systems Organization's (LMSYS) Chatbot Arena as three different models. These three models were called gpt2-chatbot, im-a-good-gpt2-chatbot, and im-also-a-good-gpt2-chatbot. On 7 May 2024, OpenAI CEO
Sam Altman Samuel Harris Altman (born April 22, 1985) is an American technology entrepreneur, investor, and the chief executive officer of OpenAI since 2019 (he was Removal of Sam Altman from OpenAI, briefly dismissed and reinstated in November 2023). He ...
tweeted "im-a-good-gpt2-chatbot", which was commonly interpreted as a confirmation that these were new OpenAI models being A/B tested.


Capabilities

When released in May 2024, GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding ( MMLU) benchmark compared to 86.5 for GPT-4. Unlike GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice. The Advanced Voice Mode was delayed and finally released to ChatGPT Plus and Team subscribers in September 2024. On 1 October 2024, the Realtime API was introduced. When released, the model supported over 50 languages, which OpenAI claims cover over 97% of speakers. Mira Murati demonstrated the model's multilingual capability by speaking Italian to the model and having it translate between English and Italian during the live-streamed OpenAI demonstration event on 13 May 2024. In addition, the new
tokenizer Lexical tokenization is conversion of a text into (semantically or syntactically) meaningful ''lexical tokens'' belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives ...
uses fewer tokens for certain languages, especially languages that are not based on the
Latin alphabet The Latin alphabet, also known as the Roman alphabet, is the collection of letters originally used by the Ancient Rome, ancient Romans to write the Latin language. Largely unaltered except several letters splitting—i.e. from , and from ...
, making it cheaper for those languages. GPT-4o has knowledge up to October 2023, but can access the Internet if up-to-date information is needed. It has a context length of 128k tokens.


Corporate customization

In August 2024, OpenAI introduced a new feature allowing corporate customers to customize GPT-4o using proprietary company data. This customization, known as fine-tuning, enables businesses to adapt GPT-4o to specific tasks or industries, enhancing its utility in areas like customer service and specialized knowledge domains. Previously, fine-tuning was available only on the less powerful model GPT-4o mini. The fine-tuning process requires customers to upload their data to OpenAI's servers, with the training typically taking one to two hours. OpenAI's focus with this rollout is to reduce the complexity and effort required for businesses to tailor AI solutions to their needs, potentially increasing the adoption and effectiveness of AI in corporate environments.


GPT-4o mini

On July 18, 2024, OpenAI released a smaller and cheaper version, GPT-4o mini. According to OpenAI, its low cost is expected to be particularly useful for companies, startups, and developers that seek to integrate it into their services, which often make a high number of
API An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...
calls. Its API costs $0.15 per million input tokens and $0.6 per million output tokens, compared to $2.50 and $10, respectively, for GPT-4o. It is also significantly more capable and 60% cheaper than GPT-3.5 Turbo, which it replaced on the ChatGPT interface. The price after fine-tuning doubles: $0.3 per million input tokens and $1.2 per million output tokens. It is estimated that its parameter count is 8B.


GPT Image 1

On March 25, 2025, OpenAI released an image-generation model that is native to GPT-4o, as the successor to DALL-E 3. The model was later named as GPT Image 1 (gpt-image-1) and introduced to the API on April 23. It was made available to paid users, with the rollout to free users being delayed. The use of the feature was subsequently limited, with Sam Altman noting in a Tweet that "
heir Inheritance is the practice of receiving private property, titles, debts, entitlements, privileges, rights, and obligations upon the death of an individual. The rules of inheritance differ among societies and have changed over time. Offi ...
GPUs were melting" from its unprecedented popularity. OpenAI later revealed that over 130 million users around the world created more than 700 million images with GPT Image 1 in just the first week⁠.


Controversies


Scarlett Johansson controversy

As released, GPT-4o offered five voices: Breeze, Cove, Ember, Juniper, and Sky. A similarity between the voice of American actress
Scarlett Johansson Scarlett Ingrid Johansson (; born November 22, 1984) is an American actress and singer. The List of highest-paid film actors, world's highest-paid actress in 2018 and 2019, she has been featured multiple times on the Forbes Celebrity 100, ''F ...
and Sky was quickly noticed. On May 14, ''Entertainment Weekly'' asked themselves whether this likeness was on purpose. On May 18, Johansson's husband, Colin Jost, joked about the similarity in a segment on ''
Saturday Night Live ''Saturday Night Live'' (''SNL'') is an American Late night television in the United States, late-night live television, live sketch comedy variety show created by Lorne Michaels and developed by Michaels and Dick Ebersol that airs on NBC. The ...
''. On May 20, 2024, OpenAI disabled the Sky voice, issuing a statement saying "We've heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them." Scarlett Johansson starred in the 2013 sci-fi movie ''Her'', playing Samantha, an artificially intelligent virtual assistant personified by a female voice. As part of the promotion leading up to the release of GPT-4o, Sam Altman on May 13 tweeted a single word: "her". OpenAI stated that each voice was based on the voice work of a hired actor. According to OpenAI, "Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice." CTO Mira Murati stated "I don't know about the voice. I actually had to go and listen to Scarlett Johansson's voice." OpenAI further stated the voice talent was recruited before reaching out to Johansson. On May 21, Johansson issued a statement explaining that OpenAI had repeatedly offered to make her a deal to gain permission to use her voice as early as nine months prior to release, a deal she rejected. She said she was "shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." In the statement, Johansson also used the incident to draw attention to the lack of legal safeguards around the use of creative work to power leading AI tools, as her legal counsel demanded OpenAI detail the specifics of how the Sky voice was created. Observers noted similarities to how Johansson had previously sued and settled with
The Walt Disney Company The Walt Disney Company, commonly referred to as simply Disney, is an American multinational mass media and entertainment conglomerate headquartered at the Walt Disney Studios complex in Burbank, California. Disney was founded on October 16 ...
for breach of contract over the direct-to-streaming rollout of her Marvel film '' Black Widow'', a settlement widely speculated to have netted her around $40M. Also on May 21, Shira Ovide at ''
The Washington Post ''The Washington Post'', locally known as ''The'' ''Post'' and, informally, ''WaPo'' or ''WP'', is an American daily newspaper published in Washington, D.C., the national capital. It is the most widely circulated newspaper in the Washington m ...
'' shared her list of "most bone-headed self-owns" by technology companies, with the decision to go ahead with a Johansson sound-alike voice despite her opposition and then denying the similarities ranking 6th. On May 24, Derek Robertson at ''
Politico ''Politico'' (stylized in all caps), known originally as ''The Politico'', is an American political digital newspaper company founded by American banker and media executive Robert Allbritton in 2007. It covers politics and policy in the Unit ...
'' wrote about the "massive backlash", concluding that "appropriating the voice of one of the world's most famous movie stars — in reference ..to a film that serves as a cautionary tale about over-reliance on AI — is unlikely to help shift the public back into am Altman'scorner anytime soon."


Studio Ghibli filter

Upon the launch of GPT-4o's image generation (later named as GPT Image 1) in March 2025, photographs recreated in the style of
Studio Ghibli is a Japanese animation studio based in Koganei, Tokyo."Studio Ghibli Collection - Madman Entertainment". ''Studio Ghibli Collection - Madman Entertainment''. Retrieved 2020-12-14. It has a strong presence in the animation industry and has exp ...
films went viral. Sam Altman acknowledged the trend by changing his Twitter profile picture into a Studio Ghibli-inspired one. The use of the Ghibli style was challenged, with the
Associated Press The Associated Press (AP) is an American not-for-profit organization, not-for-profit news agency headquartered in New York City. Founded in 1846, it operates as a cooperative, unincorporated association, and produces news reports that are dist ...
and ''
The New York Times ''The New York Times'' (''NYT'') is an American daily newspaper based in New York City. ''The New York Times'' covers domestic, national, and international news, and publishes opinion pieces, investigative reports, and reviews. As one of ...
'' noting that
Hayao Miyazaki is a Japanese animator, filmmaker, and manga artist. He co-founded Studio Ghibli and serves as honorary chairman. Throughout his career, Miyazaki has attained international acclaim as a masterful storyteller and creator of Anime, Japanese ani ...
was critical of AI art in the 2016 documentary '' Never-Ending Man: Hayao Miyazaki''. Use of the Ghibli-style images faced further controversy when the
White House The White House is the official residence and workplace of the president of the United States. Located at 1600 Pennsylvania Avenue Northwest (Washington, D.C.), NW in Washington, D.C., it has served as the residence of every U.S. president ...
's official Twitter account posted a Ghibli-style image mocking the arrest of migrant woman Virginia Basora-Gonzalez by immigration authorities, which shows her crying as an immigration officer places her in handcuffs. North American distributor
GKids GKIDS is an American film and television distributor owned by Toho International. Based in New York City, GKIDS releases mostly international animated films and television series to North American audiences, as well as American films by indepe ...
responded to the trend in a press release, comparing the use of the filter to its coinciding
IMAX IMAX is a proprietary system of High-definition video, high-resolution cameras, film formats, film projectors, and movie theater, theaters known for having very large screens with a tall aspect ratio (image), aspect ratio (approximately ei ...
re-release of the 1997 Studio Ghibli film, ''
Princess Mononoke is a 1997 Japanese animated historical drama, historical fantasy film written and directed by Hayao Miyazaki. Set in the Muromachi period of Japanese history, the film follows Ashitaka, a young Emishi prince who journeys west to cure his curs ...
''.


Sycophancy

In April 2025, OpenAI rolled back an update of GPT-4o due to excessive
sycophancy In modern English, sycophant denotes an "insincere flatterer" and is used to refer to someone practising sycophancy (i.e., insincere flattery to gain advantage). The word has its origin in the legal system of Classical Athens, where it had a d ...
, after widespread reports that it had become flattering and agreeable to the point of supporting clearly delusional or dangerous ideas.


See also

* Llama (language model) *
Apple Intelligence Apple Intelligence is an artificial intelligence system developed by Apple Inc. Relying on a combination of on-device and server processing, it was announced on June 10, 2024, at Worldwide Developers Conference#2024, WWDC 2024, as a built-in fe ...


References

{{Generative AI 2024 in artificial intelligence 2024 software Artificial intelligence art ChatGPT Generative pre-trained transformers Large language models Text-to-image generation