GPT-4o ("o" for "omni") is a multilingual,
multimodal generative pre-trained transformer developed by
OpenAI
OpenAI is an artificial intelligence (AI) research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. The company conducts research in the field of AI with the stated goal of promo ...
and released in May 2024.
GPT-4o is free, but with a usage limit that is five times higher for
ChatGPT Plus subscribers. It can process and generate text, images and audio. Its
application programming interface (API) is twice as fast and half the price of its predecessor,
GPT-4 Turbo.
Background
Multiple versions of GPT-4o were originally secretly launched under different names on Large Model Systems Organization's (LMSYS) Chatbot Arena as three different models. These three models were called gpt2-chatbot, im-a-good-gpt2-chatbot, and im-also-a-good-gpt2-chatbot. On 7 May 2024,
Sam Altman
Samuel H. Altman ( ; born April 22, 1985) is an American entrepreneur, investor, programmer, and blogger. He is the CEO of OpenAI and the former president of Y Combinator.
Early life and education
Altman grew up in St. Louis, Missouri; his mo ...
tweeted "im-a-good-gpt2-chatbot", which was commonly interpreted as a confirmation that these were new OpenAI models being
A/B tested.
Capabilities
GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (
MMLU) benchmark compared to 86.5 by GPT-4.
Unlike
GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice.
Sam Altman noted on 15 May 2024 that GPT-4o's voice-to-voice capabilities were not yet integrated into ChatGPT, and that the old version was still being used. This new mode, called Advanced Voice Mode, is currently in limited alpha release and is based on the 4o-audio-preview. On 1 October 2024, the Realtime API was introduced.
The model supports over 50 languages,
which OpenAI claims cover over 97% of speakers. Mira Murati demonstrated the model's multilingual capability by speaking Italian to the model and having it translate between English and Italian during the live-streamed OpenAI demonstration event on 13 May 2024. In addition, the new
tokenizer uses fewer tokens for certain languages, especially languages that are not based on the
Latin alphabet
The Latin alphabet or Roman alphabet is the collection of letters originally used by the ancient Romans to write the Latin language. Largely unaltered with the exception of extensions (such as diacritics), it used to write English and the ...
, making it cheaper for those languages.
GPT-4o has knowledge up to October 2023,
but can access the Internet if up-to-date information is needed. It has a context length of 128k tokens
with an output token limit capped to 4,096,
and after a later update (gpt-4o-2024-08-06) to 16,384.
As of May 2024, it is the leading model in the LMSYS
Elo
Elo or ELO may refer to:
Music
* Electric Light Orchestra, a British rock music group
** The Electric Light Orchestra (album), ''The Electric Light Orchestra'' (album), the group's debut album
** ''ELO 2'', the group's second album
* ELO Part II ...
Arena Benchmarks by the
University of California, Berkeley
The University of California, Berkeley (UC Berkeley, Berkeley, Cal, or California) is a public land-grant research university in Berkeley, California. Established in 1868 as the University of California, it is the state's first land-grant u ...
.
Corporate customization
In August 2024, OpenAI introduced a new feature allowing corporate customers to customize GPT-4o using proprietary company data. This customization, known as
fine-tuning, enables businesses to adapt GPT-4o to specific tasks or industries, enhancing its utility in areas like customer service and specialized knowledge domains. Previously, fine-tuning was available only on the less powerful model GPT-4o mini.
The fine-tuning process requires customers to upload their data to OpenAI's servers, with the training typically taking one to two hours. Initially, the customization will be limited to text-based data. OpenAI's focus with this rollout is to reduce the complexity and effort required for businesses to tailor AI solutions to their needs, potentially increasing the adoption and effectiveness of AI in corporate environments.
GPT-4o mini
On July 18, 2024, OpenAI released a smaller and cheaper version, GPT-4o mini.
According to OpenAI, its low cost is expected to be particularly useful for companies, startups, and developers that seek to integrate it into their services, which often make a high number of
API
An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
calls. Its API costs $0.15 per million input tokens and $0.6 per million output tokens, compared to $5 and $15, respectively, for GPT-4o. It is also significantly more capable and 60% cheaper than GPT-3.5 Turbo, which it replaced on the ChatGPT interface.
The price after
fine-tuning doubles: $0.3 per million input tokens and $1.2 per million output tokens.
GPT-4o mini is the default model for users not logged in who use ChatGPT as guests and those who have hit the limit for GPT-4o.
GPT-4o mini will become available in fall 2024 on Apple's mobile devices and Mac desktops, through the
Apple Intelligence feature.
Scarlett Johansson controversy
As released, GPT-4o offered five voices: Breeze, Cove, Ember, Juniper, and Sky. A similarity between the voice of American actress
Scarlett Johansson
Scarlett Ingrid Johansson (; born November 22, 1984) is an American actress. The world's highest-paid actress in 2018 and 2019, she has featured multiple times on the ''Forbes'' Celebrity 100 list. ''Time'' magazine named her one of the 100 ...
and Sky was quickly noticed. On May 14, ''Entertainment Weekly'' asked themselves whether this likeness was on purpose. On May 18, Johansson's husband,
Colin Jost
Colin Kelly Jost (; born June 29, 1982) is an American comedian, actor, and writer. He has been a writer for ''Saturday Night Live'' (SNL) since 2005 and '' Weekend Update'' co-anchor since 2014. He also served as one of the show's co-head writ ...
, joked about the similarity in a segment on ''
Saturday Night Live
''Saturday Night Live'' (often abbreviated to ''SNL'') is an American late-night live television sketch comedy and variety show created by Lorne Michaels and developed by Dick Ebersol that airs on NBC and Peacock. Michaels currently serve ...
''. On May 20, 2024, OpenAI disabled the Sky voice, issuing a statement saying "We've heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them."
Scarlett Johansson starred in the 2013 sci-fi movie
''Her'', playing Samantha, an artificially intelligent virtual assistant personified by a female voice.
As part of the promotion leading up to the release of GPT-4o, Sam Altman on May 13 tweeted a single word: "her".
OpenAI stated that each voice was based on the voice work of a hired actor. According to OpenAI, "Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice."
CTO Mira Murati stated "I don't know about the voice. I actually had to go and listen to Scarlett Johansson's voice." OpenAI further stated the voice talent was recruited before reaching out to Johansson.
On May 21, Johansson issued a statement explaining that OpenAI had repeatedly offered to make her a deal to gain permission to use her voice as early as nine months prior to release, a deal she rejected. She said she was "shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." In the statement, Johansson also used the incident to draw attention to the lack of legal safeguards around the use of creative work to power leading AI tools, as her legal counsel demanded OpenAI detail the specifics of how the Sky voice was created.
Observers noted similarities to how Johansson had
previously sued and settled with
The Walt Disney Company
The Walt Disney Company, commonly known as Disney (), is an American multinational mass media and entertainment industry, entertainment conglomerate (company), conglomerate headquartered at the Walt Disney Studios (Burbank), Walt Disney Stud ...
for breach of contract over the direct-to-streaming rollout of her Marvel film ''
Black Widow
Black widow may refer to:
Spiders
* Black widow spider, a common name for some species of spiders in the genus ''Latrodectus''
American species
* ''Latrodectus apicalis'', the Galapagos black widow
* ''Latrodectus curacaviensis'', the South Amer ...
'', a settlement widely speculated to have netted her around $40M.
Also on May 21, Shira Ovide at ''
The Washington Post
''The Washington Post'' (also known as the ''Post'' and, informally, ''WaPo'') is an American daily newspaper published in Washington, D.C. It is the most widely circulated newspaper within the Washington metropolitan area and has a large n ...
'' shared her list of "most bone-headed self-owns" by technology companies, with the decision to go ahead with a Johansson sound-alike voice despite her opposition and then denying the similarities ranking 6th.
On May 24, Derek Robertson at ''
Politico
''Politico'' (stylized in all caps), known originally as ''The Politico'', is an American, German-owned political journalism newspaper company based in Arlington County, Virginia, that covers politics and policy in the United States and intern ...
'' wrote about the "massive backlash", concluding that "appropriating the voice of one of the world's most famous movie stars — in reference
..to a film that serves as a cautionary tale about over-reliance on AI — is unlikely to help shift the public back into
am Altman'scorner anytime soon."
See also
*
Llama (language model)
*
Apple Intelligence
References
{{Artificial intelligence navbox
Large language models
2024 software
Generative pre-trained transformers
OpenAI
ChatGPT