Sora is a text-to-video model developed by OpenAI. The model generates short video clips based on user prompts, and can also extend existing short videos. Sora was released publicly for ChatGPT Plus and ChatGPT Pro users in December 2024.

History

Several other text-to-video generating models had been created prior to Sora, including Meta's Make-A-Video, Runway's Gen-2, and Google's Lumiere, the last of which is also still in its research phase.

OpenAI, the company behind Sora, had released DALL-E 3, the third of its DALL-E text-to-image models, in September 2023. The team that developed Sora named it after the Japanese word for sky to signify its "limitless creative potential". On February 15, 2024, OpenAI first previewed Sora by releasing multiple clips of high-definition videos that it created, including an SUV driving down a mountain road, an animation of a "short fluffy monster" next to a candle, two people walking through Tokyo in the snow, and fake historical footage of the California gold rush, and stated that it was able to generate videos up to one minute long. The company then shared a technical report, which highlighted the methods used to train the model. OpenAI CEO Sam Altman also posted a series of tweets, responding to Twitter users' prompts with Sora-generated videos of the prompts.

In November 2024, an API key for Sora access was leaked by a group of testers on Hugging Face, who posted a manifesto stating that they were protesting that Sora was used for "art washing". OpenAI revoked all access three hours after the leak was made public, and gave a statement that "hundreds of artists" have shaped the development, and that "participation is voluntary."

On December 9, 2024, OpenAI made Sora available to the public for ChatGPT Pro and ChatGPT Plus users. Prior to this, the company had provided limited access to a small "red team", including experts in misinformation and bias, to perform adversarial testing on the model. The company also shared Sora with a small group of creative professionals, including video makers and artists, to seek feedback on its usefulness in creative fields. In February 2025, OpenAI announced plans to integrate Sora into ChatGPT by letting users generate Sora videos from the chatbot.

Capabilities and limitations

Figure: A video generated by Sora of someone lying in a bed with a cat on it, containing several mistakes.

The technology behind Sora is an adaptation of the technology behind DALL-E 3. According to OpenAI, Sora is a diffusion transformer: a denoising latent diffusion model with one Transformer as the denoiser. A video is generated in latent space by denoising 3D "patches", then transformed to standard space by a video decompressor. Re-captioning is used to augment training data, by using a video-to-text model to create detailed captions on videos. OpenAI trained the model using publicly available videos as well as copyrighted videos licensed for the purpose, but did not reveal the number or the exact source of the videos.

Upon its release, OpenAI acknowledged some of Sora's shortcomings, including its struggling to simulate complex physics, to understand causality, and to differentiate left from right. One example shows a group of wolf pups seemingly multiplying and converging, creating a hard-to-follow scenario. OpenAI also stated that, in adherence to the company's existing safety practices, Sora will restrict text prompts for sexual, violent, hateful, or celebrity imagery, as well as content featuring pre-existing intellectual property.

Tim Brooks, a researcher on Sora, stated that the model figured out how to create 3D graphics from its dataset alone, while Bill Peebles, also a Sora researcher, said that the model automatically created different video angles without being prompted. According to OpenAI, Sora-generated videos are tagged with C2PA metadata to indicate that they were AI-generated.
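The idea of 3D "patches" described in this section can be illustrated with a toy sketch. OpenAI has not published Sora's code, patch sizes, or latent dimensions, so everything below is an assumption made for illustration: the function name `patchify`, the patch shape, and the use of a single-channel time × height × width grid in place of a real latent video. The sketch only shows how a video-shaped grid can be cut into non-overlapping blocks spanning both space and time, each flattened into one token-like vector.

```python
from itertools import product


def patchify(video, pt, ph, pw):
    """Split a T x H x W nested-list grid into non-overlapping 3D patches.

    Hypothetical helper for illustration only; not OpenAI's implementation.
    Each patch covers pt frames, ph rows and pw columns, and is returned
    flattened (length pt * ph * pw). Patches are ordered by time block,
    then row block, then column block.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("patch size must divide the video dimensions")

    patches = []
    for t0, h0, w0 in product(range(0, T, pt), range(0, H, ph), range(0, W, pw)):
        patch = [
            video[t][h][w]
            for t in range(t0, t0 + pt)
            for h in range(h0, h0 + ph)
            for w in range(w0, w0 + pw)
        ]
        patches.append(patch)
    return patches


# Toy "latent video": 4 frames of 4 x 4 values, each value encoding its position.
video = [[[t * 16 + h * 4 + w for w in range(4)] for h in range(4)] for t in range(4)]

patches = patchify(video, pt=2, ph=2, pw=2)
print(len(patches))   # 8 patches
print(patches[0])     # [0, 1, 4, 5, 16, 17, 20, 21]
```

In a diffusion transformer, vectors like these would be the sequence elements that the denoiser operates on; the denoising itself and the decoding back to pixels are not shown here.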

Reception

Will Douglas Heaven of the MIT Technology Review called the demonstration videos "impressive", but noted that they must have been cherry-picked and may not be representative of Sora's typical output. American academic Oren Etzioni expressed concerns over the technology's ability to create online disinformation for political campaigns. For Wired, Steven Levy similarly wrote that it had the potential to become "a misinformation train wreck" and opined that its preview clips were "impressive" but "not perfect" and that it "show[e]d an emergent grasp of cinematic grammar" due to its unprompted shot changes. Levy added, "[i]t will be a very long time, if ever, before text-to-video threatens actual filmmaking." Lisa Lacy of CNET called its example videos "remarkably realistic – except perhaps when a human face appears close up or when sea creatures are swimming".

Filmmaker Tyler Perry announced he would be putting a planned $800 million expansion of his Atlanta studio on hold, expressing concern about Sora's potential impact on the film industry.
