AI-generated Video

Generative artificial intelligence (Generative AI, GenAI, or GAI) is a subfield of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data based on the input, which often comes in the form of natural language prompts.

Generative AI tools have become more common since an "AI boom" in the 2020s. This boom was made possible by improvements in transformer-based deep neural networks, particularly large language models (LLMs). Major tools include chatbots such as ChatGPT, Copilot, Gemini, Grok, and DeepSeek; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Sora and Veo. Technology companies developing generative AI include OpenAI, Anthropic, Meta AI, Microsoft, Google, DeepSeek, and Baidu.

Generative AI has raised many ethical questions. It can be used for cybercrime, or to deceive or manipulate people through fake news or deepfakes. Even if used ethically, it may lead to mass replacement of human jobs. The tools themselves have been criticized as violating intellectual property laws, since they are trained on and emulate copyrighted works of art.

Generative AI is used across many industries. Examples include software development, healthcare, finance, entertainment, customer service, sales and marketing, art, writing, fashion, and product design.


History


Early history

The first example of algorithmically generated media is likely the Markov chain. Markov chains have been used to model natural languages since their development by Russian mathematician Andrey Markov in the early 20th century. Markov published his first paper on the topic in 1906, and analyzed the pattern of vowels and consonants in the novel Eugeny Onegin using Markov chains. Once a Markov chain is learned on a text corpus, it can then be used as a probabilistic text generator.

Computers were needed to go beyond Markov chains. By the early 1970s, Harold Cohen was creating and exhibiting generative AI works produced by AARON, the computer program he wrote to generate paintings.

The terms generative AI planning or generative planning were used in the 1980s and 1990s to refer to AI planning systems, especially computer-aided process planning, used to generate sequences of actions to reach a specified goal. Generative AI planning systems used symbolic AI methods such as state space search and constraint satisfaction and were a "relatively mature" technology by the early 1990s. They were used to generate crisis action plans for military use, process plans for manufacturing, and decision plans such as in prototype autonomous spacecraft.
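The idea of learning a Markov chain on a text corpus and then using it as a probabilistic text generator can be illustrated with a minimal sketch. The function names `build_chain` and `generate`, the toy corpus, and the whitespace tokenization are all assumptions made for illustration; they are not taken from Markov's work or from any particular library.

```python
import random
from collections import defaultdict


def build_chain(tokens, order=1):
    """Map each state (a tuple of `order` tokens) to the tokens observed after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return dict(chain)


def generate(chain, start, length, rng):
    """Sample up to `length` further tokens, stopping early at a dead-end state."""
    state = tuple(start)
    output = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            break  # this state was never followed by anything in the corpus
        nxt = rng.choice(followers)  # repeated followers are proportionally more likely
        output.append(nxt)
        state = state[1:] + (nxt,)
    return output


# Hypothetical toy corpus; a real corpus would be far larger.
corpus = "the cat sat on the mat and the cat ran".split()
chain = build_chain(corpus, order=1)
sample = generate(chain, start=["the"], length=8, rng=random.Random(0))
print(" ".join(sample))
```

Because followers are stored with repetition, sampling uniformly from the list reproduces the empirical transition probabilities. Every adjacent word pair in the output has been seen in the corpus, which is why such generators read as locally plausible but lose coherence over longer spans.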


Generative neural networks (2014–2019)

Since its inception, the field of machine learning has used both discriminative models and generative models to model and predict data. Beginning in the late 2000s, the emergence of deep learning drove progress and research in image classification, speech recognition, natural language processing, and other tasks. Neural networks in this era were typically trained as discriminative models due to the difficulty of generative modeling.

In 2014, advancements such as the variational autoencoder and generative adversarial network produced the first practical deep neural networks capable of learning generative models, as opposed to discriminative ones, for complex data such as images. These deep generative models were the first to output not only class labels for images but also entire images.

In 2017, the Transformer network enabled advancements in generative models compared to older long short-term memory (LSTM) models, leading to the first generative pre-trained transformer (GPT), known as GPT-1, in 2018. This was followed in 2019 by GPT-2, which demonstrated the ability to generalize, without supervision, to many different tasks as a foundation model.

The new generative models introduced during this period allowed for large neural networks to be trained using unsupervised learning or semi-supervised learning, rather than the supervised learning typical of discriminative models. Unsupervised learning removed the need for humans to manually label data, allowing for larger networks to be trained.
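One common way that hand labeling is avoided, offered here as an illustration rather than as the specific method the passage has in mind, is to derive training targets from the data itself, as in next-token prediction. The sketch below turns raw text into (context, target) pairs with no human annotation. The function name `next_token_pairs` and the whitespace tokenization are assumptions; production systems typically use subword tokenizers.

```python
def next_token_pairs(tokens, context_size):
    """Turn an unlabeled token sequence into (context, target) training pairs."""
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))  # the "label" is simply the next token
    return pairs


# Hypothetical example sentence, split on whitespace for simplicity.
text = "generative models learn patterns from unlabeled text"
tokens = text.split()
pairs = next_token_pairs(tokens, context_size=2)
for context, target in pairs:
    print(context, "->", target)
```

Every position in the text yields a training example for free, so the amount of usable training data scales with the size of the corpus rather than with annotation effort.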


Generative AI boom (2020–)

In March 2020, the release of 15.ai, a free web application created by an anonymous MIT researcher that could generate convincing character voices using minimal training data, marked one of the earliest popular use cases of generative AI. The platform is credited as the first mainstream service to popularize AI voice cloning (audio deepfakes) in memes and content creation, influencing subsequent developments in voice AI technology.

In 2021, the emergence of DALL-E, a transformer-based pixel generative model, marked an advance in AI-generated imagery. This was followed by the releases of Midjourney and Stable Diffusion in 2022, which further democratized access to high-quality artificial intelligence art creation from natural language prompts. These systems demonstrated unprecedented capabilities in generating photorealistic images, artwork, and designs based on text descriptions, leading to widespread adoption among artists, designers, and the general public.

In late 2022, the public release of ChatGPT revolutionized the accessibility and application of generative AI for general-purpose text-based tasks. The system's ability to engage in natural conversations, generate creative content, assist with coding, and perform various analytical tasks captured global attention and sparked widespread discussion about AI's potential impact on work, education, and creativity.

In March 2023, GPT-4's release represented another jump in generative AI capabilities. A team from Microsoft Research controversially argued that it "could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system." However, this assessment was contested by other scholars who maintained that generative AI remained "still far from reaching the benchmark of 'general human intelligence'" as of 2023. Later in 2023, Meta released ImageBind, an AI model combining multiple modalities including text, images, video, thermal data, 3D data, audio, and motion, paving the way for more immersive generative AI applications.

In December 2023, Google unveiled Gemini, a multimodal AI model available in four versions: Ultra, Pro, Flash, and Nano. The company integrated Gemini Pro into its Bard chatbot and announced plans for "Bard Advanced" powered by the larger Gemini Ultra model. In February 2024, Google unified Bard and Duet AI under the Gemini brand, launching a mobile app on Android and integrating the service into the Google app on iOS.

In March 2024, Anthropic released the Claude 3 family of large language models, including Claude 3 Haiku, Sonnet, and Opus. The models demonstrated significant improvements in capabilities across various benchmarks, with Claude 3 Opus notably outperforming leading models from OpenAI and Google. In June 2024, Anthropic released Claude 3.5 Sonnet, which demonstrated improved performance compared to the larger Claude 3 Opus, particularly in areas such as coding, multistep workflows, and image analysis.

Asia–Pacific countries are significantly more optimistic than Western societies about generative AI and show higher adoption rates. Despite expressing concerns about privacy and the pace of change, in a 2024 survey, 68% of Asia–Pacific respondents believed that AI was having a positive impact on the world, compared to 57% globally. According to a survey by SAS and Coleman Parkes Research, China in particular has emerged as a global leader in generative AI adoption, with 83% of Chinese respondents using the technology, exceeding both the global average of 54% and the U.S. rate of 65%. This leadership is further evidenced by China's intellectual property developments in the field, with a UN report revealing that Chinese entities filed over 38,000 generative AI patents from 2014 to 2023, substantially surpassing the United States in patent applications. A 2024 survey on the Chinese social app Soul reported that 18% of respondents born after 2000 used generative AI "almost every day", and that over 60% of respondents liked or loved AI-generated content, while less than 3% disliked or hated it.


Applications

A generative AI system is constructed by applying unsupervised or self-supervised machine learning to a dataset, using for instance neural network architectures such as generative adversarial networks (GANs), variational autoencoders (VAEs), or transformers. The capabilities of a generative AI system depend on the modality of the data set used. Generative AI can be either unimodal or multimodal; unimodal systems take only one type of input, whereas multimodal systems can take more than one type of input. For example, one version of OpenAI's GPT-4 accepts both text and image inputs.

Generative AI has appeared in a wide variety of industries, changing how content is created, analyzed, and delivered. In healthcare, generative AI is used to accelerate drug discovery by creating molecular structures with target characteristics and by generating radiology images for training diagnostic models. This can make development faster and cheaper and can also support medical decision-making.

In finance, generative AI produces synthetic datasets to train models, automates report generation and other content creation through natural language summarization, and tailors customer communications. It also powers chatbots and virtual agents. Collectively, these technologies enhance efficiency, reduce operational costs, and support data-driven decision-making in financial institutions.

The media industry makes use of generative AI for numerous creative activities such as music composition, scriptwriting, video editing, and digital art. The educational sector is affected as well, since the tools personalize learning by creating quizzes and study aids and by assisting with essay composition. Both teachers and learners benefit from AI-based platforms that suit various learning patterns.


Text and software code

Generative AI systems trained on words or word tokens include GPT-3, GPT-4, GPT-4o, LaMDA, LLaMA, BLOOM, Gemini and others (see List of large language models). They are capable of natural language processing, machine translation, and natural language generation and can be used as foundation models for other tasks. Data sets include BookCorpus, Wikipedia, and others (see List of text corpora).

In addition to natural language text, large language models can be trained on programming language text, allowing them to generate source code for new computer programs. Examples include OpenAI Codex, Tabnine, GitHub Copilot, Microsoft Copilot, and the VS Code fork Cursor.

Some AI assistants help candidates cheat during online coding interviews by providing code, improvements, and explanations. Their clandestine interfaces minimize the need for eye movements that would expose cheating to the interviewer.


Images

Producing high-quality visual art is a prominent application of generative AI. Generative AI systems trained on sets of images with text captions include Imagen, DALL-E, Midjourney, Adobe Firefly, FLUX.1, Stable Diffusion and others (see Artificial intelligence art, Generative art, and Synthetic media). They are commonly used for text-to-image generation and neural style transfer. Datasets include LAION-5B and others (see List of datasets in computer vision and image processing).


Audio

Generative AI can also be trained extensively on audio clips to produce natural-sounding speech synthesis and text-to-speech capabilities. An early pioneer in this field was 15.ai, launched in March 2020, which demonstrated the ability to clone character voices using as little as 15 seconds of training data. The website gained widespread attention for its ability to generate emotionally expressive speech for various fictional characters, though it was taken offline in 2022 due to copyright concerns. Commercial alternatives subsequently emerged, including ElevenLabs' context-aware synthesis tools and Meta Platforms' Voicebox.

Generative AI systems such as MusicLM and MusicGen can also be trained on the audio waveforms of recorded music along with text annotations, in order to generate new musical samples based on text descriptions such as ''a calming violin melody backed by a distorted guitar riff''. Audio deepfakes of music lyrics have been generated, like the song Savages, which used AI to mimic rapper Jay-Z's vocals. Music artists' instrumentals and lyrics are copyrighted, but their voices are not yet protected from generative AI, raising a debate about whether artists should receive royalties from audio deepfakes. Many AI music generators have been created that produce music from a text phrase, genre options, and looped libraries of bars and riffs.


Video

Generative AI trained on annotated video can generate temporally coherent, detailed and photorealistic video clips. Examples include Sora by OpenAI, Runway, and Make-A-Video by Meta Platforms.


Robotics

Generative AI can also be trained on the motions of a robotic system to generate new trajectories for motion planning or navigation. For example, UniPi from Google Research uses prompts like ''"pick up blue bowl"'' or ''"wipe plate with yellow sponge"'' to control movements of a robot arm. Multimodal "vision-language-action" models such as Google's RT-2 can perform rudimentary reasoning in response to user prompts and visual input, such as picking up a toy dinosaur when given the prompt ''pick up the extinct animal'' at a table filled with toy animals and other objects.


3D modeling

Artificially intelligent computer-aided design (CAD) can use text-to-3D, image-to-3D, and video-to-3D to automate 3D modeling. AI-based CAD libraries could also be developed using linked open data of schematics and diagrams. AI CAD assistants are used as tools to help streamline workflow.


Software and hardware

Generative AI models are used to power chatbot products such as ChatGPT, programming tools such as GitHub Copilot, text-to-image products such as Midjourney, and text-to-video products such as Runway Gen-2. Generative AI features have been integrated into a variety of existing commercially available products such as Microsoft Office (Microsoft Copilot), Google Photos, and the Adobe Suite (Adobe Firefly). Many generative AI models are also available as open-source software, including Stable Diffusion and the LLaMA language model.

Smaller generative AI models with up to a few billion parameters can run on smartphones, embedded devices, and personal computers. For example, LLaMA-7B (a version with 7 billion parameters) can run on a Raspberry Pi 4 and one version of Stable Diffusion can run on an iPhone 11. Larger models with tens of billions of parameters can run on laptop or desktop computers. To achieve an acceptable speed, models of this size may require accelerators such as the GPU chips produced by NVIDIA and AMD or the Neural Engine included in Apple silicon products. For example, the 65 billion parameter version of LLaMA can be configured to run on a desktop PC.

The advantages of running generative AI locally include protection of privacy and intellectual property, and avoidance of rate limiting and censorship. The subreddit r/LocalLLaMA in particular focuses on using consumer-grade gaming graphics cards through such techniques as compression. That forum is one of only two sources Andrej Karpathy trusts for language model benchmarks.
Yann LeCun has advocated open-source models for their value to vertical applications and for improving AI safety.

Language models with hundreds of billions of parameters, such as GPT-4 or PaLM, typically run on datacenter computers equipped with arrays of GPUs (such as NVIDIA's H100) or AI accelerator chips (such as Google's TPU). These very large models are typically accessed as cloud services over the Internet. In 2022, the United States' New Export Controls on Advanced Computing and Semiconductors to China imposed restrictions on exports to China of GPU and AI accelerator chips used for generative AI. Chips such as the NVIDIA A800 and the Biren Technology BR104 were developed to meet the requirements of the sanctions.

Free software is available that aims to recognize text produced by generative artificial intelligence (such as GPTZero), as well as AI-generated images, audio, or video. Potential mitigation strategies for detecting generative AI content include digital watermarking, content authentication, information retrieval, and machine learning classifier models. Despite claims of accuracy, both free and paid AI text detectors have frequently produced false positives, mistakenly accusing students of submitting AI-generated work.
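The device sizes quoted earlier in this section (a 7 billion parameter model on a Raspberry Pi 4, a 65 billion parameter model on a desktop PC) can be sanity-checked with simple arithmetic: the memory needed for the weights is roughly the parameter count times the bits stored per parameter. The sketch below is a rough estimate under stated assumptions. It counts weights only and ignores activations, caches, and runtime overhead, and the function name `weight_memory_gb` and the choice of 16, 8, and 4 bits per parameter are illustrative, not taken from the source.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored with the same number of bits and
    nothing else (activations, caches, overhead) is counted.
    """
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical precisions: 16-bit floats and two compressed formats.
    for name, n_params in [("7B", 7e9), ("65B", 65e9)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(n_params, bits)
            print(f"{name} model at {bits}-bit weights: about {gb:.1f} GB")
```

Under these assumptions, storing fewer bits per parameter is what brings a multi-billion-parameter model within reach of the memory available on consumer devices, which is consistent with the compression techniques mentioned above.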


Generative models and training techniques


Generative adversarial networks

Generative adversarial networks (GANs) are an influential generative modeling technique. GANs consist of two neural networks, the generator and the discriminator, trained simultaneously in a competitive setting. The generator creates synthetic data by transforming random noise into samples that resemble the training dataset. The discriminator is trained to distinguish the authentic data from synthetic data produced by the generator. The two models engage in a minimax game: the generator aims to create increasingly realistic data to "fool" the discriminator, while the discriminator improves its ability to distinguish real from fake data. This continuous training setup enables the generator to produce high-quality and realistic outputs.
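The minimax game described above is commonly written as a value function, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The following is a minimal NumPy sketch of that quantity, not a training loop. The function name `gan_value` and the hard-coded discriminator outputs are illustrative assumptions; in a real system the probabilities would come from a trained discriminator network.

```python
import numpy as np


def gan_value(d_real: np.ndarray, d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """Estimate V(D, G) from discriminator outputs.

    d_real: probabilities D(x) assigned to authentic samples.
    d_fake: probabilities D(G(z)) assigned to generated samples.
    The discriminator wants this value high; the generator wants it low.
    """
    d_real = np.clip(d_real, eps, 1.0 - eps)  # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))


if __name__ == "__main__":
    # Hypothetical outputs for three situations.
    sharp = gan_value(np.full(4, 0.9), np.full(4, 0.1))    # discriminator is right
    unsure = gan_value(np.full(4, 0.5), np.full(4, 0.5))   # discriminator guesses
    fooled = gan_value(np.full(4, 0.9), np.full(4, 0.9))   # fakes pass as real
    print(f"sharp discriminator:  {sharp:.3f}")
    print(f"unsure discriminator: {unsure:.3f}")
    print(f"fooled discriminator: {fooled:.3f}")
```

The value is highest when the discriminator separates real from generated samples and drops as the generated samples are scored as real, which is the direction the generator pushes in. Practical implementations often swap the generator's objective for a variant with better gradients; this sketch shows only the plain minimax form.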


Variational autoencoders

Variational autoencoder In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling. It is part of the families of probabilistic graphical models and variational Bayesian metho ...
s (VAEs) are deep learning models that probabilistically encode data. They are typically used for tasks such as
noise reduction Noise reduction is the process of removing noise from a signal. Noise reduction techniques exist for audio and images. Noise reduction algorithms may distort the signal to some degree. Noise rejection is the ability of a circuit to isolate an u ...
from images,
data compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...
, identifying unusual patterns, and
facial recognition
. Unlike standard autoencoders, which compress input data into a fixed latent representation, VAEs model the latent space as a probability distribution, allowing for smooth sampling and interpolation between data points. The encoder ("recognition model") maps input data to a latent space, producing means and variances that define a probability distribution. The decoder ("generative model") samples from this latent distribution and attempts to reconstruct the original input. VAEs optimize a loss function that includes both the reconstruction error and a
Kullback–Leibler divergence
term, which ensures the latent space follows a known prior distribution. VAEs are particularly suitable for tasks that require structured but smooth latent spaces, although they may create blurrier images than GANs. They are used for applications like image generation, data interpolation and
anomaly detection
.
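
The loss described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the encoder is assumed to output a mean and log-variance per latent dimension (a diagonal Gaussian), the prior is assumed to be a standard normal, the reconstruction error is taken to be squared error, and all function names are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to the prior N(0, I)."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def vae_loss(x, x_reconstructed, mu, log_var):
    """Reconstruction error (squared error here) plus the KL term."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -0.2], [0.0, -1.0]
    z = sample_latent(mu, log_var, rng)  # what the decoder would receive
    print(z, vae_loss([1.0, 2.0], [0.9, 2.1], mu, log_var))
```

The KL term is zero only when the encoder's distribution matches the prior exactly, so it pulls the latent space toward the known prior while the reconstruction term pulls it toward encoding each input distinctly.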


Transformers

Transformers became the foundation for many powerful generative models, most notably the
generative pre-trained transformer
(GPT) series developed by OpenAI. They marked a major shift in natural language processing by replacing traditional recurrent and convolutional models. This architecture allows models to process entire sequences simultaneously and capture long-range dependencies more efficiently. The self-attention mechanism enables the model to capture the significance of every word in a sequence when predicting the subsequent word, thus improving its contextual understanding. Unlike recurrent neural networks, transformers process all the tokens in parallel, which improves the training efficiency and scalability. Transformers are typically pre-trained on enormous corpora in a self-supervised manner, prior to being fine-tuned.
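
The self-attention mechanism mentioned above can be illustrated with scaled dot-product attention. The sketch below is a simplification under stated assumptions: it takes query, key, and value vectors directly as inputs (real transformers derive them from token embeddings through learned projections), it uses a single attention head, and it applies no causal mask. The function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over one sequence.

    Each argument is a list of vectors, one per token. Returns the
    attended outputs and the attention weights for every query token.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this token's query with every token's key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values))
             for c in range(len(values[0]))]
        )
    return outputs, weights


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, attn = self_attention(tokens, tokens, tokens)
    for row in attn:
        print([round(a, 3) for a in row])
```

Because each query's scores depend only on the fixed set of keys, the rows can be computed independently of one another, which is what allows all tokens to be processed in parallel.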


Law and regulation

In the United States, a group of companies including OpenAI, Alphabet, and Meta signed a voluntary agreement with the
Biden administration
in July 2023 to watermark AI-generated content. In October 2023, Executive Order 14110 applied the
Defense Production Act
to require all US companies to report information to the federal government when training certain high-impact AI models. In the European Union, the proposed Artificial Intelligence Act includes requirements to disclose copyrighted material used to train generative AI systems, and to label any AI-generated output as such. In China, the Interim Measures for the Management of Generative AI Services introduced by the
Cyberspace Administration of China
regulates any public-facing generative AI. It includes requirements to watermark generated images or videos, regulations on training data and label quality, restrictions on personal data collection, and a guideline that generative AI must "adhere to socialist core values".


Copyright


Training with copyrighted content

Generative AI systems such as
ChatGPT
and
Midjourney
are trained on large, publicly available datasets that include copyrighted works. AI developers have argued that such training is protected under
fair use
, while copyright holders have argued that it infringes their rights. Proponents of fair use training have argued that it is a
transformative use
and does not involve making copies of copyrighted works available to the public. Critics have argued that image generators such as
Midjourney
can create nearly identical copies of some copyrighted images, and that generative AI programs compete with the content they are trained on. As of 2024, several lawsuits related to the use of copyrighted material in training are ongoing.
Getty Images
has sued
Stability AI
over the use of its images to train
Stable Diffusion
. Both the
Authors Guild
and
The New York Times
have sued
Microsoft
and
OpenAI
over the use of their works to train
ChatGPT
.


Copyright of AI-generated content

A separate question is whether AI-generated works can qualify for copyright protection. The
United States Copyright Office
has ruled that works created by artificial intelligence without any human input cannot be copyrighted, because they lack human authorship. Some legal professionals have suggested that ''Naruto v. Slater'' (2018), in which the U.S. 9th Circuit Court of Appeals held that non-humans cannot be copyright holders of artistic works, could be a potential precedent in copyright litigation over works created by generative AI. However, the Copyright Office has also begun taking public input to determine whether these rules need to be refined for generative AI. In January 2025, the
United States Copyright Office
(USCO) released extensive guidance regarding the use of AI tools in the creative process, and established that "...generative AI systems also offer tools that similarly allow users to exert control. [T]hese can enable the user to control the selection and placement of individual creative elements. Whether such modifications rise to the minimum standard of originality required under Feist will depend on a case-by-case determination. In those cases where they do, the output should be copyrightable". Subsequently, the USCO registered the first visual artwork to be composed entirely of AI-generated materials, titled "A Single Piece of American Cheese".


Concerns

The development of generative AI has raised concerns from governments, businesses, and individuals, resulting in protests, legal actions, calls to pause AI experiments, and actions by multiple governments. In a July 2023 briefing of the
United Nations Security Council
,
Secretary-General
António Guterres
stated "Generative AI has enormous potential for good and evil at scale", that AI may "turbocharge global development" and contribute between $10 and $15 trillion to the global economy by 2030, but that its malicious use "could cause horrific levels of death and destruction, widespread trauma, and deep psychological damage on an unimaginable scale". In addition, generative AI has a significant
carbon footprint
.


Job losses

From the early days of the development of AI, there have been arguments put forward by
ELIZA
creator
Joseph Weizenbaum
and others about whether tasks that can be done by computers actually should be done by them, given the difference between computers and humans, and between quantitative calculations and qualitative, value-based judgements. In April 2023, it was reported that image generation AI has resulted in 70% of the jobs for video game illustrators in China being lost. In July 2023, developments in generative AI contributed to the
2023 Hollywood labor disputes
.
Fran Drescher
, president of the
Screen Actors Guild
, declared that "artificial intelligence poses an existential threat to creative professions" during the
2023 SAG-AFTRA strike
. Voice generation AI has been seen as a potential challenge to the
voice acting
sector. Employment concerns related to AI are especially pronounced among underrepresented groups worldwide. While AI promises efficiency gains and skill acquisition, concerns about job displacement and biased recruiting processes persist among these groups, as outlined in surveys by
Fast Company
. Proposed steps toward using AI for a more equitable society include mitigating biases, advocating transparency, respecting privacy and consent, and building diverse teams that weigh ethical considerations. Proposed strategies include redirecting policy emphasis toward regulation, inclusive design, and the potential of personalized teaching in education, to maximize benefits while minimizing harms.


Racial and gender bias

Generative AI models can reflect and amplify any
cultural bias
present in the underlying data. For example, a language model might assume that doctors and judges are male, and that secretaries or nurses are female, if those biases are common in the training data. Similarly, an image model prompted with the text "a photo of a CEO" might disproportionately generate images of white male CEOs, if trained on a racially biased data set. A number of methods for mitigating bias have been attempted, such as altering input prompts and reweighting training data.
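
One of the mitigation methods mentioned above, reweighting training data, can be sketched as inverse-frequency weighting. This is only one possible scheme, shown under stated assumptions: each training example is assumed to carry a group label (for instance, the demographic depicted), and the function name and labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(group_labels):
    """Give each example a weight inversely proportional to its group's size.

    Weights are scaled so that every group contributes the same total
    weight and the weights sum to the number of examples.
    """
    counts = Counter(group_labels)
    n_groups = len(counts)
    total = len(group_labels)
    return [total / (n_groups * counts[g]) for g in group_labels]


if __name__ == "__main__":
    # Hypothetical, imbalanced labels for a small training set.
    labels = ["group_a", "group_a", "group_a", "group_b"]
    for label, weight in zip(labels, inverse_frequency_weights(labels)):
        print(label, round(weight, 3))
```

A training procedure could then multiply each example's loss by its weight, or sample examples in proportion to it, so that an over-represented group no longer dominates what the model learns. Reweighting addresses imbalance in counts only; it does not correct biased content within the examples themselves.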


Deepfakes

Deepfakes (a
portmanteau
of "deep learning" and "fake") are AI-generated media that take a person in an existing image or video and replace them with someone else's likeness using
artificial neural network
s. Deepfakes have garnered widespread attention and concerns for their uses in deepfake celebrity pornographic videos,
revenge porn
,
fake news
,
hoax
es, health
disinformation
,
financial fraud
, and covert foreign election interference. This has elicited responses from both industry and government to detect and limit their use. In July 2023, the fact-checking company Logically found that the popular generative AI models
Midjourney
,
DALL-E 2
and
Stable Diffusion
would produce plausible disinformation images when prompted to do so, such as images of
electoral fraud
in the United States and Muslim women supporting India's
Hindu nationalist
Bharatiya Janata Party
. In April 2024, a paper proposed using
blockchain
(
distributed ledger
technology) to promote "transparency, verifiability, and decentralization in AI development and usage".


Audio deepfakes

Instances of users abusing software to generate controversial statements in the vocal style of celebrities, public officials, and other famous individuals have raised ethical concerns over voice generation AI. In response, companies such as ElevenLabs have stated that they would work on mitigating potential abuse through safeguards and identity verification. AI-generated music has attracted both concern and fandoms. The same software used to clone voices has been used on famous musicians' voices to create songs that mimic them, gaining both tremendous popularity and criticism. Similar techniques have also been used to create improved-quality or full-length versions of songs that have been leaked or have yet to be released. Generative AI has also been used to create new digital artist personalities, some of which have attracted enough attention to receive record deals at major labels. The developers of these virtual artists have also faced criticism for their personified programs, including backlash for "dehumanizing" an art form and for creating artists that make unrealistic or immoral appeals to their audiences.


Illegal imagery

Many websites that allow explicit AI-generated images or videos have been created, and these have been used to create illegal content, such as
rape
, child sexual abuse material,
necrophilia
, and
zoophilia
.


Cybercrime

Generative AI's ability to create realistic fake content has been exploited in numerous types of cybercrime, including
phishing
scams.
Deepfake
video and audio have been used to create disinformation and fraud. In 2020, former Google
click fraud
czar Shuman Ghosemajumder argued that once deepfake videos become perfectly realistic, they would stop appearing remarkable to viewers, potentially leading to uncritical acceptance of false information. Additionally,
large language model
s and other forms of text-generation AI have been used to create fake reviews of
e-commerce
websites to boost ratings. Cybercriminals have created large language models focused on fraud, including WormGPT and FraudGPT. A 2023 study showed that generative AI can be vulnerable to jailbreaks,
reverse psychology
and prompt injection attacks, enabling attackers to obtain help with harmful requests, such as for crafting social engineering and phishing attacks. Additionally, other researchers have demonstrated that open-source models can be fine-tuned to remove their safety restrictions at low cost.


Reliance on industry giants

Training frontier AI models requires an enormous amount of computing power. Usually only
Big Tech
companies have the financial resources to make such investments. Smaller start-ups such as Cohere and
OpenAI
end up buying access to
data centers
from
Google
and
Microsoft
respectively.


Energy and environment

AI has a significant carbon footprint due to growing energy consumption from both training and usage. Scientists and journalists have expressed concerns about the environmental impact that the development and deployment of generative models are having: high CO2 emissions, large amounts of freshwater used for data centers, and high amounts of electricity usage. There is also concern that these impacts may increase as these models are incorporated into widely used search engines such as Google Search and Bing, as
chatbot
s and other applications become more popular, and as models need to be retrained. The carbon footprint of generative AI globally is estimated to be growing steadily, with potential annual emissions ranging from 18.21 to 245.94 million tons of CO2 by 2035, with the highest estimates for 2035 nearing the impact of the United States
beef industry
on emissions (estimated at 257.5 million tons annually as of 2024). Proposed mitigation strategies include factoring in potential environmental costs prior to model development or data collection, increasing the efficiency of data centers to reduce electricity usage, building more efficient machine learning models, minimizing the number of times that models need to be retrained, developing a government-directed framework for auditing the environmental impact of these models, regulating for transparency of these models, regulating their energy and water usage, encouraging researchers to publish data on their models' carbon footprint, and increasing the number of subject matter experts who understand both machine learning and climate science.


Content quality

''
The New York Times
'' defines slop as analogous to
spam
: "shoddy or unwanted A.I. content in social media, art, books and ... in search results." Journalists have expressed concerns about the scale of low-quality generated content with respect to social media content moderation, the monetary incentives from social media companies to spread such content, false political messaging, spamming of scientific research paper submissions, increased time and effort to find higher quality or desired content on the Internet, the indexing of generated content by search engines, and on journalism itself. A paper published by researchers at Amazon Web Services AI Labs found that over 57% of sentences from a sample of over 6 billion sentences from
Common Crawl
, a snapshot of web pages, were machine translated. Many of these automated translations were judged to be of lower quality, especially for sentences that were translated across at least three languages. Many lower-resource languages (e.g.
Wolof
,
Xhosa
) were translated across more languages than higher-resource languages (ex. English, French). In September 2024, Robyn Speer, the author of wordfreq, an open source database that calculated word frequencies based on text from the Internet, announced that she had stopped updating the data for several reasons: high costs for obtaining data from
Reddit Reddit ( ) is an American Proprietary software, proprietary social news news aggregator, aggregation and Internet forum, forum Social media, social media platform. Registered users (commonly referred to as "redditors") submit content to the ...
and
Twitter Twitter, officially known as X since 2023, is an American microblogging and social networking service. It is one of the world's largest social media platforms and one of the most-visited websites. Users can share short text messages, image ...
, excessive focus on generative AI compared to other methods in the
natural language processing Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...
community, and that "generative AI has polluted the data". The adoption of generative AI tools led to an explosion of AI-generated content across multiple domains. A study from
University College London University College London (Trade name, branded as UCL) is a Public university, public research university in London, England. It is a Member institutions of the University of London, member institution of the Federal university, federal Uni ...
estimated that in 2023, more than 60,000 scholarly articles—over 1% of all publications—were likely written with LLM assistance. According to
Stanford University Leland Stanford Junior University, commonly referred to as Stanford University, is a Private university, private research university in Stanford, California, United States. It was founded in 1885 by railroad magnate Leland Stanford (the eighth ...
's Institute for Human-Centered AI, approximately 17.5% of newly published computer science papers and 16.9% of peer review text now incorporate content generated by LLMs. Many academic disciplines have concerns about the factual reliably of academic content generated by AI. Visual content follows a similar trend. Since the launch of
DALL-E DALL-E, DALL-E 2, and DALL-E 3 (stylised DALL·E) are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions known as Prompt engineering, ''prompts''. The first ...
2 in 2022, it is estimated that an average of 34 million images have been created daily. As of August 2023, more than 15 billion images had been generated using text-to-image algorithms, with 80% of these created by models based on
Stable Diffusion Stable Diffusion is a deep learning, text-to-image model released in 2022 based on Diffusion model, diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of ...
. If AI-generated content is included in new data crawls from the Internet for additional training of AI models, defects in the resulting models may occur. Training an AI model exclusively on the output of another AI model produces a lower-quality model. Repeating this process, where each new model is trained on the previous model's output, leads to progressive degradation and eventually results in a " model collapse" after multiple iterations. Tests have been conducted with pattern recognition of handwritten letters and with pictures of human faces. As a consequence, the value of data collected from genuine human interactions with systems may become increasingly valuable in the presence of LLM-generated content in data crawled from the Internet. On the other side,
synthetic data Synthetic data are artificially generated rather than produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models. Data generated by a comp ...
is often used as an alternative to data produced by real-world events. Such data can be deployed to validate mathematical models and to train machine learning models while preserving user privacy, including for structured data. The approach is not limited to text generation; image generation has been employed to train computer vision models.
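As a rough illustration of the recursive-training effect described above, where each new model is trained only on the previous model's output, the following Python sketch repeatedly fits a normal distribution to a small sample and then draws the next generation's sample from that fit. It is a toy analogy, not a reproduction of the experiments mentioned in the text: the function name `simulate_collapse`, the sample size, and the number of generations are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Toy analogy for recursive training on model output (hypothetical helper).

    Each "model" is just a normal distribution fitted to a small sample.
    The next generation sees only samples drawn from the previous fit,
    never the original data. Returns the fitted standard deviation of
    every generation, starting with the fit to the original data.
    """
    rng = random.Random(seed)

    # Generation 0: "human" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]

    spreads = []
    for _ in range(generations + 1):
        # "Train": estimate mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        spreads.append(sigma)

        # "Generate": the next data set comes only from the fitted model.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]

    return spreads


if __name__ == "__main__":
    spreads = simulate_collapse()
    for generation in (0, 50, 100, 200):
        print(f"generation {generation:3d}: fitted std = {spreads[generation]:.3e}")
```

In this simplified setting, estimation error compounds from one generation to the next, so the fitted spread tends to shrink and the tails of the original distribution are lost. Larger samples slow the effect.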


Misuse in journalism

In January 2023, ''Futurism.com'' broke the story that CNET had been using an undisclosed internal AI tool to write at least 77 of its stories; after the news broke, CNET posted corrections to 41 of the stories.

In April 2023, the German tabloid ''Die Aktuelle'' published a fake AI-generated interview with former racing driver Michael Schumacher, who had not made any public appearances since 2013 after sustaining a brain injury in a skiing accident. The story included two possible disclosures: the cover included the line "deceptively real", and the interview included an acknowledgment at the end that it was AI-generated. The editor-in-chief was fired shortly thereafter amid the controversy.

Other outlets that have published articles whose content or byline has been confirmed or suspected to be created by generative AI models – often with false content, errors, or non-disclosure of generative AI use – include:
* NewsBreak
* outlets owned by Arena Group
** ''Sports Illustrated''
** TheStreet
** ''Men's Journal''
* B&H Photo
* outlets owned by Gannett
** ''The Columbus Dispatch''
** Reviewed
** USA Today
** ''Journal Star''
** ''El Paso Times''
** ''Fort Collins Coloradoan''
** ''The Record''
** ''The Augusta Chronicle''
** ''The Providence Journal''
** ''Argus Leader''
** ''Southwest Times Record''
** ''The Des Moines Register''
** ''North Jersey Media Group''
** ''Pocono Record''
* MSN
* News Corp
* outlets owned by G/O Media
** ''Gizmodo''
** ''Jalopnik''
** ''A.V. Club''
** ''Quartz''
** ''Deadspin''
** ''The Takeout''
* ''The Irish Times''
* outlets owned by Red Ventures
** Bankrate
* BuzzFeed
* Newsweek
* Hoodline
* outlets owned by Outside Inc.
** ''Yoga Journal''
** ''Backpacker''
** ''Clean Eating''
* Hollywood Life
* Us Weekly
* The Los Angeles Times
* Cody Enterprise
* Cosmos
* outlets owned by McClatchy
** Miami Herald
** Sacramento Bee
** Tacoma News Tribune
** The Rock Hill Herald
** The Modesto Bee
** Fort Worth Star-Telegram
** Merced Sun-Star
** Ledger-Enquirer
** The Kansas City Star
** Raleigh News & Observer
* outlets owned by Ziff Davis
** PC Magazine
** Mashable
** AskMen
* outlets owned by Hearst
** Good Housekeeping
* outlets owned by IAC Inc.
** ''People''
** ''Parents''
** ''Food & Wine''
** ''InStyle''
** ''Real Simple''
** ''Travel + Leisure''
** ''Better Homes & Gardens''
** ''Southern Living''
* outlets owned by Street Media
** LA Weekly
** The Village Voice
* Riverfront Times
* Apple Intelligence

In May 2024, ''Futurism'' noted that a content management system video by AdVon Commerce, which had used generative AI to produce articles for many of the aforementioned outlets, appeared to show that the company "had produced tens of thousands of articles for more than 150 publishers."

News broadcasters in Kuwait, Greece, South Korea, India, China and Taiwan have presented news with anchors based on generative AI models, prompting concerns about job losses for human anchors and audience trust in news that has historically been influenced by parasocial relationships with broadcasters, content creators or social media influencers. Algorithmically generated anchors have also been used by allies of ISIS for their broadcasts.

In 2023, Google reportedly pitched to news outlets a tool that claimed to "produce news stories" based on input data provided, such as "details of current events". Some news company executives who viewed the pitch described it as "[taking] for granted the effort that went into producing accurate and artful news stories." In February 2024, Google launched a program to pay small publishers to write three articles per day using a beta generative AI model.
The program does not require the knowledge or consent of the websites that the publishers are using as sources, nor does it require the published articles to be labeled as being created or assisted by these models.

Many defunct news sites (''The Hairpin'', ''The Frisky'', ''Apple Daily'', ''Ashland Daily Tidings'', ''Clayton County Register'', ''Southwest Journal'') and blogs (''The Unofficial Apple Weblog'', ''iLounge'') have undergone cybersquatting, with articles created by generative AI.

United States Senators Richard Blumenthal and Amy Klobuchar have expressed concern that generative AI could have a harmful impact on local news. In July 2023, OpenAI partnered with the American Journalism Project to fund local news outlets for experimenting with generative AI, with Axios noting the possibility of generative AI companies creating a dependency for these news outlets.

Meta AI, a chatbot based on Llama 3 which summarizes news stories, was noted by ''The Washington Post'' to copy sentences from those stories without direct attribution and to potentially further decrease the traffic of online news outlets.

In response to potential pitfalls around the use and misuse of generative AI in journalism and worries about declining audience trust, outlets around the world, including ''Wired'', Associated Press, The Quint, Rappler, and ''The Guardian'', have published guidelines on how they plan to use and not use AI and generative AI in their work.

In June 2024, the Reuters Institute published its ''Digital News Report for 2024''. In a survey of people in America and Europe, the Reuters Institute reported that 52% and 47% respectively were uncomfortable with news produced by "mostly AI with some human oversight", and 23% and 15% respectively reported being comfortable. 42% of Americans and 33% of Europeans reported that they were comfortable with news produced by "mainly human with some help from AI". Global survey results indicated that people were more uncomfortable with AI-produced news on topics such as politics (46%), crime (43%), and local news (37%) than on other news topics.

