
Generative audio refers to the creation of audio files from databases of audio clips. This technology differs from synthesized voices such as Apple's Siri or Amazon's Alexa, which use a collection of fragments that are stitched together on demand.
Generative audio works by using neural networks to learn the statistical properties of an audio source and then reproducing those properties in newly generated audio.
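The idea of learning the statistics of a source and then sampling from them can be illustrated with a deliberately simplified sketch. It is not a neural network: a table of transition counts stands in for the learned model, and the sine-wave "source", the 16 quantization levels, and all function names are assumptions made for illustration only.

```python
import math
import random
from collections import Counter, defaultdict

def quantize(samples, levels=16):
    """Map samples in [-1, 1] to integer levels 0..levels-1."""
    return [min(levels - 1, int((s + 1.0) / 2.0 * levels)) for s in samples]

def fit_transitions(symbols, order=2):
    """Count which level follows each context of `order` previous levels."""
    counts = defaultdict(Counter)
    for i in range(order, len(symbols)):
        counts[tuple(symbols[i - order:i])][symbols[i]] += 1
    return counts

def generate(counts, seed, length, rng):
    """Sample new levels one at a time from the learned conditional counts."""
    out = list(seed)
    order = len(seed)
    while len(out) < length:
        options = counts.get(tuple(out[-order:]))
        if not options:
            # Context never seen in training: restart from the seed context.
            out.extend(seed)
            continue
        levels, weights = zip(*options.items())
        out.append(rng.choices(levels, weights=weights)[0])
    return out[:length]

# Hypothetical audio source: a 440 Hz sine wave sampled at 8 kHz.
source = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(4000)]
symbols = quantize(source)
model = fit_transitions(symbols)
new_symbols = generate(model, symbols[:2], 1000, random.Random(0))
```

Here `new_symbols` is a new sequence drawn from the statistics of the source rather than a copy of any stored fragment. A neural model plays the same role as the count table but can generalize to contexts it has never seen.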
Implications
With this technology, a person's voice can be replicated to speak phrases that they may never have spoken. This could lead to a synthetic version of a public figure's voice being used against them.
Technology
Modern generative audio systems employ various deep learning architectures. One notable approach uses generative adversarial networks (GANs), in which two machine learning models, a generator and a discriminator, are trained against each other to create realistic audio. Other approaches include WaveNet, which uses dilated causal convolutions to model raw audio waveforms, and systems such as 15.ai, which demonstrated in 2020 the ability to clone voices using as little as 15 seconds of training data through specialized neural network architectures.
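As a rough illustration of the dilated causal convolutions mentioned above, the following sketch applies a stack of such layers to an impulse. It is a minimal pure-Python version under stated assumptions, not WaveNet's actual implementation: the function names, the two-tap kernel with unit weights, and the dilation schedule 1, 2, 4, 8 are chosen only for the example.

```python
def dilated_causal_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero, so each
    output depends only on the current and earlier inputs (causality).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Feed a unit impulse through four stacked layers (illustrative weights).
dilations = [1, 2, 4, 8]
signal = [1.0] + [0.0] * 15
for d in dilations:
    signal = dilated_causal_conv(signal, [1.0, 1.0], d)

# Causality check: changing only the last input leaves earlier outputs alone.
a = dilated_causal_conv([1.0, 2.0, 3.0, 4.0], [0.5, 0.5], 2)
b = dilated_causal_conv([1.0, 2.0, 3.0, 9.0], [0.5, 0.5], 2)
```

With these settings the impulse spreads across all 16 samples, which matches `receptive_field(2, dilations)`. Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of weights grows only linearly.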
See also
* 15.ai
* Deep learning speech synthesis
* Generative art
* Generative music
* WaveNet