Adobe Voco is an unreleased audio editing and generation software prototype by Adobe that enables novel editing and generation of audio. Dubbed "Photoshop-for-voice", it was first previewed at the Adobe MAX event in November 2016. The technology shown at Adobe MAX was a preview that could potentially be incorporated into Adobe Creative Cloud. It was later revealed that Voco was a research prototype and was never meant to be released.
Technical details
As the demo showed, the software takes approximately 20 minutes of the desired target's speech and generates a sound-alike voice, including phonemes that were not present in the target example material. Adobe stated Voco would lower the cost of audio production.
Concerns
Ethical and security concerns were raised over the ability to alter an audio recording to include words and phrases the original speaker never spoke, and over the potential risk to voiceprint biometrics.
Concerns were also raised that it could be used in conjunction with:
* Human image synthesis, which has reached such levels of likeness since the early 2000s that distinguishing between a human recorded with a camera and a simulation of a human is very difficult.
* Video manipulation of a person's facial expressions in near real-time using an existing 2D RGB video of them.
Alternatives
Adobe's lack of publicized progress opened opportunities for other projects to build alternative products to Voco, such as Resemble AI and 15.ai, a real-time text-to-speech tool using artificial intelligence. WaveNet is a similar but open-source research project at London-based artificial intelligence firm DeepMind, developed independently around the same time as Adobe Voco.
See also
* 15.ai
* WaveNet