Adobe VoCo is an unreleased audio editing and generating prototype software by Adobe that enables novel editing and generation of audio. Dubbed "Photoshop-for-voice", it was first previewed at the Adobe MAX event in November 2016. The technology shown at Adobe MAX was a preview that could potentially be incorporated into Adobe Creative Cloud. It was later revealed that VoCo was never meant to be released and was intended only as a research prototype.
In 2023, Adobe introduced the ability to edit video in Premiere Pro by editing an AI-generated transcript of the video, demonstrating functionality similar to VoCo.
Technical details
As the demo showed, the software takes approximately 20 minutes of the desired target's speech and generates a sound-alike voice, including phonemes that were not present in the target example material. Adobe stated that VoCo would lower the cost of audio production.
Concerns
Ethical and security concerns were raised over the ability to alter an audio recording to include words and phrases the original speaker never spoke, and over the potential risk to voiceprint biometrics.
Concerns were also raised that it could be used in conjunction with:
* Human image synthesis, which since the early 2000s has reached such a level of likeness that distinguishing between a human recorded with a camera and a simulation of a human is very difficult.
* Video manipulation of a person's facial expressions in near real-time using an existing 2D RGB video of them.
Alternatives
Adobe's lack of publicized progress opened opportunities for other projects to build alternative products to VoCo, such as Resemble AI and 15.ai, a real-time text-to-speech tool using artificial intelligence.
WaveNet is a similar but open-source research project at London-based artificial intelligence firm DeepMind, developed independently around the same time as Adobe VoCo.
See also
* 15.ai
* WaveNet