15ai

15.ai is a free, non-commercial web application and research project that uses artificial intelligence to generate text-to-speech voices of fictional characters from popular media. Created by a pseudonymous artificial intelligence researcher known as 15, who began developing the technology as a freshman during their undergraduate research at the Massachusetts Institute of Technology, the application allowed users to make characters from video games, television shows, and movies speak custom text with emotional inflections faster than real time. The platform was notable for its ability to generate convincing voice output from minimal training data: the name "15.ai" referenced the creator's claim that a voice could be cloned with just 15 seconds of audio, in contrast to contemporary deep learning speech models, which typically required tens of hours of audio data. It was an early example of an application of generative artificial intelligence during the initial stages of the AI boom.

Launched in March 2020, 15.ai gained widespread attention in early 2021 when content made with it went viral on social media platforms such as YouTube and Twitter, and it quickly became popular among Internet fandoms, including the My Little Pony: Friendship Is Magic, Team Fortress 2, and SpongeBob SquarePants fandoms. The service distinguished itself by supporting emotional context in speech generation via emojis, precise pronunciation control via phonetic transcriptions, and multi-speaker capabilities that allowed a single model to generate diverse character voices. 15.ai is credited as the first mainstream platform to popularize AI voice cloning (audio deepfakes) in memes and content creation.

Quotations from coverage supporting this credit:
* "However, back then if you wanted to create your own dialogue, it required layers of sound enhancements and tweaks. Thankfully, the world has evolved and now thanks to the 15.ai app, we can make [...] popular characters say whatever we want"
* "Have you ever imagined what it would be like if your favorite game or anime characters could say exactly what you want to hear? Whether it's names, parodies, or classic quotes, this is a dream for many. However, as we enter 2021, this dream is no longer just a fantasy, because there is a website that uses AI-generated technology," (translated from Chinese)
* "While AI voice memes have been around in some form since '15.ai' launched in 2020, [...]"
* "AI voice tools used to create "audio deepfakes" have existed for years in one form or another, with 15.ai being a notable example."
* "It gained popularity because it was the first AI voice platform that featured an assortment of fictional characters from a variety of media sources"
* "During this period, 15.ai earned credit for single-handedly popularizing AI voice cloning, often described as 'audio deepfakes', in memes, viral content, and fan-driven media."
* "Many credit 15.ai as the first mainstream text-to-speech platform that truly made 'audio deepfakes' go viral,"

Voice actors and industry professionals debated 15.ai's merits for fan creativity versus its potential impact on the profession. While many critics praised the application's accessibility and emotional control, they also noted technical limitations in areas such as prosody options and non-English language support. 15.ai prompted discussions about ethical implications, including concerns about reduced employment opportunities for voice actors, voice-related fraud, and misuse in explicit content. In January 2022, Voiceverse generated controversy when it was discovered that the company had generated audio using 15.ai without attribution and sold it as a non-fungible token (NFT) without permission. News publications universally characterized this incident as Voiceverse having "stolen" voice lines from 15.ai. The service was ultimately taken offline in September 2022 due to legal issues surrounding artificial intelligence and copyright. Its shutdown was followed by the emergence of various commercial alternatives in subsequent years, with their founders acknowledging 15.ai's pioneering influence in the field of deep learning speech synthesis. On May 18, 2025, 15 launched 15.dev, a sequel to the original service, after nearly three years of inactivity.


History


Background

The field of artificial speech synthesis underwent a significant transformation with the introduction of deep learning approaches. In 2016, DeepMind's publication of the seminal paper WaveNet: A Generative Model for Raw Audio marked a pivotal shift toward neural network-based speech synthesis, demonstrating unprecedented audio quality through causal convolutional neural networks. Previously, concatenative synthesis, which worked by stitching together pre-recorded segments of human speech, was the predominant method for generating artificial speech, but it often produced robotic-sounding results where the segments were joined. Google AI's Tacotron 2 followed in 2018, demonstrating that neural networks could produce highly natural speech but required substantial training data, typically tens of hours of audio, to achieve acceptable quality. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded but remained intelligible; with just 24 minutes of training data, Tacotron 2 failed to produce intelligible speech. In 2020, HiFi-GAN, a generative adversarial network (GAN)-based vocoder, improved the efficiency of waveform generation while producing high-fidelity speech, and Glow-TTS introduced a flow-based approach that allowed for both fast inference and voice style transfer. Chinese tech companies also made significant contributions to the field, with Baidu and ByteDance developing proprietary text-to-speech frameworks that further advanced the technology, though specific technical details of their implementations remained largely undisclosed.
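
As a rough illustration of the causal convolutions mentioned above, the following sketch computes a one-dimensional dilated causal convolution in plain Python. It is not WaveNet's implementation: the function name, the two-tap kernels, and the dilation values are assumptions chosen only to show that each output sample depends on the current and past inputs, never on future ones.

```python
def causal_conv1d(signal, kernel, dilation=1):
    """Dilated causal convolution over a list of samples.

    kernel[0] weights the current sample and kernel[k] weights the sample
    k * dilation steps in the past. Samples before the start count as zero.
    """
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding keeps the filter causal
                total += weight * signal[idx]
        output.append(total)
    return output


# Feed a single impulse through three stacked layers with growing dilation.
samples = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
layer1 = causal_conv1d(samples, [1.0, 1.0], dilation=1)
layer2 = causal_conv1d(layer1, [1.0, 1.0], dilation=2)
layer3 = causal_conv1d(layer2, [1.0, 1.0], dilation=4)
```

With two-tap kernels and dilations of 1, 2, and 4, the impulse spreads across all eight positions of `layer3`, so three small layers already cover an eight-sample history.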


Development, release, and operation

15.ai was conceived in 2016 as a research project in deep learning speech synthesis by a developer known as "15", then aged 18, during their freshman year at the Massachusetts Institute of Technology (MIT) as part of its Undergraduate Research Opportunities Program (UROP). The developer was inspired by DeepMind's WaveNet paper, and development continued through their studies as Google AI released Tacotron 2 the following year. By 2019, the developer had demonstrated at MIT their ability to replicate WaveNet and Tacotron 2's results using 75% less training data than previously required. The name 15 is a reference to the creator's claim that a voice can be cloned with as little as 15 seconds of data.

The developer had originally planned to pursue a doctorate based on their undergraduate research, but opted to work in the tech industry instead after their startup was accepted into the Y Combinator accelerator in 2019. After their departure in early 2020, the developer returned to their voice synthesis research, implementing it as a web application. According to a 2024 post on X from the developer, instead of using conventional voice datasets like LJSpeech that contained simple, monotone recordings, they sought out more challenging voice samples that could demonstrate the model's ability to handle complex speech patterns and emotional undertones. The Pony Preservation Project, a fan initiative originating from /mlp/ (4chan's My Little Pony board) that had compiled voice clips from My Little Pony: Friendship Is Magic, played a crucial role in the implementation. The project's contributors had manually trimmed, denoised, transcribed, and emotion-tagged every line from the show. This dataset provided ideal training material for 15.ai's deep learning model.

15.ai was released in March 2020 with a limited selection of characters, including those from My Little Pony: Friendship Is Magic and Team Fortress 2. The system was designed to function efficiently with limited training data, requiring only minutes of clean audio per character, in contrast to the 40+ hours typically needed by traditional deep learning models. To overcome data constraints, the developer employed specific data augmentation techniques to improve generalization, including deliberate introduction of spelling variations, punctuation patterns, and pronunciation distortions during training.

Upon its launch, 15.ai was offered as a free and non-commercial service that did not require user registration or user accounts to operate, and that required the user to accept the terms of use before proceeding. Users were permitted to create any content with the synthesized voices under two specific conditions: they had to credit 15.ai by including the website URL in any posts, videos, or projects using the generated audio, and they were prohibited from mixing 15.ai outputs with other text-to-speech outputs in the same work, to prevent misrepresentation of the technology's capabilities.

Quotations from coverage describing the service as free:
* "But thanks to a free online machine-learning tool,"
* "Thanks to a free online tool based on machine learning" (translated from Russian)
* "It's 15.ai, a free website anyone can use."
* "The number of visits to the website is more than 5,000 online, and it is currently completely free." (translated from Chinese)
* "15.ai, a free web app specializing in text-to-speech and AI-voice generation"
* "the popular free-to-use text-to-speech service, 15.ai,"
* "15.ai, a free-to-use AI that can accurately clone voices,"

Quotations describing it as non-commercial:
* "[...] 15.ai – a non-commercial text-to-speech service."
* "[...] 15.ai, a non-commercial text-to-speech service."
* "When using the 15.ai project, a warning comes up that states the project is not for commercial use"

Quotations describing use without registration:
* "You just have to enter, select GlaDOS from the list of available characters," (translated from Spanish)
* "Using the program is very simple. Access it by CLICKING HERE, and in the options bars, just choose the game/cartoon/series you want the voice from, choose the character and type." (translated from Portuguese)

More voices were added to the website in the following months. By May 2020, the site had served over 4.2 million audio files to users. A significant technical advancement came in late 2020 with the implementation of a multi-speaker embedding in the deep neural network, enabling simultaneous training of multiple voices rather than requiring individual models for each character voice. This not only allowed rapid expansion from eight to over fifty character voices, but also let the model recognize common emotional patterns across characters, even when certain emotions were missing from some characters' training data.

In early 2021, the application gained popularity after skits, memes, and fan content created using 15.ai went viral on Twitter, TikTok, Reddit, Twitch, Facebook, and YouTube. At its peak, the platform incurred monthly operational costs from the AWS infrastructure needed to handle millions of daily voice generations; despite receiving offers from companies to acquire 15.ai and its underlying technology, the website remained independent and was funded out of the developer's personal earnings from their previous startup. The developer was then aged 23.
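
The text-side data augmentation mentioned earlier in this section (spelling variations, punctuation patterns, and pronunciation distortions) can be pictured with a small sketch. The source does not describe how 15.ai actually implemented it, so everything below is a hypothetical illustration: the function names, the substitution tables, and the probability value are all assumptions.

```python
import random

# Hypothetical lookup tables; the real system's rules are not documented here.
SPELLING_VARIANTS = {"you": ["u", "yoo"], "because": ["cuz", "becuz"], "the": ["teh"]}
PUNCTUATION_ENDINGS = [".", "!", "?", "...", "!?"]
VOWEL_SWAPS = {"a": "e", "e": "i", "i": "e", "o": "u", "u": "o"}


def vary_spelling(words, rng, prob):
    """Replace some known words with an alternative spelling."""
    return [
        rng.choice(SPELLING_VARIANTS[w.lower()])
        if w.lower() in SPELLING_VARIANTS and rng.random() < prob
        else w
        for w in words
    ]


def distort_pronunciation(word, rng, prob):
    """Swap one vowel to mimic a slightly off pronunciation."""
    positions = [i for i, ch in enumerate(word) if ch in VOWEL_SWAPS]
    if not positions or rng.random() >= prob:
        return word
    i = rng.choice(positions)
    return word[:i] + VOWEL_SWAPS[word[i]] + word[i + 1:]


def augment(text, seed=None, prob=0.3):
    """Return one randomly perturbed variant of a training transcript."""
    rng = random.Random(seed)
    body = text.rstrip(".!?")  # drop the original ending before picking a new one
    words = vary_spelling(body.split(), rng, prob)
    words = [distort_pronunciation(w, rng, prob) for w in words]
    return " ".join(words) + rng.choice(PUNCTUATION_ENDINGS)


# Five perturbed variants of the same line, one per seed.
variants = [augment("You did it because of the cake.", seed=s) for s in range(5)]
```

Passing a fixed `seed` makes each variant reproducible, which is convenient when comparing training runs; setting `prob=0.0` leaves the words untouched and varies only the final punctuation.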


Voiceverse NFT controversy

On January 14, 2022, a controversy ensued after it was discovered that Voiceverse NFT had taken credit for voice lines generated from 15.ai without permission and sold them as NFTs (non-fungible tokens). This came shortly after 15.ai's developer had explicitly stated in December 2021 that they had no interest in incorporating NFTs into their work.

Quotations from coverage on the unauthorized use:
* "it was revealed that [Voiceverse] had stolen voice work it'd been using."
* "Voiceverse NFT [...] admitted to using content without permission from 15.ai"
* "Voiceverse NFT was caught having taken voice lines from [...] 15.ai"
* "Voiceverse NFT had taken voice lines from [15.ai] without giving credit"
* "Voiceverse has admitted that they stole voice lines"
* "NFT firm Voiceverse admits it stole work"
* "[Voiceverse] used [15.ai's] voice acting without permission" (translated from Finnish)
* "Voiceverse NFT Uses Stolen Technology!"
* "Voiceverse has now admitted [...] that they stole" (translated from Danish)
* "Troy Baker-backed NFT firm admitted using voice lines from [15.ai] without permission"
* "Voiceverse had stolen work without crediting it from [...] 15.ai"
* "the company was caught stealing on the same day the actor announced his partnership" (translated from Russian)
* "Shortly after that same day, Voiceverse NFT was caught stealing." (translated from Russian)
* "Voiceverse NFT Service Reportedly Uses Stolen Technology from 15ai"
* "the aforementioned NFT company would be trying to take advantage of it by marketing a sample [...] without the permission of its author." (translated from Spanish)

Quotations from coverage on the NFT sale:
* "audio sold as an NFT on Voiceverse's platform was acknowledged by the company for having been created by 15.ai"
* "Voiceverse has now admitted, after a lot of accusations, that they have stolen, and sold, AI-based voice acting as NFTs based on a voice invented and designed by a service called 15.ai." (translated from Danish)
* "VoiceverseNFT previously admitted to selling voice content stolen from fifteenAI"
* "Indeed, log files apparently showed Voiceverse NFT had used 15.ai for an AI-powered voice to be sold as an NFT."
* "[15.ai's] work was taken and sold as a unique token." (translated from Russian)

Log files In computing, logging is the act of keeping a log of events that occur in a computer system, such as problems, errors or broad information on current operations. These events may occur in the operating system or in other software. A message or ' ...
showed that Voiceverse had generated audio of characters from '' My Little Pony: Friendship Is Magic'' using 15.ai, pitched them up to make them sound unrecognizable from the original voices to market their own platform—in violation of 15.ai's terms of service which explicitly prohibited commercial use and required proper attribution. Voiceverse initially claimed their platform would allow NFT owners to possess commercial rights to AI-generated voices for content creation, in-game chats, and video calls. When confronted with evidence of their misappropriation, Voiceverse claimed that someone in their marketing team used the voice without properly crediting 15.ai and explained in their
Discord Discord is an instant messaging and Voice over IP, VoIP social platform which allows communication through Voice over IP, voice calls, Videotelephony, video calls, text messaging, and digital media, media. Communication can be private or take ...
server that their marketing team had been in such a rush to create a partnership demo that they used 15.ai without waiting for their own voice technology to be ready. The controversial tweet was deleted thereafter. In response to their apology, 15 tweeted "Go fuck yourself," which went viral, amassing hundreds of thousands of retweets and likes on
Twitter
in support of the developer. 15 later expressed deeper frustration, writing: "the entire field of vocal synthesis is now being misrepresented by charlatans who are only in it for the money." Following continued backlash and the plagiarism revelation, voice actor
Troy Baker (who had partnered with Voiceverse) faced criticism for supporting an NFT project and for the confrontational tone of his announcement. Baker had described Voiceverse's service as allowing people to "create customized audiobooks, YouTube videos, e-learning lectures, or even podcasts with your favorite voice all without the hassle of additional legal work," which critics noted raised concerns about potentially replacing professional voice actors with AI. Baker subsequently acknowledged that his original announcement tweet ending with "You can hate. Or you can create. What'll it be?" may have been "antagonistic," and on January 31 announced that he would discontinue his partnership with Voiceverse. The event raised concerns about NFT projects, which critics observed were frequently associated with intellectual property
theft and questionable business practices. The incident was later documented in the AI Incident Database (AIID), cataloging it as an example of "an AI-synthetic audio sold as an NFT on Voiceverse's platform
[that] was acknowledged by the company for having been created by 15.ai, a free web app specializing in text-to-speech and AI-voice generation, and reused without proper attribution." The controversy was also featured in writer and crypto skeptic Molly White's ''Web3 Is Going Just Great'' project, which documented how Baker's partnership announcement and its antagonistic tone exacerbated negative reactions to the NFT initiative. White noted the vague nature of Voiceverse's offering, described only as "provid[ing] you an ownership to a unique voice in the Metaverse," and observed that the revelation of stolen work from 15.ai further damaged Voiceverse's credibility. Russian educational platform ''Skillbox'' listed the incident as an example of fraud in NFTs. Voice actor and YouTuber Yong Yea criticized voice NFTs for their potential impact on the voice acting industry, and stated in a follow-up YouTube video:
"This isn't just one of those things [Voiceverse] can go 'Whoopsies!' on. [They] plagiarized somebody else's work and used that as a means to falsely market the quality of [their] own products, by using somebody else's higher quality voice AI to promote [Voiceverse] for [their] own benefit."
In a 2024
class action lawsuit filed against LOVO, Inc., court documents alleged that the founders of LOVO also created Voiceverse, with plaintiffs claiming that Voiceverse had "already been found to have stolen technology from [15.ai]."


Inactivity

In September 2022, 15.ai was taken offline due to legal issues surrounding
artificial intelligence and copyright
. In a post on
Twitter
, 15 suggested a potential future version that would better address copyright concerns from the outset.


Revival

On May 18, 2025, 15 launched 15.dev as the official
sequel
to 15.ai. Fandom news site ''
Equestria Daily
'' reported that "almost every voiced pony in the show seems available from varying levels of quality" and noted that the website included "a dropdown for various emotions you want to generate."


Features

The platform was non-commercial, had no advertisements, generated no revenue, and operated without requiring user registration or accounts. Users generated speech by inputting text and selecting a character voice, with optional parameters for emotional contextualizers and phonetic transcriptions. Each request produced three audio variations with distinct emotional deliveries, sorted by confidence score. Characters available included multiple characters from ''Team Fortress 2'' and ''My Little Pony: Friendship Is Magic'', including Twilight Sparkle; GLaDOS, Wheatley, and the Sentry Turret from the ''Portal'' series; SpongeBob SquarePants; Kyu Sugardust from ''HuniePop''; Rise Kujikawa from ''Persona 4''; Daria Morgendorffer and Jane Lane from ''Daria''; Carl Brutananadilewski from ''Aqua Teen Hunger Force''; Steven Universe from ''Steven Universe''; Sans from ''Undertale''; Madeline and multiple characters from ''Celeste''; the Tenth Doctor from ''Doctor Who''; the Narrator from ''The Stanley Parable''; and HAL 9000 from ''2001: A Space Odyssey''. Out of the over fifty voices available, thirty were of characters from ''My Little Pony: Friendship Is Magic''. Certain "silent" characters like Chell and Gordon Freeman could be selected as a joke and produced silent audio files when any text was submitted. Characters from ''Undertale'' and ''Celeste'' did not produce spoken words but instead generated their games' distinctive beeps when text was entered. 15.ai generated audio at a 44.1 kHz sampling rate, higher than the 16 kHz standard used by most deep learning text-to-speech systems of that period. This higher fidelity created more detailed audio spectrograms
and greater audio resolution, though it also made any synthesis imperfections more noticeable. Users reported using Audacity to downsample any generated audio in order to mask apparent robotic artifacts, though this came at the cost of lower audio quality. The system processed speech ''faster-than-real-time'' using customized deep neural networks combined with specialized audio synthesis algorithms. While the underlying technology could produce 10 seconds of audio in less than 10 seconds of processing time (hence, ''faster-than-real-time''), the actual user experience often involved longer waits as the servers managed thousands of simultaneous requests, sometimes taking more than a minute to deliver results. The deep learning model's nondeterministic properties produced variations in speech output, creating different intonations with each generation, similar to how voice actors produce different takes. 15.ai introduced the concept of emotional contextualizers, which allowed users to specify the emotional tone of generated speech through guiding phrases. The emotional contextualizer functionality utilized DeepMoji, a sentiment analysis neural network developed at the
MIT Media Lab. Introduced in 2017, DeepMoji processed emoji embeddings from 1.2 billion Twitter posts (from 2013 to 2017) to analyze emotional content. If an input into 15.ai contained additional context (specified by a vertical bar), the additional context following the bar would be used as the emotional contextualizer. For example, if the input was "Today is a great day!|I'm very sad.", the selected character would speak the sentence "Today is a great day!" in the emotion one would expect from someone saying the sentence "I'm very sad." Certain characters, such as Twilight Sparkle from ''My Little Pony: Friendship Is Magic'', offered preset emotional modes, with specific options to output text in different emotional states such as "happy". The application used pronunciation data from the Oxford Dictionaries API, Wiktionary, and the CMU Pronouncing Dictionary, the last of which is based on ARPABET, a set of English phonetic transcriptions originally developed by the Advanced Research Projects Agency in the 1970s. For modern and Internet-specific terminology, the system incorporated pronunciation data from user-generated content websites, including Reddit, Urban Dictionary, 4chan, and Google. Inputting ARPABET transcriptions was also supported, allowing users to correct mispronunciations or specify the desired pronunciation between heteronyms (words that have the same spelling but different pronunciations). Users could invoke ARPABET transcriptions by enclosing the phoneme string in curly braces within the input box (for example, to specify the pronunciation of the word "ARPABET"). The interface displayed parsed words with color-coding to indicate pronunciation certainty: green for words found in the existing pronunciation lookup table, blue for manually entered ARPABET pronunciations, and red for words where the pronunciation had to be algorithmically predicted. Later versions of 15.ai introduced multi-speaker capabilities. Rather than training separate models for each voice, 15.ai used a unified model that learned multiple voices simultaneously through speaker embeddings: learned numerical representations that captured each character's unique vocal characteristics. Along with the emotional context conferred by DeepMoji, this neural network architecture enabled the model to learn shared patterns across different characters' emotional expressions and speaking styles, even when individual characters lacked examples of certain emotional contexts in their training data. The platform limited text input to 200 characters per generation, though users could create multiple clips for longer speech sequences. The interface included technical metrics and graphs, which served to highlight the research aspect of the website. The underlying algorithm used by 15.ai was dubbed ''DeepThroat''. As of version v23 of 15.ai, the interface displayed comprehensive model analysis information, including word parsing results and emotional analysis data. The flow and generative adversarial network (GAN) hybrid vocoder and denoiser, introduced in an earlier version, was streamlined to remove manual parameter inputs.
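The input conventions described in this section (a vertical bar introducing the emotional contextualizer, curly braces around manual ARPABET strings, the 200-character limit, and the green/blue/red word coloring) can be illustrated with a minimal Python sketch. This is not 15.ai's actual code, which the article does not describe: the function name `parse_input`, the `ParsedInput` type, the tiny `KNOWN_WORDS` table, and the choice to apply the limit to the whole input string are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

MAX_CHARS = 200  # per-generation limit described in the text

# Hypothetical stand-in for the pronunciation lookup table.
KNOWN_WORDS: Set[str] = {"today", "is", "a", "great", "day"}

# Matches either a curly-brace group (manual ARPABET) or a plain word.
TOKEN_RE = re.compile(r"\{([^{}]+)\}|([A-Za-z']+)")


@dataclass
class ParsedInput:
    text: str                      # what the character should say
    context: Optional[str]         # emotional contextualizer, if given
    tokens: List[Tuple[str, str]]  # (token, color) pairs


def parse_input(raw: str, known_words: Set[str] = KNOWN_WORDS) -> ParsedInput:
    # Assumption: the limit applies to the whole input, context included.
    if len(raw) > MAX_CHARS:
        raise ValueError(f"input exceeds {MAX_CHARS} characters")

    # Everything after the first vertical bar is the contextualizer.
    text, bar, context = raw.partition("|")
    text = text.strip()

    tokens: List[Tuple[str, str]] = []
    for match in TOKEN_RE.finditer(text):
        phonemes, word = match.groups()
        if phonemes is not None:
            tokens.append((phonemes.strip(), "blue"))  # manual ARPABET
        elif word.lower() in known_words:
            tokens.append((word, "green"))             # found in the table
        else:
            tokens.append((word, "red"))               # would be predicted
    return ParsedInput(text, context.strip() if bar else None, tokens)
```

For instance, `parse_input("Today is a great day!|I'm very sad.")` separates the spoken sentence from the contextualizer, and `parse_input("Say {HH AH0 L OW1} slowly")` marks the braced phoneme string (a CMUdict-style transcription of "hello") as blue, while the two words missing from the toy table come back red.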


Reception


Critical reception

Critics described 15.ai as easy to use and generally able to convincingly replicate character voices, with occasional mixed results. Natalie Clayton of ''
PC Gamer'' wrote that SpongeBob SquarePants' voice was replicated well, but noted challenges in mimicking the Narrator from ''The Stanley Parable
'': "the algorithm simply can't capture Kevan Brighting's whimsically droll intonation." Similarly, Russian gaming website ''Rampaga'' reflected that GLaDOS performed exceptionally well since "her voice was originally created to simulate human speech by artificial intelligence," while the Narrator from ''The Stanley Parable'' was less convincing due to insufficient training data. Zack Zwiezen of ''
Kotaku'' reported that "[his] girlfriend was convinced it was a new voice line from GLaDOS' voice actor". Calvin Rugona of gaming news publication ''Gamezo'' commented that the tool's simplicity contributed significantly to its widespread adoption, as it allowed anyone online to easily create and save voice clips. Taiwanese newspaper ''United Daily News'' also highlighted 15.ai's ability to recreate GLaDOS's mechanical voice, alongside its diverse range of character voice options. ''Yahoo! News Taiwan'' reported that "GLaDOS in ''Portal'' can pronounce lines nearly perfectly", but also criticized that "there are still many imperfections, such as word limit and tone control, which are still a little weird in some words." Chris Button of AI newsletter ''Byteside'' called the ability to clone a voice with only 15 seconds of data "freaky," but also found the tech behind it impressive. Robin Lamorlette of French online magazine ''Clubic'' described the technology as "devilishly fun" and noted how Twitter and YouTube were filled with creative content from users experimenting with the tool. The platform's voice generation capabilities were regularly featured on ''Equestria Daily
'', a fandom news site dedicated to the show ''My Little Pony: Friendship Is Magic'' and its other generations, with documented updates, fan creations, and additions of new character voices. In a post introducing new character additions to 15.ai, the founder of ''Equestria Daily'', Shaun Scotellaro (also known by his online moniker "Sethisto"), wrote that "some of [the voices] aren't great due to the lack of samples to draw from, but many are really impressive still anyway." Chinese ''My Little Pony'' fan site ''EquestriaCN'' also documented 15.ai's development, highlighting its various updates, though they criticized some of the bugs and the long queue wait times of the application. Multiple other critics also found the word count limit, prosody options, and English-only nature of the application not entirely satisfactory. Peter Paltridge of anime and superhero
news outlet ''Anime Superhero News'' opined that "voice synthesis has evolved to the point where the more expensive efforts are nearly indistinguishable from actual human speech," but also noted that "In some ways, SAM is still more advanced than this. It was possible to affect SAM's inflections by using special characters, as well as change his pitch at will. With 15.ai, you're at the mercy of whatever random inflections you get." Conversely, Lauren Morton of ''
Rock, Paper, Shotgun'' praised the depth of pronunciation control: "if you're willing to get into the nitty gritty of it". Similarly, Eugenio Moto of Spanish news website ''Qore.com'' wrote that "the most experienced of users can change parameters like the stress or the tone." Takayuki Furushima of ''Den Fami Nico Gamer'' highlighted the "smooth pronunciations", and Yuki Kurosawa of ''AUTOMATON'' noted its "rich emotional expression" as a major feature; both Japanese authors noted the lack of Japanese-language support. Renan do Prado of Brazilian gaming news outlet ''Arkade'' and José Villalobos of Spanish gaming outlet ''LaPS4'' pointed out that while users could create amusing results in Portuguese and Spanish respectively, the generation performed best in English. Chinese gaming news outlet ''GamerSky'' called the app "interesting", but also criticized the word count limit of the text and the lack of intonations. Frank Park of South Korean video game outlet ''Zuntata'' wrote that "the surprising thing about 15.ai is that [for] some characters there's only about 30 seconds of data, but it achieves pronunciation accuracy close to 100%". Machine learning professor Yongqiang Li remarked in his blog that the application was still free despite having 5,000 people generating voices concurrently at the time of writing. Marco Cocomello of South African gaming and pop culture website ''GLITCHED'' remarked that despite the 200-character limitation, the results "blew [him] away" when testing the app with GLaDOS's voice. Álvaro Ibáñez of Spanish technology publication ''Microsiervos'' wrote that he found the rhythm of the AI-generated voices noteworthy, observing that the system appeared to adapt its delivery based on the content's intended meaning. Technical publications and outlets focusing on artificial intelligence provided more in-depth analysis of 15.ai's capabilities and limitations compared to other text-to-speech technologies of the time. Rionaldi Chandraseta of AI newsletter ''Towards Data Science'' observed that voice models trained on larger datasets created more convincing output with better phrasing and natural pauses, particularly for extended text. Bai Feng of Chinese tech and AI media outlet ''XinZhiYuan'' on ''
QQ News'' highlighted the technical achievement of 15.ai's high-quality output (44.1 kHz sampling rate
) despite using minimal training data, remarking that this was of significantly higher quality than typical deep learning text-to-speech implementations which used 16 kHz sampling rates. The outlet also acknowledged that while some pronunciation errors occurred due to the limited training data, this was understandable given that traditional deep learning models typically required 40 or more hours of training data. Similarly, Parth Mahendra of AI newsletter ''AI Daily'' observed that while the system "does a good job at accurately replicating most basic words," it struggled with more complex terms, noting that characters would "absolutely butcher the pronunciation" of certain words. Ji Yunyo of Chinese tech news website '' NetEase News'' called the technology behind 15.ai "remarkably efficient," requiring only minimal data to accurately clone numerous voices while maintaining emotional nuance and natural intonation. However, he also pointed out limitations, noting that the emotional expression was relatively "neutral" and that "extreme" emotions couldn't be properly synthesized, making it less suitable for
not safe for work
applications. Ji also mentioned that while many
deepfake
videos required creators to extract and edit material from hours of original content for very short results, 15.ai could achieve similar or better effects with only a few dozen minutes of training data per character, though server performance issues often meant synthesis could take over a minute to complete.
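
For context on the sampling rates compared above: a digital recording can only represent frequencies up to half its sampling rate (the Nyquist limit), which is one reason 44.1 kHz output can sound fuller than 16 kHz output. The following is a minimal sketch of that arithmetic only; the function name and labels are hypothetical and are not taken from 15.ai or the cited outlets.

```python
def nyquist_limit_hz(sampling_rate_hz: float) -> float:
    """Return the highest frequency representable at a given sampling rate."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    return sampling_rate_hz / 2.0


# Sampling rates quoted in the paragraph above; the labels are illustrative.
RATES_HZ = {
    "15.ai output": 44_100,
    "typical deep learning TTS of the time": 16_000,
}


def compare_rates(rates):
    """Map each label to its Nyquist limit in Hz."""
    return {label: nyquist_limit_hz(rate) for label, rate in rates.items()}


if __name__ == "__main__":
    for label, limit in compare_rates(RATES_HZ).items():
        print(f"{label}: frequencies up to {limit / 1000:.2f} kHz")
```

Under these figures, 44.1 kHz audio can carry content up to 22.05 kHz, roughly the upper bound of human hearing, while 16 kHz audio is limited to 8 kHz.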


Reactions from voice actors of featured characters

Some voice actors whose characters appeared on 15.ai have publicly shared their thoughts about the platform. In a 2021 interview on the video game voice acting podcast ''The VŌC'', John Patrick Lowrie, who voices the Sniper in ''Team Fortress 2'', explained that he had discovered 15.ai when a prospective intern showed him a skit she had created using AI-generated voices of the Sniper and the Spy from ''Team Fortress 2''. Lowrie commented on the technology and drew an analogy to synthesized music. In a 2021 live broadcast on his Twitch channel, Nathan Vetterlein, the voice actor of the Scout from ''Team Fortress 2'', listened to an AI recreation of his character's voice. He described the impression as "interesting" and noted that "there's some stuff in there."


Ethical concerns

Other voice actors had mixed reactions to 15.ai's capabilities. While some industry professionals acknowledged the technical innovation, others raised concerns about the technology's implications for their profession. When voice actor Troy Baker announced his partnership with Voiceverse NFT, which had misappropriated 15.ai's technology, it sparked widespread controversy within the voice acting industry. Critics raised concerns that automated voice acting could reduce employment opportunities for voice actors, enable voice impersonation, and be misused in explicit content. Ruby Innes of ''Kotaku Australia'' noted, "this practice could potentially put voice actors out of work considering you could just use their AI voice rather than getting them to voice act for a project and paying them." In her coverage of the Voiceverse controversy, Edie WK of ''Checkpoint Gaming'' raised the concern that "this kind of technology has the potential to push voice actors out of work if it becomes easier and cheaper to use AI voices instead of working with the actor directly." While 15.ai limited its scope to fictional characters and did not reproduce voices of real people or celebrities, computer scientist Andrew Ng noted in his 2020 assessment of 15.ai that similar technology could be used to do so, including for nefarious purposes.


Legacy

15.ai was an early pioneer of audio deepfakes, leading to the emergence of AI speech synthesis-based memes during the initial stages of the
AI boom
in 2020. 15.ai is credited as the first mainstream platform to popularize AI voice cloning in
Internet meme
s and content creation, particularly through its ability to generate convincing character voices in real-time without requiring extensive technical expertise. The platform's impact was especially notable in fan communities, including the ''My Little Pony: Friendship Is Magic'', ''
Portal
'', ''
Team Fortress 2
'', and ''
SpongeBob SquarePants
'' fandoms, where it enabled the creation of viral content that garnered millions of views across social media platforms like
Twitter
and
YouTube
. ''Team Fortress 2'' content creators also used the platform to produce both short-form memes and complex narrative animations using
Source Filmmaker
. Fan creations included skits and new fan animations (such as the popular ''Team Fortress 2'' Source Filmmaker video ''Spy's Confession''), crossover content—such as ''
Game Informer
'' writer Liana Ruppert's demonstration combining ''Portal'' and ''
Mass Effect
'' dialogue in her coverage of the platform—recreations of viral videos (including the infamous Big Bill Hell's Cars car dealership parody), adaptations of
fanfiction
using AI-generated character voices (such as ''The Tax Breaks'', a fully voiced 17-minute fan-made episode of ''Friendship Is Magic''), music videos and new musical compositions—such as the ''Team Fortress 2'' remix ''Pootis
Hardbass
''—and content where characters recited
sea shanties
. Some fan creations gained mainstream attention, such as a viral edit replacing
Donald Trump
's cameo in '' Home Alone 2: Lost in New York'' with the
Heavy Weapons Guy
's AI-generated voice, which was featured on a daytime
CNN
segment in January 2021. Some users integrated 15.ai's voice synthesis with VoiceAttack, a voice command software, to create personal assistants. Its influence has been noted in the years after it became defunct, with several commercial alternatives emerging to fill the void, such as
ElevenLabs
and
Speechify
. Contemporary generative voice AI companies have acknowledged 15.ai's pioneering role.
Y Combinator
startup PlayHT called the debut of 15.ai "a breakthrough in the field of text-to-speech (TTS) and speech synthesis".
Cliff Weitzman
, the founder and CEO of
Speechify
, credited 15.ai for "making AI voice cloning popular for content creation by being the first [...] to feature popular existing characters from fandoms". Mati Staniszewski, co-founder and CEO of
ElevenLabs
, wrote that 15.ai was transformative in the field of AI text-to-speech. At
brony conventions
, 15.ai has been discussed in
presentation
s on the intersection of the ''My Little Pony'' fandom and
artificial intelligence
. Prior to its shutdown, 15.ai established several technical precedents that influenced subsequent developments in AI voice synthesis. Its integration of
DeepMoji
for emotional analysis demonstrated the viability of incorporating sentiment-aware speech generation, while its support for ARPABET
phonetic transcription
s set a standard for precise pronunciation control in public-facing voice synthesis tools. The platform's unified multi-speaker model, which enabled simultaneous training of diverse character voices, proved particularly influential. This approach allowed the system to recognize emotional patterns across different voices even when certain emotions were absent from individual character training sets; for example, if one character had examples of joyful speech but no angry examples, while another had angry but no joyful samples, the system could learn to generate both emotions for both characters by understanding the common patterns of how emotions affect speech. 15.ai also made a key contribution in reducing training data requirements for speech synthesis. Earlier systems like
Google AI
's Tacotron and
Microsoft Research
's FastSpeech required tens of hours of audio to produce acceptable results and failed to generate intelligible speech with less than 24 minutes of training data. In contrast, 15.ai demonstrated the ability to generate speech with substantially less training data: the name "15.ai" refers to the creator's claim that a voice could be cloned with just 15 seconds of data. This data efficiency influenced later developments in AI voice synthesis, and the 15-second benchmark became a reference point for subsequent systems. The original claim that only 15 seconds of data is required to clone a human's voice was corroborated by
OpenAI
in 2024.
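
The ARPABET-based pronunciation control mentioned above can be illustrated with a short sketch. It assumes, purely for illustration, that phoneme overrides are written inline between curly braces; the function name, the brace syntax as modeled here, and the sample phoneme string are assumptions and do not describe 15.ai's actual implementation.

```python
import re
from typing import List, Tuple, Union

# Matches either a {PHONEME STRING} override or a run of ordinary text.
_TOKEN = re.compile(r"\{([^{}]*)\}|([^{}]+)")

Segment = Tuple[str, Union[str, List[str]]]


def split_pronunciation_overrides(text: str) -> List[Segment]:
    """Split input into plain-text segments and ARPABET phoneme overrides.

    Plain text would be handed to the normal text front end, while each
    override carries an explicit list of phoneme symbols.
    """
    segments: List[Segment] = []
    for match in _TOKEN.finditer(text):
        phonemes, plain = match.group(1), match.group(2)
        if phonemes is not None:
            # Phoneme symbols are whitespace-separated, e.g. "AH0" or "T".
            segments.append(("phonemes", phonemes.split()))
        elif plain.strip():
            segments.append(("text", plain.strip()))
    return segments


if __name__ == "__main__":
    # The phoneme string is an illustrative ARPABET spelling of "tomato".
    print(split_pronunciation_overrides("I like {T AH0 M AA1 T OW2} soup"))
```

Separating overrides from ordinary text in this way lets a user force one specific pronunciation of a word without changing how the rest of the sentence is processed.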


See also

* AI boom
* Character.ai
* Deepfake
* Ethics of artificial intelligence
* WaveNet
* ''My Little Pony: Friendship Is Magic'' fandom
* Synthetic media

