Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The shape of the vocal tract can be controlled in a number of ways, which usually involve modifying the position of the speech articulators, such as the tongue, jaw, and lips. Speech is created by digitally simulating the flow of air through the representation of the vocal tract.
Mechanical talking heads
There is a long history of attempts to build mechanical "talking heads". Gerbert (d. 1003), Albertus Magnus (1198–1280) and Roger Bacon (1214–1294) are all said to have built speaking heads (Wheatstone 1837). However, historically confirmed speech synthesis begins with Wolfgang von Kempelen (1734–1804), who published an account of his research in 1791.
Electrical vocal tract analogs
The first electrical vocal tract analogs were static, like those of Dunn (1950), Ken Stevens and colleagues (1953), and Gunnar Fant (1960). Rosen (1958) built a dynamic vocal tract (DAVO), which Dennis (1963) later attempted to control by computer. Dennis et al. (1964), Hiki et al. (1968) and Baxter and Strong (1969) have also described hardware vocal-tract analogs. Kelly and Lochbaum (1962) made the first computer simulation; later digital computer simulations were made by, among others, Nakata and Mitsuoka (1965), Matsui (1968) and Paul Mermelstein (1971). Honda et al. (1968) made an analog computer simulation.
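Digital simulations of this kind commonly treat the vocal tract as a chain of short tube sections, with part of the travelling sound wave reflected wherever the cross-sectional area changes. The Python sketch below illustrates that idea in the style of a Kelly-Lochbaum scattering model. It is not a reconstruction of any of the programs cited above: the function names, the end-reflection values (0.75 at the glottis, -0.85 at the lips), the example area values, and the one-sample delay per section are all illustrative assumptions.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction
    between adjacent tube sections (glottis first, lips last)."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def simulate_tube(areas, source, glottal_reflection=0.75, lip_reflection=-0.85):
    """Propagate a source signal through a chain of tube sections.

    Each section delays the forward and backward waves by one sample
    (an assumption made to keep the sketch short).
    """
    n = len(areas)
    k = reflection_coefficients(areas)
    right = [0.0] * n  # waves travelling from glottis towards lips
    left = [0.0] * n   # waves travelling from lips towards glottis
    output = []
    for sample in source:
        new_right = [0.0] * n
        new_left = [0.0] * n
        # Glottal end: inject the source and partially reflect the returning wave.
        new_right[0] = sample + glottal_reflection * left[0]
        # Interior junctions: one-multiply scattering at each area change.
        for i in range(n - 1):
            w = k[i] * (right[i] - left[i + 1])
            new_right[i + 1] = right[i] + w
            new_left[i] = left[i + 1] + w
        # Lip end: most of the wave is reflected, the rest is radiated.
        new_left[n - 1] = lip_reflection * right[n - 1]
        output.append((1.0 + lip_reflection) * right[n - 1])
        right, left = new_right, new_left
    return output


if __name__ == "__main__":
    # Hypothetical area function (arbitrary units): narrow back, wide front.
    areas = [1.0, 1.0, 0.5, 0.5, 2.0, 3.0, 3.0, 2.0]
    impulse = [1.0] + [0.0] * 39
    print(simulate_tube(areas, impulse)[:12])
```

Changing the `areas` list changes the reflection pattern and therefore the resonances of the simulated tube, which is how articulator positions shape the output in this family of models.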
Haskins and Maeda models
The first software articulatory synthesizer regularly used for laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. This synthesizer, known as ASY, was a computational model of speech production based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. Another widely used model is that of Shinji Maeda, which uses a factor-based approach to control tongue shape.
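A factor-based approach can be pictured as a linear model: a tongue contour is a mean shape plus a weighted sum of a few fixed deformation patterns, so that a handful of weights controls the whole contour. The sketch below shows only this general idea. The mean shape, the two factors, and their descriptive labels are made-up numbers for illustration, not values or parameter names from Maeda's model.

```python
def tongue_contour(mean_shape, factors, weights):
    """Linear factor model: contour = mean + sum_j weights[j] * factors[j]."""
    if len(factors) != len(weights):
        raise ValueError("need exactly one weight per factor")
    contour = list(mean_shape)
    for factor, weight in zip(factors, weights):
        if len(factor) != len(mean_shape):
            raise ValueError("each factor must match the contour length")
        for i, loading in enumerate(factor):
            contour[i] += weight * loading
    return contour


# Hypothetical data: five contour points (heights in arbitrary units).
MEAN = [1.0, 1.5, 2.0, 1.5, 1.0]
FACTORS = [
    [0.0, 0.5, 1.0, 0.5, 0.0],      # illustrative "raise the tongue body" pattern
    [-0.5, -0.25, 0.0, 0.25, 0.5],  # illustrative "tilt front versus back" pattern
]

if __name__ == "__main__":
    # Two control weights determine all five contour points.
    print(tongue_contour(MEAN, FACTORS, [1.0, 2.0]))
```

The appeal of this design is that the synthesizer is driven by a small number of weights rather than by every point on the contour.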
Modern models
Recent progress in speech production imaging, articulatory control modeling, and tongue biomechanics modeling has led to changes in the way articulatory synthesis is performed. Examples include the Haskins CASY model (Configurable Articulatory Synthesis), designed by Philip Rubin, Mark Tiede, and Louis Goldstein, which matches midsagittal vocal tracts to actual magnetic resonance imaging (MRI) data, and uses MRI data to construct a 3D model of the vocal tract. A full 3D articulatory synthesis model has been described by Olov Engwall. A geometrically based 3D articulatory speech synthesizer has been developed by Peter Birkholz (VocalTractLab). The Directions Into Velocities of Articulators (DIVA) model, a feedforward control approach which takes the neural computations underlying speech production into consideration, was developed by Frank H. Guenther at Boston University. The ArtiSynth project, headed by Sidney Fels at the University of British Columbia, is a 3D biomechanical modeling toolkit for the human vocal tract and upper airway. Biomechanical modeling of articulators such as the tongue has been pioneered by a number of scientists, including Reiner Wilhelms-Tricarico, Yohan Payan and Jean-Michel Gerard, and Jianwu Dang and Kiyoshi Honda.
Commercial models
One of the few commercial articulatory speech synthesis systems is the NeXT-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary, where much of the original research was conducted. Following the demise of the various incarnations of NeXT (started by Steve Jobs in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under a GNU General Public License, with work continuing as gnuspeech. The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by René Carré's "distinctive region model". The system is described in "Real-time articulatory speech-synthesis-by-rules".
See also
* Articulatory phonetics
* Articulatory phonology
* Neurocomputational speech processing
* Praat
* Speech synthesis
Bibliography
* Baxter, Brent, and William J. Strong. (1969). WINDBAG—a vocal-tract analog speech synthesizer. ''Journal of the Acoustical Society of America'', 45, 309(A).
* Birkholz, P., Jackel, D., and Kröger, B. J. (2007). Simulation of losses due to turbulence in the time-varying vocal system. ''IEEE Transactions on Audio, Speech, and Language Processing'', 15, 1218–1225.
* Birkholz, P., Jackel, D., and Kröger, B. J. (2006). Construction and control of a three-dimensional vocal tract model. ''Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006)'' (Toulouse, France), pp. 873–876.
* Coker, C. H. (1968). Speech synthesis with a parametric articulatory model. ''Proc. Speech. Symp., Kyoto, Japan'', paper A-4.
* Dennis, Jack B. (1963). Computer control of an analog vocal tract. ''Journal of the Acoustical Society of America'', 35, 1115(A).
* Engwall, O. (2003). Combining MRI, EMA & EPG measurements in a three-dimensional tongue model. ''Speech Communication'', 41, 303–329.
* Fant, C. Gunnar M. (1960). ''Acoustic theory of speech production''. The Hague, Mouton.
* Henke, W. L. (1966). Dynamic Articulatory Model of Speech Production Using Computer Simulation. Unpublished doctoral dissertation, MIT, Cambridge, MA.
* Honda, Takashi, Seiichi Inoue, and Yasuo Ogawa. (1968). A hybrid control system of a human vocal tract simulator. ''Reports of the 6th International Congress on Acoustics'', ed. by Y. Kohasi, pp. 175–8. Tokyo, International Council of Scientific Unions.
* Kelly, John L., and Carol Lochbaum. (1962). Speech synthesis. ''Proceedings of the Speech Communications Seminar'', paper F7. Stockholm, Speech Transmission Laboratory, Royal Institute of Technology.
* Kempelen, Wolfgang R. Von. (1791). ''Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine''. Wien, J. B. Degen.
* Maeda, S. (1988). Improved articulatory model. ''Journal of the Acoustical Society of America'', 84, Sup. 1, S146.
* Maeda, S. (1990). Compensatory articulation during speech: evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model. In W. J. Hardcastle and A. Marchal (Eds.), ''Speech Production and Speech Modelling'', Kluwer Academic, Dordrecht, 131–149.
* Matsui, Eiichi. (1968). Computer-simulated vocal organs. ''Reports of the 6th International Congress on Acoustics'', ed. by Y. Kohasi, pp. 151–4. Tokyo, International Council of Scientific Unions.
* Mermelstein, Paul. (1969). Computer simulation of articulatory activity in speech production. ''Proceedings of the International Joint Conference on Artificial Intelligence'', Washington, D.C., 1969, ed. by D. E. Walker and L. M. Norton. New York, Gordon & Breach.
* Rubin, P., Saltzman, E., Goldstein, L., McGowan, R., Tiede, M., & Browman, C. (1996). CASY and extensions to the task-dynamic model. ''Proceedings of the 1st ESCA Tutorial and Research Workshop on Speech Production Modeling - 4th Speech Production Seminar'', 125–128.
External links
* Introduction to Articulatory Speech Synthesis
* Pink Trombone bare-handed speech synthesis online tool