Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The shape of the vocal tract can be controlled in a number of ways, usually by modifying the position of the speech articulators, such as the tongue, jaw, and lips. Speech is created by digitally simulating the flow of air through the representation of the vocal tract.
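
As a rough illustration of what a vocal tract model predicts, the simplest possible representation is a uniform lossless tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of the quarter-wavelength frequency. The sketch below computes them. The tube length (17.5 cm), the speed of sound (350 m/s), and the function name are illustrative assumptions, not values taken from any particular synthesizer.

```python
def uniform_tube_resonances(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonances (Hz) of a lossless tube closed at one end, open at the other.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    The default length and speed of sound are illustrative assumptions.
    """
    if length_m <= 0 or speed_of_sound <= 0:
        raise ValueError("length and speed of sound must be positive")
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    # With the default values the resonances come out near 500, 1500 and 2500 Hz.
    print(uniform_tube_resonances())
```

Real articulatory synthesizers replace the uniform tube with a time-varying shape, which is what moves these resonances around during speech.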

Mechanical talking heads

There is a long history of attempts to build mechanical "talking heads". Gerbert (d. 1003), Albertus Magnus (1198–1280) and Roger Bacon (1214–1294) are all said to have built speaking heads (Wheatstone 1837). However, historically confirmed speech synthesis begins with Wolfgang von Kempelen (1734–1804), who published an account of his research in 1791.

Electrical vocal tract analogs

The first electrical vocal tract analogs were static, like those of Dunn (1950), Ken Stevens and colleagues (1953), and Gunnar Fant (1960). Rosen (1958) built a dynamic vocal tract (DAVO), which Dennis (1963) later attempted to control by computer. Dennis et al. (1964), Hiki et al. (1968) and Baxter and Strong (1969) also described hardware vocal-tract analogs. Kelly and Lochbaum (1962) made the first computer simulation; later digital computer simulations were made by, among others, Nakata and Mitsuoka (1965), Matsui (1968) and Paul Mermelstein (1971). Honda et al. (1968) made an analog computer simulation.
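
Digital simulations in the tradition of Kelly and Lochbaum are usually described as approximating the vocal tract by a chain of short cylindrical sections, with part of the travelling wave reflected wherever the cross-sectional area changes. A minimal sketch of that idea follows, assuming pressure waves and the sign convention k = (A_i - A_{i+1}) / (A_i + A_{i+1}); other texts use the opposite sign. The function name and the area values are hypothetical.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction of a tube chain.

    `areas` lists cross-sectional areas from glottis to lips (any consistent
    unit). Assumed convention: k = (A_i - A_next) / (A_i + A_next), so a
    narrowing gives k > 0 and a widening gives k < 0.
    """
    if any(a <= 0 for a in areas):
        raise ValueError("all areas must be positive")
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


if __name__ == "__main__":
    # Hypothetical four-section area function (cm^2), not measured data.
    print(reflection_coefficients([1.0, 3.0, 3.0, 1.0]))
```

A uniform tube yields all-zero coefficients, meaning the wave passes every junction unchanged; only the terminations at the glottis and lips reflect.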

Haskins and Maeda models

The first software articulatory synthesizer regularly used for laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. This synthesizer, known as ASY, was a computational model of speech production based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. Another popular, frequently used model is that of Shinji Maeda, which uses a factor-based approach to control tongue shape.
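
In a factor-based model of this kind, a shape is typically expressed as a mean contour plus a weighted sum of a few basis vectors (factors), so that a handful of parameters controls many contour points. The sketch below shows only that linear combination. The mean contour, the two factors, and all names are made-up values for illustration, not Maeda's published factors.

```python
def contour_from_factors(mean, factors, weights):
    """Return mean + sum(weight_k * factor_k), computed point by point."""
    if len(factors) != len(weights):
        raise ValueError("need exactly one weight per factor")
    shape = list(mean)
    for factor, weight in zip(factors, weights):
        if len(factor) != len(mean):
            raise ValueError("each factor must match the contour length")
        for i, value in enumerate(factor):
            shape[i] += weight * value
    return shape


# Hypothetical data: a 4-point contour and two factors (illustration only).
MEAN = [1.0, 1.5, 2.0, 1.5]
FACTORS = [
    [0.5, 0.25, -0.25, -0.5],  # hypothetical "front/back" factor
    [0.0, 0.5, 0.5, 0.0],      # hypothetical "arching" factor
]

if __name__ == "__main__":
    print(contour_from_factors(MEAN, FACTORS, [1.0, -2.0]))
```

The appeal of this design is that the control space stays small: two weights here, and in general only as many as there are factors, regardless of how finely the contour is sampled.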

Modern models

Recent progress in speech production imaging, articulatory control modeling, and tongue biomechanics modeling has led to changes in the way articulatory synthesis is performed. Examples include the Haskins CASY model (Configurable Articulatory Synthesis), designed by Philip Rubin, Mark Tiede, and Louis Goldstein, which matches midsagittal vocal tracts to actual magnetic resonance imaging (MRI) data and uses MRI data to construct a 3D model of the vocal tract. A full 3D articulatory synthesis model has been described by Olov Engwall. A geometrically based 3D articulatory speech synthesizer (VocalTractLab) has been developed by Peter Birkholz. The Directions Into Velocities of Articulators (DIVA) model, a feedforward control approach which takes the neural computations underlying speech production into consideration, was developed by Frank H. Guenther at Boston University. The ArtiSynth project, headed by Sidney Fels at the University of British Columbia, is a 3D biomechanical modeling toolkit for the human vocal tract and upper airway. Biomechanical modeling of articulators such as the tongue has been pioneered by a number of scientists, including Reiner Wilhelms-Tricarico, Yohan Payan and Jean-Michel Gerard, and Jianwu Dang and Kiyoshi Honda.

Commercial models

One of the few commercial articulatory speech synthesis systems is the NeXT-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary, where much of the original research was conducted. Following the demise of the various incarnations of NeXT (started by Steve Jobs in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under a GNU General Public Licence, with work continuing as gnuspeech. The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by René Carré's "distinctive region model" (see "Real-time articulatory speech-synthesis-by-rules").
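
To give a sense of what a waveguide analog of a tract computes, the sketch below propagates forward and backward pressure waves through a chain of tube sections, one sample of delay per section in each direction, and scatters them at every area change. It is a deliberately minimal illustration and not gnuspeech's implementation: there is no nasal branch, no wall or turbulence loss, and the end reflection values, function name, and parameters are all assumptions.

```python
def simulate_tube(areas, excitation, glottis_reflection=0.9, lip_reflection=-0.9):
    """Waveguide sketch: returns the pressure transmitted at the lip end.

    fwd[i] is the forward wave arriving at the right end of section i;
    bwd[i] is the backward wave arriving at the left end of section i.
    """
    n = len(areas)
    if n == 0 or any(a <= 0 for a in areas):
        raise ValueError("need at least one section with positive area")
    # Junction reflection coefficients for pressure waves (assumed convention).
    k = [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1]) for i in range(n - 1)]
    fwd = [0.0] * n
    bwd = [0.0] * n
    output = []
    for sample in excitation:
        new_fwd = [0.0] * n
        new_bwd = [0.0] * n
        # Glottis end: inject the source and reflect the returning wave.
        new_fwd[0] = sample + glottis_reflection * bwd[0]
        # Interior junctions: partial transmission and reflection.
        for i in range(n - 1):
            new_fwd[i + 1] = (1 + k[i]) * fwd[i] - k[i] * bwd[i + 1]
            new_bwd[i] = k[i] * fwd[i] + (1 - k[i]) * bwd[i + 1]
        # Lip end: most of the wave reflects, the remainder is radiated.
        new_bwd[n - 1] = lip_reflection * fwd[n - 1]
        output.append((1 + lip_reflection) * fwd[n - 1])
        fwd, bwd = new_fwd, new_bwd
    return output


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 31
    print(simulate_tube([1.0, 1.0, 2.0, 3.0], impulse))
```

Feeding an impulse into a uniform tube produces a train of decaying, sign-alternating pulses at the lips, spaced by the round-trip delay; changing the area list reshapes that response, which is how articulator movement ends up altering the sound.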

Footnotes

Bibliography

* Baxter, Brent, and William J. Strong. (1969). WINDBAG—a vocal-tract analog speech synthesizer. ''Journal of the Acoustical Society of America'', 45, 309(A).
* Birkholz, P., Jackel, D., Kröger, B. J. (2007). Simulation of losses due to turbulence in the time-varying vocal system. ''IEEE Transactions on Audio, Speech, and Language Processing'', 15, 1218–1225.
* Birkholz, P., Jackel, D., Kröger, B. J. (2006). Construction and control of a three-dimensional vocal tract model. ''Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006)'' (Toulouse, France), pp. 873–876.
* Coker, C. H. (1968). Speech synthesis with a parametric articulatory model. ''Proc. Speech. Symp., Kyoto, Japan'', paper A-4.
* Dennis, Jack B. (1963). Computer control of an analog vocal tract. ''Journal of the Acoustical Society of America'', 35, 1115(A).
* Engwall, O. (2003). Combining MRI, EMA & EPG measurements in a three-dimensional tongue model. ''Speech Communication'', 41, 303–329.
* Fant, C. Gunnar M. (1960). ''Acoustic theory of speech production''. The Hague, Mouton.
* Henke, W. L. (1966). Dynamic Articulatory Model of Speech Production Using Computer Simulation. Unpublished doctoral dissertation, MIT, Cambridge, MA.
* Honda, Takashi, Seiichi Inoue, and Yasuo Ogawa. (1968). A hybrid control system of a human vocal tract simulator. ''Reports of the 6th International Congress on Acoustics'', ed. by Y. Kohasi, pp. 175–8. Tokyo, International Council of Scientific Unions.
* Kelly, John L., and Carol Lochbaum. (1962). Speech synthesis. ''Proceedings of the Speech Communications Seminar'', paper F7. Stockholm, Speech Transmission Laboratory, Royal Institute of Technology.
* Kempelen, Wolfgang R. Von. (1791). ''Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine''. Wien, J. B. Degen.
* Maeda, S. (1988). Improved articulatory model. ''Journal of the Acoustical Society of America'', 84, Sup. 1, S146.
* Maeda, S. (1990). Compensatory articulation during speech: evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model. In W. J. Hardcastle and A. Marchal (Eds.), ''Speech Production and Speech Modelling'', Kluwer Academic, Dordrecht, 131–149.
* Matsui, Eiichi. (1968). Computer-simulated vocal organs. ''Reports of the 6th International Congress on Acoustics'', ed. by Y. Kohasi, pp. 151–4. Tokyo, International Council of Scientific Unions.
* Mermelstein, Paul. (1969). Computer simulation of articulatory activity in speech production. ''Proceedings of the International Joint Conference on Artificial Intelligence'', Washington, D.C., 1969, ed. by D. E. Walker and L. M. Norton. New York, Gordon & Breach.
* Rubin, P., Saltzman, E., Goldstein, L., McGowan, R., Tiede, M., & Browman, C. (1996). CASY and extensions to the task-dynamic model. ''Proceedings of the 1st ESCA Tutorial and Research Workshop on Speech Production Modeling - 4th Speech Production Seminar'', 125–128.

External links

* Introduction to Articulatory Speech Synthesis
* Pink Trombone bare-handed speech synthesis online tool