Phonetics is a branch of linguistics that studies how humans produce and perceive sounds or, in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. The field of phonetics is traditionally divided into three sub-disciplines according to the questions involved: how humans plan and execute movements to produce speech (articulatory phonetics), how various movements affect the properties of the resulting sound (acoustic phonetics), and how humans convert sound waves to linguistic information (auditory phonetics). Traditionally, the minimal linguistic unit of phonetics is the phone, a speech sound in a language, which differs from the phonological unit, the phoneme; the phoneme is an abstract categorization of phones and is also defined as the smallest unit that distinguishes meaning between sounds in any given language.

Phonetics deals with two aspects of human speech: production (the ways humans make sounds) and perception (the way speech is understood). The communicative modality of a language describes the method by which the language is produced and perceived. Languages with oral-aural modalities, such as English, produce speech orally and perceive speech aurally (using the ears). Sign languages, such as Australian Sign Language (Auslan) and American Sign Language (ASL), have a manual-visual modality, producing speech manually (using the hands) and perceiving speech visually. ASL and some other sign languages also have a manual-manual dialect for use in tactile signing by deafblind speakers, where signs are produced with the hands and perceived with the hands as well.

Language production consists of several interdependent processes which transform a non-linguistic message into a spoken or signed linguistic signal. After identifying a message to be linguistically encoded, a speaker must select the individual words, known as lexical items, to represent that message in a process called lexical selection. During phonological encoding, the mental representations of the words are assigned their phonological content as a sequence of phonemes to be produced. The phonemes are specified for articulatory features which denote particular goals such as closed lips or the tongue in a particular location. These phonemes are then coordinated into a sequence of muscle commands that can be sent to the muscles, and when these commands are executed properly the intended sounds are produced. These movements disrupt and modify an airstream, which results in a sound wave. The modification is done by the articulators, with different places and manners of articulation producing different acoustic results. For example, the words ''tack'' and ''sack'' both begin with alveolar sounds in English, but differ in how far the tongue is from the alveolar ridge. This difference has large effects on the airstream and thus on the sound that is produced. Similarly, the direction and source of the airstream can affect the sound. The most common airstream mechanism is pulmonic (using the lungs), but the glottis and tongue can also be used to produce airstreams.

Language perception is the process by which a linguistic signal is decoded and understood by a listener. To perceive speech, the continuous acoustic signal must be converted into discrete linguistic units such as phonemes, morphemes, and words. To correctly identify and categorize sounds, listeners prioritize certain aspects of the signal that can reliably distinguish between linguistic categories. While certain cues are prioritized over others, many aspects of the signal can contribute to perception. For example, though oral languages prioritize acoustic information, the McGurk effect shows that visual information is used to distinguish ambiguous information when the acoustic cues are unreliable.

The three branches of modern phonetics are articulatory phonetics, which addresses the way sounds are made, acoustic phonetics, which addresses the acoustic results of different articulations, and auditory phonetics, which studies the way listeners perceive and understand linguistic signals.

History

Antiquity

The first known study of phonetics was undertaken by Sanskrit grammarians as early as the 6th century BCE. The Hindu scholar Pāṇini is among the most well known of these early investigators. His four-part grammar is influential in modern linguistics and still represents "the most complete generative grammar of any language yet written". His grammar formed the basis of modern linguistics and described several important phonetic principles, including voicing. This early account described resonance as being produced either by tone, when vocal folds are closed, or noise, when vocal folds are open. The phonetic principles in the grammar are considered "primitives" in that they are the basis for his theoretical analysis rather than the objects of theoretical analysis themselves, and the principles can be inferred from his system of phonology.

The Sanskrit study of phonetics is called ''Shiksha'', which the 1st-millennium BCE Taittiriya Upanishad defines as follows:

''Om! We will explain the Shiksha.''
''Sounds and accentuation, Quantity (of vowels) and the expression (of consonants),''
''Balancing (Saman) and connection (of sounds), So much about the study of Shiksha.''
Taittiriya Upanishad 1.2, Shikshavalli, translated by Paul Deussen

Modern

Advancements in phonetics after Pāṇini and his contemporaries were limited until the modern era, save some limited investigations by Greek and Roman grammarians. In the millennia between Indic grammarians and modern phonetics, the focus shifted away from the difference between spoken and written language, which was the driving force behind Pāṇini's account, toward the physical properties of speech alone. Sustained interest in phonetics began again around 1800 CE, with the term "phonetics" being first used in the present sense in 1841. With new developments in medicine and the development of audio and visual recording devices, phoneticians were able to use and review new and more detailed data. This early period of modern phonetics included the development of an influential phonetic alphabet based on articulatory positions by Alexander Melville Bell. Known as visible speech, it gained prominence as a tool in the oral education of deaf children.

Before the widespread availability of audio recording equipment, phoneticians relied heavily on a tradition of practical phonetics to ensure that transcriptions and findings were consistent across phoneticians. This training involved both ear training (the recognition of speech sounds) and production training (the ability to produce sounds). Phoneticians were expected to learn to recognize by ear the various sounds on the International Phonetic Alphabet, and the IPA still tests and certifies speakers on their ability to accurately produce the phonetic patterns of English (though it has discontinued this practice for other languages). As a revision of his visible speech method, Melville Bell developed a description of vowels by height and backness resulting in 9 cardinal vowels. As part of their training in practical phonetics, phoneticians were expected to learn to produce these cardinal vowels to anchor their perception and transcription of these phones during fieldwork. This approach was critiqued by Peter Ladefoged in the 1960s based on experimental evidence where he found that cardinal vowels were auditory rather than articulatory targets, challenging the claim that they represented articulatory anchors by which phoneticians could judge other articulations.

Production

Language production consists of several interdependent processes which transform a nonlinguistic message into a spoken or signed linguistic signal. Linguists debate whether the process of language production occurs in a series of stages (serial processing) or whether production processes occur in parallel. After identifying a message to be linguistically encoded, a speaker must select the individual words, known as lexical items, to represent that message in a process called lexical selection. The words are selected based on their meaning, which in linguistics is called semantic information. Lexical selection activates the word's lemma, which contains both semantic and grammatical information about the word. After an utterance has been planned, it then goes through phonological encoding. In this stage of language production, the mental representations of the words are assigned their phonological content as a sequence of phonemes to be produced.

Place of articulation

Sounds which are made by a full or partial constriction of the vocal tract are called consonants. Consonants are pronounced in the vocal tract, usually in the mouth, and the location of this constriction affects the resulting sound. Because of the close connection between the position of the tongue and the resulting sound, the place of articulation is an important concept in many subdisciplines of phonetics.

Sounds are partly categorized by the location of a constriction as well as the part of the body doing the constricting. For example, in English the words ''fought'' and ''thought'' are a minimal pair differing only in the organ making the constriction rather than the location of the constriction. The "f" in ''fought'' is a labiodental articulation made with the bottom lip against the teeth. The "th" in ''thought'' is a linguodental articulation made with the tongue against the teeth. Constrictions made by the lips are called labials while those made with the tongue are called lingual.

Constrictions made with the tongue can be made in several parts of the vocal tract, broadly classified into coronal, dorsal and radical places of articulation. Coronal articulations are made with the front of the tongue, dorsal articulations are made with the back of the tongue, and radical articulations are made in the pharynx. These divisions are not sufficient for distinguishing and describing all speech sounds. For example, in English the sounds [s] and [ʃ] are both coronal, but they are produced in different places of the mouth. To account for this, more detailed places of articulation are needed based upon the area of the mouth in which the constriction occurs.
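The two-part description above (which organ makes the constriction, and where) can be sketched as a small data structure. This is only an illustration in Python: the articulator labels, the `Constriction` class, and the `is_minimal_pair` helper are hypothetical names chosen for this sketch, and the transcriptions of ''fought'' and ''thought'' are one possible broad rendering whose vowel varies by dialect.

```python
from dataclasses import dataclass

# Major class by active articulator, following the grouping in the text.
# The string labels are illustrative, not a standard inventory.
MAJOR_CLASS = {
    "lower lip": "labial",
    "tongue tip": "coronal",
    "tongue blade": "coronal",
    "tongue body": "dorsal",
    "tongue root": "radical",
    "epiglottis": "radical",
}


@dataclass(frozen=True)
class Constriction:
    active: str   # the organ that moves to make the constriction
    passive: str  # the location the active organ moves toward

    def major_class(self) -> str:
        """Look up the broad class from the active articulator."""
        return MAJOR_CLASS[self.active]


def is_minimal_pair(a, b):
    """True if two equal-length segment sequences differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


# "f" and "th" share a location (the teeth) but differ in the active organ.
# "tongue tip" is an assumption here; tip or blade would both count as coronal.
f_sound = Constriction(active="lower lip", passive="upper teeth")
th_sound = Constriction(active="tongue tip", passive="upper teeth")

# Assumed broad transcriptions of the two words.
fought = ["f", "ɔ", "t"]
thought = ["θ", "ɔ", "t"]

print(f_sound.major_class(), th_sound.major_class())
print(is_minimal_pair(fought, thought))
```

Keeping the active and passive articulators as separate fields mirrors the point that neither one alone is enough to identify a sound.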

Labial

Articulations involving the lips can be made in three different ways: with both lips (bilabial), with the lower lip as the active articulator and the upper teeth as the passive articulator (labiodental), and with the tongue and the upper lip (linguolabial). Depending on the definition used, some or all of these kinds of articulations may be categorized into the class of labial articulations.

Bilabial consonants are made with both lips. In producing these sounds the lower lip moves farthest to meet the upper lip, which also moves down slightly, though in some cases the force from air moving through the aperture (opening between the lips) may cause the lips to separate faster than they can come together. Unlike most other articulations, both articulators are made from soft tissue, and so bilabial stops are more likely to be produced with incomplete closures than articulations involving hard surfaces like the teeth or palate. Bilabial stops are also unusual in that an articulator in the upper section of the vocal tract actively moves downward, as the upper lip shows some active downward movement.

Linguolabial consonants are made with the blade of the tongue approaching or contacting the upper lip. As in bilabial articulations, the upper lip moves slightly towards the more active articulator. Articulations in this group do not have their own symbols in the International Phonetic Alphabet; rather, they are formed by combining an apical symbol with a diacritic, implicitly placing them in the coronal category. They exist in a number of languages indigenous to Vanuatu, such as Tangoa.

Labiodental consonants are made by the lower lip rising to the upper teeth. Labiodental consonants are most often fricatives, while labiodental nasals are also typologically common. There is debate as to whether true labiodental plosives occur in any natural language, though a number of languages are reported to have labiodental plosives, including Zulu, Tonga, and Shubi.

Coronal

Coronal consonants are made with the tip or blade of the tongue and, because of the agility of the front of the tongue, show variety not only in place but also in the posture of the tongue. The coronal places of articulation represent the areas of the mouth where the tongue contacts or makes a constriction, and include dental, alveolar, and post-alveolar locations. Tongue postures using the tip of the tongue can be apical if using the top of the tongue tip, laminal if made with the blade of the tongue, or sub-apical if the tongue tip is curled back and the bottom of the tongue is used. Coronals are unique as a group in that every manner of articulation is attested. Australian languages are well known for the large number of coronal contrasts exhibited within and across languages in the region.

Dental consonants are made with the tip or blade of the tongue and the upper teeth. They are divided into two groups based upon the part of the tongue used to produce them: apical dental consonants are produced with the tongue tip touching the teeth; interdental consonants are produced with the blade of the tongue as the tip of the tongue sticks out in front of the teeth. No language is known to use both contrastively, though they may exist allophonically.

Alveolar consonants are made with the tip or blade of the tongue at the alveolar ridge just behind the teeth and can similarly be apical or laminal. Crosslinguistically, dental consonants and alveolar consonants are frequently contrasted, leading to a number of generalizations of crosslinguistic patterns. The different places of articulation tend to also be contrasted in the part of the tongue used to produce them: most languages with dental stops have laminal dentals, while languages with alveolar stops usually have apical alveolars. Languages rarely have two consonants in the same place with a contrast in laminality, though Taa (ǃXóõ) is a counterexample to this pattern. If a language has only one of a dental stop or an alveolar stop, it will usually be laminal if it is a dental stop, and the stop will usually be apical if it is an alveolar stop, though for example Temne and Bulgarian do not follow this pattern. If a language has both an apical and laminal stop, then the laminal stop is more likely to be affricated, as in Isoko, though Dahalo shows the opposite pattern, with alveolar stops being more affricated.
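The tendency for languages with a single dental or alveolar stop can be written as a simple lookup. This is a minimal sketch in Python, not a typological model: the function name `follows_tendency` is hypothetical, and the two inventories are invented placeholders rather than data from any real language.

```python
# Usual tongue posture for each place when a language has only one of the
# two stops, as described in the text. These are tendencies, not laws.
USUAL_POSTURE = {"dental": "laminal", "alveolar": "apical"}


def follows_tendency(place, posture):
    """Return True if a stop's posture matches the usual one for its place."""
    return USUAL_POSTURE[place] == posture


# Hypothetical inventories (placeholders, not data from real languages).
inventories = {
    "Language A": ("dental", "laminal"),
    "Language B": ("alveolar", "laminal"),
}

for name, (place, posture) in inventories.items():
    status = "typical" if follows_tendency(place, posture) else "atypical"
    print(f"{name}: {posture} {place} stop is {status}")
```

A language flagged as atypical here is simply an exception of the kind the text attributes to Temne and Bulgarian; the sketch makes no claim about which posture those languages actually use.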

Retroflex consonants have several different definitions depending on whether the position of the tongue or the position on the roof of the mouth is given prominence. In general, they represent a group of articulations in which the tip of the tongue is curled upwards to some degree. In this way, retroflex articulations can occur in several different locations on the roof of the mouth including alveolar, post-alveolar, and palatal regions. If the underside of the tongue tip makes contact with the roof of the mouth, it is sub-apical, though apical post-alveolar sounds are also described as retroflex. Typical examples of sub-apical retroflex stops are commonly found in Dravidian languages, and in some languages indigenous to the southwest United States the contrastive difference between dental and alveolar stops is a slight retroflexion of the alveolar stop. Acoustically, retroflexion tends to affect the higher formants.

Articulations taking place just behind the alveolar ridge, known as post-alveolar consonants, have been referred to using a number of different terms. Apical post-alveolar consonants are often called retroflex, while laminal articulations are sometimes called palato-alveolar; in the Australianist literature, these laminal stops are often described as 'palatal' though they are produced further forward than the palate region typically described as palatal. Because of individual anatomical variation, the precise articulation of palato-alveolar stops (and coronals in general) can vary widely within a speech community.

Dorsal

Dorsal consonants are those consonants made using the tongue body rather than the tip or blade and are typically produced at the palate, velum or uvula.

Palatal consonants are made using the tongue body against the hard palate on the roof of the mouth. They are frequently contrasted with velar or uvular consonants, though it is rare for a language to contrast all three simultaneously, with Jaqaru as a possible example of a three-way contrast.

Velar consonants are made using the tongue body against the velum. They are very common cross-linguistically; almost all languages have a velar stop. Because both velars and vowels are made using the tongue body, velars are highly affected by coarticulation with vowels and can be produced as far forward as the hard palate or as far back as the uvula. These variations are typically divided into front, central, and back velars in parallel with the vowel space. They can be hard to distinguish phonetically from palatal consonants, though they are produced slightly behind the area of prototypical palatal consonants.

Uvular consonants are made by the tongue body contacting or approaching the uvula. They are rare, occurring in an estimated 19 percent of languages, and large regions of the Americas and Africa have no languages with uvular consonants. In languages with uvular consonants, stops are most frequent, followed by continuants (including nasals).

Pharyngeal and laryngeal

Consonants made by constrictions of the throat are pharyngeals, and those made by a constriction in the larynx are laryngeal. Laryngeals are made using the vocal folds, as the larynx is too far down the throat to reach with the tongue. Pharyngeals, however, are close enough to the mouth that parts of the tongue can reach them. Radical consonants either use the root of the tongue or the epiglottis during production and are produced very far back in the vocal tract.

Pharyngeal consonants are made by retracting the root of the tongue far enough to almost touch the wall of the pharynx. Due to production difficulties, only fricatives and approximants can be produced this way.

Epiglottal consonants are made with the epiglottis and the back wall of the pharynx. Epiglottal stops have been recorded in Dahalo. Voiced epiglottal consonants are not deemed possible because the cavity between the glottis and epiglottis is too small to permit voicing. Glottal consonants are those produced using the vocal folds in the larynx. Because the vocal folds are the source of phonation and lie below the oro-nasal vocal tract, a number of glottal consonants are impossible, such as a voiced glottal stop. Three glottal consonants are possible: a voiceless glottal stop and two glottal fricatives, and all are attested in natural languages.

Glottal stops, produced by closing the vocal folds, are notably common in the world's languages. While many languages use them to demarcate phrase boundaries, some languages like Arabic and Huatla Mazatec have them as contrastive phonemes. Additionally, glottal stops can be realized as laryngealization of the following vowel in this language. Glottal stops, especially between vowels, usually do not form a complete closure. True glottal stops normally occur only when they are geminated.

The larynx

The larynx, commonly known as the "voice box", is a cartilaginous structure in the trachea responsible for phonation. The vocal folds (cords) are held together so that they vibrate, or held apart so that they do not. The positions of the vocal folds are achieved by movement of the arytenoid cartilages. The intrinsic laryngeal muscles are responsible for moving the arytenoid cartilages as well as modulating the tension of the vocal folds. If the vocal folds are not close or tense enough, they will either vibrate sporadically or not at all. If they vibrate sporadically, the result is either creaky or breathy voice, depending on the degree; if they do not vibrate at all, the result is voicelessness.

In addition to correctly positioning the vocal folds, there must also be air flowing across them or they will not vibrate. The difference in pressure across the glottis required for voicing is estimated at 1 – 2 cm H₂O (98.0665 – 196.133 pascals). The pressure differential can fall below the level required for phonation either because of an increase in pressure above the glottis (supraglottal pressure) or a decrease in pressure below the glottis (subglottal pressure). The subglottal pressure is maintained by the respiratory muscles. Supraglottal pressure, with no constrictions or articulations, is about equal to atmospheric pressure. However, because articulations, especially consonants, represent constrictions of the airflow, the pressure in the cavity behind those constrictions can increase, resulting in a higher supraglottal pressure.
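The pressure condition above can be illustrated with a small sketch. It assumes pressures are expressed in cm H₂O relative to atmospheric pressure and uses the upper end of the estimated 1 – 2 cm H₂O range as the threshold; the function names and the example pressure values are hypothetical, chosen only for illustration.

```python
# Pascals per cm H2O, matching the conversion used in the text.
PA_PER_CM_H2O = 98.0665


def cm_h2o_to_pa(pressure_cm_h2o):
    """Convert a pressure from cm H2O to pascals."""
    return pressure_cm_h2o * PA_PER_CM_H2O


def transglottal_pressure(subglottal_cm, supraglottal_cm):
    """Pressure difference across the glottis, in cm H2O."""
    return subglottal_cm - supraglottal_cm


def can_voice(subglottal_cm, supraglottal_cm, threshold_cm=2.0):
    """Return True if the pressure drop meets the assumed voicing threshold.

    threshold_cm is an assumption: the text gives only an estimated
    range of 1 to 2 cm H2O.
    """
    return transglottal_pressure(subglottal_cm, supraglottal_cm) >= threshold_cm


# Hypothetical values: open vocal tract (supraglottal pressure near atmospheric).
open_tract = can_voice(subglottal_cm=8.0, supraglottal_cm=0.0)
# Hypothetical values: pressure has built up behind an oral closure.
during_closure = can_voice(subglottal_cm=8.0, supraglottal_cm=7.0)
```

In this sketch the same subglottal pressure supports voicing with an open tract but not once supraglottal pressure has risen behind a constriction, which is the situation the paragraph describes.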

Lexical access

According to the lexical access model, two different stages of cognition are employed; thus, this concept is known as the two-stage theory of lexical access. The first stage, lexical selection, provides information about lexical items required to construct the functional-level representation. These items are retrieved according to their specific semantic and syntactic properties, but phonological forms are not yet made available at this stage. The second stage, retrieval of wordforms, provides information required for building the positional-level representation.

Articulatory models

When producing speech, the articulators move through and contact particular locations in space, resulting in changes to the acoustic signal. Some models of speech production take this as the basis for modeling articulation in a coordinate system that may be internal to the body (intrinsic) or external (extrinsic). Intrinsic coordinate systems model the movement of articulators as positions and angles of joints in the body. Intrinsic coordinate models of the jaw often use two to three degrees of freedom representing translation and rotation. These face issues with modeling the tongue which, unlike joints of the jaw and arms, is a muscular hydrostat (like an elephant trunk) that lacks joints. Because of the different physiological structures, movement paths of the jaw are relatively straight lines during speech and mastication, while movements of the tongue follow curves.

Straight-line movements have been used to argue that articulations are planned in extrinsic rather than intrinsic space, though extrinsic coordinate systems also include acoustic coordinate spaces, not just physical coordinate spaces. Models that assume movements are planned in extrinsic space run into an inverse problem of explaining the muscle and joint locations which produce the observed path or acoustic signal. The arm, for example, has seven degrees of freedom and 22 muscles, so multiple different joint and muscle configurations can lead to the same final position. For models of planning in extrinsic acoustic space, the same one-to-many mapping problem applies, with no unique mapping from physical or acoustic targets to the muscle movements required to achieve them. Concerns about the inverse problem may be exaggerated, however, as speech is a highly learned skill using neurological structures which evolved for the purpose.

The equilibrium-point model proposes a resolution to the inverse problem by arguing that movement targets be represented as the position of the muscle pairs acting on a joint. Importantly, muscles are modeled as springs, and the target is the equilibrium point for the modeled spring-mass system. By using springs, the equilibrium-point model can easily account for compensation and response when movements are disrupted. It is considered a coordinate model because it assumes that these muscle positions are represented as points in space, equilibrium points, where the spring-like action of the muscles converges.

Gestural approaches to speech production propose that articulations are represented as movement patterns rather than particular coordinates to hit. The minimal unit is a gesture that represents a group of "functionally equivalent articulatory movement patterns that are actively controlled with reference to a given speech-relevant goal (e.g., a bilabial closure)." These groups represent coordinative structures or "synergies" which view movements not as individual muscle movements but as task-dependent groupings of muscles which work together as a single unit. This reduces the degrees of freedom in articulation planning, a problem especially in intrinsic coordinate models, and allows for any movement that achieves the speech goal rather than encoding the particular movements in the abstract representation. Coarticulation is well described by gestural models, as the articulations at faster speech rates can be explained as composites of the independent gestures at slower speech rates.
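The spring-based idea behind the equilibrium-point model can be sketched numerically. The example below is a deliberately simplified illustration, not the model as published: it assumes a single point mass pulled by two opposing linear springs (standing in for a muscle pair), adds viscous damping, and uses made-up stiffness, rest-position, and damping values. All names are hypothetical.

```python
def equilibrium_point(k1, r1, k2, r2):
    """Position where the forces of two opposing linear springs cancel.

    k1, k2 are stiffnesses; r1, r2 are the springs' rest positions.
    """
    return (k1 * r1 + k2 * r2) / (k1 + k2)


def simulate(x0, k1, r1, k2, r2, mass=1.0, damping=2.0,
             dt=0.001, steps=20000, perturb_at=None, perturb_by=0.0):
    """Integrate the damped spring-mass system and return the final position.

    If perturb_at is a step index, the mass is displaced by perturb_by at
    that step, mimicking an external disruption of the movement.
    """
    x, v = x0, 0.0
    for step in range(steps):
        if step == perturb_at:
            x += perturb_by
        # Net force: two springs pulling toward their rest positions,
        # plus damping that opposes the current velocity.
        force = -k1 * (x - r1) - k2 * (x - r2) - damping * v
        v += (force / mass) * dt   # semi-implicit Euler: velocity first,
        x += v * dt                # then position with the new velocity
    return x


# Illustrative (made-up) parameters for the two "muscles".
target = equilibrium_point(k1=1.0, r1=0.0, k2=1.0, r2=2.0)
undisturbed = simulate(0.0, 1.0, 0.0, 1.0, 2.0)
disturbed = simulate(0.0, 1.0, 0.0, 1.0, 2.0, perturb_at=10000, perturb_by=0.5)
```

Under these assumptions both runs settle at the same equilibrium point, with or without the mid-movement disturbance, which mirrors the compensation behavior that the springs are meant to explain. Changing the stiffness or rest position of either spring moves the target without specifying a trajectory.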

Acoustics

Waveform, spectrogram, and transcription of "Wikipedia" in Praat

Speech sounds are created by the modification of an airstream which results in a sound wave. The modification is done by the articulators, with different places and manners of articulation producing different acoustic results. Because the posture of the vocal tract, not just the position of the tongue, can affect the resulting sound, the manner of articulation is important for describing the speech sound. The words ''tack'' and ''sack'' both begin with alveolar sounds in English, but differ in how far the tongue is from the alveolar ridge. This difference has large effects on the airstream and thus the sound that is produced. Similarly, the direction and source of the airstream can affect the sound. The most common airstream mechanism is pulmonic (using the lungs), but the glottis and tongue can also be used to produce airstreams.

Voicing and phonation types

A major distinction between speech sounds is whether they are voiced. Sounds are voiced when the vocal folds begin to vibrate in the process of phonation. Many sounds can be produced with or without phonation, though physical constraints may make phonation difficult or impossible for some articulations. When articulations are voiced, the main source of noise is the periodic vibration of the vocal folds. Articulations like voiceless plosives have no acoustic source and are noticeable by their silence, but other voiceless sounds like fricatives create their own acoustic source regardless of phonation. Phonation is controlled by the muscles of the larynx, and languages make use of more acoustic detail than binary voicing.

During phonation, the vocal folds vibrate at a certain rate. This vibration results in a periodic acoustic waveform comprising a fundamental frequency and its harmonics. The fundamental frequency of the acoustic wave can be controlled by adjusting the muscles of the larynx, and listeners perceive this fundamental frequency as pitch. Languages use pitch manipulation to convey lexical information in tonal languages, and many languages use pitch to mark prosodic or pragmatic information.

For the vocal folds to vibrate, they must be in the proper position and there must be air flowing through the glottis. Phonation types are modeled on a continuum of glottal states from completely open (voiceless) to completely closed (glottal stop). The optimal position for vibration, and the phonation type most used in speech, modal voice, lies between these two extremes. If the glottis is slightly wider, breathy voice occurs, while bringing the vocal folds closer together results in creaky voice.

The normal phonation pattern used in typical speech is modal voice, where the vocal folds are held close together with moderate tension. The vocal folds vibrate as a single unit periodically and efficiently with a full glottal closure and no aspiration. If they are pulled farther apart, they do not vibrate and so produce voiceless phones. If they are held firmly together, they produce a glottal stop. If the vocal folds are held slightly further apart than in modal voicing, they produce phonation types like breathy voice (or murmur) and whispery voice. The tension across the vocal ligaments (vocal cords) is less than in modal voicing, allowing air to flow more freely. Both breathy voice and whispery voice exist on a continuum loosely characterized as going from the more periodic waveform of breathy voice to the noisier waveform of whispery voice. Acoustically, both tend to dampen the first formant, with whispery voice showing more extreme deviations. Holding the vocal folds more tightly together results in creaky voice. The tension across the vocal folds is less than in modal voice, but they are held tightly together, resulting in only the ligaments of the vocal folds vibrating. The pulses are highly irregular, with low pitch and frequency amplitude.

Some languages do not maintain a voicing distinction for some consonants, but all languages use voicing to some degree. For example, no language is known to have a phonemic voicing contrast for vowels, with all known vowels canonically voiced. Other positions of the glottis, such as breathy and creaky voice, are used in a number of languages, like Jalapa Mazatec, to contrast phonemes, while in other languages, like English, they exist allophonically.

There are several ways to determine if a segment is voiced or not, the simplest being to feel the larynx during speech and note when vibrations are felt. More precise measurements can be obtained through acoustic analysis of a spectrogram or spectral slice. In a spectrographic analysis, voiced segments show a voicing bar, a region of high acoustic energy in the low frequencies. In examining a spectral slice (the acoustic spectrum at a given point in time), a model of the vowel pronounced reverses the filtering of the mouth, producing the spectrum of the glottis. A computational model of the unfiltered glottal signal is then fitted to the inverse-filtered acoustic signal to determine the characteristics of the glottis. Visual analysis is also available using specialized medical equipment such as ultrasound and endoscopy.
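The relationship between vocal fold vibration, the fundamental frequency, and its harmonics can be illustrated with a short sketch. It synthesizes a periodic signal from a fundamental and a few harmonics, then recovers the fundamental by autocorrelation. This is one common textbook approach, swapped in here for illustration rather than taken from the analysis methods named above; the function names, sample rate, search range, and harmonic amplitudes are all assumptions.

```python
import math


def synthesize_voiced(f0, sample_rate, n_samples, n_harmonics=3):
    """Sum a fundamental and its harmonics (amplitude 1/k for harmonic k)."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / sample_rate) / k
            for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def estimate_f0(signal, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Only lags corresponding to f0_min..f0_max are searched; this range is
    an assumed plausible span for speech, not a value from the text.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself shifted by lag.
        score = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# A 125 Hz "voice" sampled at 8000 Hz has a period of exactly 64 samples.
voiced = synthesize_voiced(f0=125.0, sample_rate=8000, n_samples=448)
estimated = estimate_f0(voiced, sample_rate=8000)
```

A periodic signal is most similar to itself when shifted by one full period, so the lag with the highest autocorrelation gives the period and hence the fundamental frequency. Real speech is only quasi-periodic and noisy, so practical pitch trackers add windowing, normalization, and voicing decisions that this sketch omits.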

Vowels

Vowels are broadly categorized by the area of the mouth in which they are produced, but because they are produced without a constriction in the vocal tract, their precise description relies on measuring acoustic correlates of tongue position. The location of the tongue during vowel production changes the frequencies at which the cavity resonates, and it is these resonances, known as formants, which are measured and used to characterize vowels. Vowel height traditionally refers to the highest point of the tongue during articulation. The height parameter is divided into four primary levels: high (close), close-mid, open-mid, and low (open). Vowels whose height is in the middle are referred to as mid. Slightly opened close vowels and slightly closed open vowels are referred to as near-close and near-open respectively. The lowest vowels are articulated not just with a lowered tongue, but also by lowering the jaw. While the IPA implies that there are seven levels of vowel height, it is unlikely that a given language can minimally contrast all seven levels.

Chomsky and Halle suggest that there are only three levels, although four levels of vowel height seem to be needed to describe Danish, and it is possible that some languages might even need five. Vowel backness is divided into three levels: front, central and back. Languages usually do not minimally contrast more than two levels of vowel backness. Some languages claimed to have a three-way backness distinction include Nimboran and Norwegian. In most languages, the lips during vowel production can be classified as either rounded or unrounded (spread), although other types of lip positions, such as compression and protrusion, have been described. Lip position is correlated with height and backness: front and low vowels tend to be unrounded, whereas back and high vowels are usually rounded. Paired vowels on the IPA chart have the spread vowel on the left and the rounded vowel on the right. Together with the universal vowel features described above, some languages have additional features such as nasality, length and different types of phonation such as voiceless and creaky.
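The idea that vowels are characterized by the resonances (formants) of the vocal tract can be made concrete with a standard idealization that is not part of the text above: treating the tract in a neutral, unconstricted posture as a uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of the quarter-wavelength frequency, F_n = (2n − 1)·c / (4L). The tract length of 17.5 cm and speed of sound of 35,000 cm/s used below are conventional round figures assumed for illustration, and the function name is hypothetical.

```python
def tube_resonances(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Uses F_n = (2n - 1) * c / (4 * L). This is an idealization of a
    neutral vocal tract, not a model of any particular vowel.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# Assumed tract length for a neutral, schwa-like posture.
neutral = tube_resonances(17.5)
# A shorter tube (assumed 14 cm) shifts every resonance upward.
shorter = tube_resonances(14.0)
```

Under these assumptions the neutral tube has resonances at 500, 1500, and 2500 Hz. Moving the tongue changes the tube's shape rather than its length, which shifts these resonances away from the evenly spaced pattern; those shifts are what formant measurements capture.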

External links

* Collection of phonetics resources by the University of North Carolina
* "A Little Encyclopedia of Phonetics" by Peter Roach
* Pink Trombone, an interactive articulation simulator by Neil Thapen