The origin of speech differs from the origin of language because language is not necessarily spoken; it could equally be written or signed. Speech is a fundamental aspect of human communication and plays a vital role in the everyday lives of humans. It allows them to convey thoughts, emotions, and ideas, and provides the ability to connect with others and shape collective reality. Many attempts have been made to explain scientifically how speech emerged in humans, although to date no theory has generated agreement. Non-human primates, like many other animals, have evolved specialized mechanisms for producing sounds for purposes of social communication. On the other hand, no monkey or ape uses its ''tongue'' for such purposes. The human species' unprecedented use of the tongue, lips and other moveable parts seems to place speech in a quite separate category, making its evolutionary emergence an intriguing theoretical challenge in the eyes of many scholars.

Modality-independence

The term ''modality'' means the chosen representational format for encoding and transmitting information. A striking feature of language is that it is ''modality-independent.'' Should an impaired child be prevented from hearing or producing sound, its innate capacity to master a language may equally find expression in signing. Sign languages of the deaf are independently invented and have all the major properties of spoken language except for the modality of transmission. From this it appears that the language centres of the human brain must have evolved to function optimally, irrespective of the selected modality.

[Figure: ''Expression of the Emotions'', Figure 18]

Animal communication systems routinely combine visible with audible properties and effects, but none is modality-independent. For example, no vocally-impaired whale, dolphin, or songbird could express its song repertoire equally in visual display. Indeed, in the case of animal communication, message and modality are not capable of being disentangled. Whatever message is being conveyed stems from the intrinsic properties of the signal. Modality independence should not be confused with the ordinary phenomenon of multimodality. Monkeys and apes rely on a repertoire of species-specific "gesture-calls" – emotionally-expressive vocalisations inseparable from the visual displays which accompany them. Humans also have species-specific gesture-calls – laughs, cries, sobs, etc. – together with involuntary gestures accompanying speech. Many animal displays are polymodal in that each appears designed to exploit multiple channels simultaneously. The human linguistic property of modality independence is conceptually distinct from polymodality. It allows the speaker to encode the informational content of a message in a single channel whilst switching between channels as necessary. Modern city-dwellers switch effortlessly between the spoken word and writing in its various forms – handwriting, typing, email, etc. Whichever modality is chosen, it can reliably transmit the full message content without external assistance of any kind. When talking on the telephone, for example, any accompanying facial or manual gestures, however natural to the speaker, are not strictly necessary. When typing or manually signing, conversely, there is no need to add sounds. In many Australian Aboriginal cultures, a section of the population – perhaps women observing a ritual taboo – traditionally restrict themselves for extended periods to a silent (manually-signed) version of their language. Then, when released from the taboo, these same individuals resume narrating stories by the fireside or in the dark, switching to pure sound without sacrifice of informational content.

Evolution of the speech organs

Speaking is the default modality for language in all cultures. Humans' first recourse is to encode their thoughts in sound – a method which depends on sophisticated capacities for controlling the lips, tongue and other components of the vocal apparatus. The speech organs evolved in the first instance not for speech but for more basic bodily functions such as feeding and breathing. Nonhuman primates have broadly similar organs, but with different neural controls. Non-human apes use their highly flexible, maneuverable tongues for eating but not for vocalizing. When an ape is not eating, fine motor control over its tongue is deactivated. ''Either'' it is performing gymnastics with its tongue ''or'' it is vocalising; it cannot perform both activities simultaneously. Since this applies to mammals in general, ''Homo sapiens'' is exceptional in harnessing mechanisms designed for respiration and ingestion for the radically different requirements of articulate speech.

Tongue

The word "language" derives from the Latin ''lingua,'' "tongue". Phoneticians agree that the tongue is the most important speech articulator, followed by the lips. A natural language can be viewed as a particular way of using the tongue to express thought. The human tongue has an unusual shape. In most mammals, it is a long, flat structure contained largely within the mouth. It is attached at the rear to the hyoid bone, situated below the oral level in the pharynx. In humans, the tongue has an almost circular sagittal (midline) contour, much of it lying vertically down an extended pharynx, where it is attached to a hyoid bone in a lowered position. Partly as a result of this, the horizontal (inside-the-mouth) and vertical (down-the-throat) tubes forming the supralaryngeal vocal tract (SVT) are almost equal in length (whereas in other species, the vertical section is shorter). As humans move their jaws up and down, the tongue can vary the cross-sectional area of each tube independently by about 10:1, altering formant frequencies accordingly. That the tubes are joined at a right angle permits pronunciation of the vowels '' and '', which nonhuman primates cannot do. Even when not performed particularly accurately, in humans the articulatory gymnastics needed to distinguish these vowels yield consistent, distinctive acoustic results, illustrating the quantal nature of human speech sounds. It may not be coincidental that '' and '' are the most common vowels in the world's languages (Ladefoged, P. and Maddieson, I. 1996. ''The Sounds of the World's Languages.'' Oxford: Blackwell). Human tongues are much shorter and thinner than those of other mammals and are composed of a large number of muscles, which helps shape a variety of sounds within the oral cavity. The diversity of sound production is also increased by the human ability to open and close the airway, allowing varying amounts of air to exit through the nose. The fine motor movements associated with the tongue and the airway make humans more capable of producing a wide range of intricate shapes in order to produce sounds at different rates and intensities.

Lips

In humans, the lips are important for the production of stops and fricatives, in addition to vowels. Nothing, however, suggests that the lips evolved for those reasons. During primate evolution, a shift from nocturnal to diurnal activity in tarsiers, monkeys and apes (the haplorhines) brought with it an increased reliance on vision at the expense of olfaction. As a result, the snout became reduced and the rhinarium or "wet nose" was lost. The muscles of the face and lips consequently became less constrained, enabling their co-option to serve purposes of facial expression. The lips also became thicker, and the oral cavity hidden behind them became smaller. Hence, according to Ann MacLarnon, "the evolution of mobile, muscular lips, so important to human speech, was the exaptive result of the evolution of diurnality and visual communication in the common ancestor of haplorhines". It is unclear whether human lips have undergone a more recent adaptation to the specific requirements of speech.

Respiratory control

Compared with nonhuman primates, humans have significantly enhanced control of breathing, enabling exhalations to be extended and inhalations shortened as we speak. Whilst we are speaking, intercostal and interior abdominal muscles are recruited to expand the thorax and draw air into the lungs, and subsequently to control the release of air as the lungs deflate. The muscles concerned are markedly more innervated in humans than in nonhuman primates. Evidence from fossil hominins suggests that the necessary enlargement of the vertebral canal, and therefore spinal cord dimensions, may not have occurred in ''Australopithecus'' or ''Homo erectus'' but was present in the Neanderthals and early modern humans.

Larynx

The larynx or voice box is an organ in the neck housing the vocal folds, which are responsible for phonation. In humans, the larynx is ''descended'': it is positioned lower than in other primates. This is because the evolution of humans to an upright position shifted the head directly above the spinal cord, forcing everything else downward. The repositioning of the larynx resulted in a longer cavity called the pharynx, which is responsible for increasing the range and clarity of the sound being produced. Other primates have almost no pharynx; therefore, their vocal power is significantly lower. Humans are not unique in this respect: goats, dogs, pigs and tamarins lower the larynx temporarily, to emit loud calls. Several deer species have a permanently lowered larynx, which may be lowered still further by males during their roaring displays. Lions, jaguars, cheetahs and domestic cats also do this. However, laryngeal descent in nonhumans (according to Philip Lieberman) is not accompanied by descent of the hyoid; hence the tongue remains horizontal in the oral cavity, preventing it from acting as a pharyngeal articulator. Despite all this, scholars remain divided as to how "special" the human vocal tract really is. It has been shown that the larynx does descend to some extent during development in chimpanzees, followed by hyoidal descent. As against this, Philip Lieberman points out that only humans have evolved permanent and substantial laryngeal descent in association with hyoidal descent, resulting in a curved tongue and two-tube vocal tract with 1:1 proportions. Uniquely in the human case, simple contact between the epiglottis and velum is no longer possible, disrupting the normal mammalian separation of the respiratory and digestive tracts during swallowing. Since this entails substantial costs – increasing the risk of choking whilst swallowing food – we are forced to ask what benefits might have outweighed those costs. Some claim the clear benefit must have been speech, but others contest this. One objection is that humans are in fact not seriously at risk of choking on food: medical statistics indicate that accidents of this kind are extremely rare. Another objection is that in the view of most scholars, speech as we know it emerged relatively late in human evolution, roughly contemporaneously with the emergence of ''Homo sapiens.'' A development as complex as the reconfiguration of the human vocal tract would have required much more time, implying an early date of origin. This discrepancy in timescales undermines the idea that human vocal flexibility was initially driven by selection pressures for speech. At least one orangutan has demonstrated the ability to control the voice box.

The size exaggeration hypothesis

To lower the larynx is to increase the length of the vocal tract, in turn lowering formant frequencies so that the voice sounds "deeper" – giving an impression of greater size. John Ohala argued that the function of the lowered larynx in humans, especially males, is probably to enhance threat displays rather than speech itself. Ohala pointed out that if the lowered larynx were an adaptation for speech, we would expect adult human males to be better adapted in this respect than adult females, whose larynx is considerably less low. In fact, females invariably outperform males in verbal tests, falsifying this whole line of reasoning. William Tecumseh Fitch likewise argues that this was the original selective advantage of laryngeal lowering in humans. According to Fitch, although the initial lowering of the larynx in humans had nothing to do with speech, the increased range of possible formant patterns was subsequently co-opted for speech. Size exaggeration remains the sole function of the extreme laryngeal descent observed in male deer. Consistent with the size exaggeration hypothesis, a second descent of the larynx occurs at puberty in humans, although only in males. In response to the objection that the larynx is descended in human females, Fitch suggests that mothers vocalising to protect their infants would also have benefited from this ability.
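The acoustic step in this argument is that a longer vocal tract produces lower formants. The standard textbook uniform-tube approximation makes that step concrete. It treats the tract as a straight tube of constant width, closed at the glottis and open at the lips. This is a simplifying assumption, since real vocal tracts are neither straight nor uniform. The Python sketch below uses a hypothetical `formants` helper and an assumed speed of sound of about 350 m/s. It shows that lengthening the tract from 14 cm to 17.5 cm, an illustrative pair of lengths rather than measurements from this article, lowers every formant by the same factor of 14/17.5 = 0.8.

```python
# Quarter-wavelength approximation: the vocal tract treated as a uniform tube,
# closed at the glottis and open at the lips (a textbook simplification).
SPEED_OF_SOUND_CM_PER_S = 35_000  # assumed round value for warm, humid air


def formants(tract_length_cm: float, count: int = 3) -> list[float]:
    """Return the first `count` resonances (Hz) of a uniform closed-open tube.

    F_k = (2k - 1) * c / (4 * L), so every formant scales with 1 / L.
    """
    return [
        (2 * k - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * tract_length_cm)
        for k in range(1, count + 1)
    ]


# Illustrative lengths only, not measurements taken from this article.
shorter = formants(14.0)  # [625.0, 1875.0, 3125.0]
longer = formants(17.5)   # [500.0, 1500.0, 2500.0]
print(shorter)
print(longer)
```

Because each formant is inversely proportional to tract length, a lowered larynx shifts the whole formant pattern downward rather than changing one resonance alone. This is the "deeper", larger-sounding voice that the size exaggeration hypothesis relies on.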

Neanderthal speech

Most specialists credit the Neanderthals with speech abilities not radically different from those of modern ''Homo sapiens''. An indirect line of argument is that their toolmaking and hunting tactics would have been difficult to learn or execute without some kind of speech. A recent extraction of DNA from Neanderthal bones indicates that Neanderthals had the same version of the FOXP2 gene as modern humans. This gene, mistakenly described as the "grammar gene", plays a role in controlling the orofacial movements which (in modern humans) are involved in speech. During the 1970s, it was widely believed that the Neanderthals lacked modern speech capacities. It was claimed that they possessed a hyoid bone so high up in the vocal tract as to preclude the possibility of producing certain vowel sounds. The hyoid bone is present in many mammals. It allows a wide range of tongue, pharyngeal and laryngeal movements by bracing these structures alongside each other in order to produce variation. It is now realised that its lowered position is not unique to ''Homo sapiens'', whilst its relevance to vocal flexibility may have been overstated: although men have a lower larynx, they do not produce a wider range of sounds than women or two-year-old babies. There is no evidence that the larynx position of the Neanderthals impeded the range of vowel sounds they could produce. The discovery of a modern-looking hyoid bone of a Neanderthal man in the Kebara Cave in Israel led its discoverers to argue that the Neanderthals had a descended larynx, and thus human-like speech capabilities. However, other researchers have claimed that the morphology of the hyoid is not indicative of the larynx's position. It is necessary to take into consideration the skull base, the mandible, the cervical vertebrae and a cranial reference plane. The morphology of the outer and middle ear of Middle Pleistocene hominins from Atapuerca, Spain, believed to be proto-Neanderthal, suggests they had an auditory sensitivity similar to modern humans and very different from chimpanzees. They were probably able to differentiate between many different speech sounds.

Hypoglossal canal

The hypoglossal nerve plays an important role in controlling movements of the tongue. In 1998, a research team used the size of the hypoglossal canal in the base of fossil skulls in an attempt to estimate the relative number of nerve fibres, claiming on this basis that Middle Pleistocene hominins and Neanderthals had more fine-tuned tongue control than either australopithecines or apes. Subsequently, however, it was demonstrated that hypoglossal canal size and nerve sizes are not correlated, and it is now accepted that such evidence is uninformative about the timing of human speech evolution.

Distinctive features theory

According to one influential school, the human vocal apparatus is intrinsically digital on the model of a keyboard or digital computer (see below). Nothing about a chimpanzee's vocal apparatus suggests a digital keyboard, notwithstanding the anatomical and physiological similarities. This poses the question as to when and how, during the course of human evolution, the transition from analog to digital structure and function occurred. The human supralaryngeal tract is said to be digital in the sense that it is an arrangement of moveable toggles or switches, each of which, at any one time, must be in one state or another. The vocal cords, for example, are either vibrating (producing a sound) or not vibrating (in silent mode). By virtue of simple physics, the corresponding

distinctive feature In linguistics, a distinctive feature is the most basic unit of phonology, phonological structure that distinguishes one Phone (phonetics), sound from another within a language. For example, the feature

Voice (phonetics), voice The human voice consists of sound made by a human being using the vocal tract, including talking, singing, laughing, crying, screaming, shouting, humming or yelling. The human voice frequency is specifically a part of human sound productio ...

''distinguishes ...

– in this case, "voicing" – cannot be somewhere in between. The options are limited to "off" and "on". Equally digital is the feature known as "

nasalisation In phonetics, nasalization (or nasalisation in British English) is the production of a sound while the velum is lowered, so that some air escapes through the nose during the production of the sound by the mouth. An archetypal nasal sound is . ...

". At any given moment the

soft palate or velum either allows or does not allow sound to resonate in the nasal chamber. In the case of lip and tongue positions, more than two digital states may be allowed. The theory that speech sounds are composite entities made up of complexes of binary phonetic features was first advanced in 1938 by the Russian linguist Roman Jakobson. A prominent early supporter of this approach was Noam Chomsky, who went on to extend it from phonology to language more generally, in particular to the study of syntax and semantics. In his 1965 book, ''Aspects of the Theory of Syntax'', Chomsky treated semantic concepts as combinations of binary-digital atomic elements, explicitly on the model of distinctive features theory. On this basis, the lexical item "bachelor" would be expressed as [+Human], [+Male], [−Married].

Supporters of this approach view the vowels and consonants recognised by speakers of a particular language or dialect at a particular time as cultural entities of little scientific interest. From a natural science standpoint, the units which matter are those common to ''Homo sapiens'' by virtue of biological nature. By combining the atomic elements or "features" with which all humans are innately equipped, anyone may in principle generate the entire range of vowels and consonants to be found in any of the world's languages, whether past, present or future. The distinctive features are in this sense atomic components of a universal language.
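As a minimal illustration of the binary-feature idea, the Python sketch below encodes a few consonants as bundles of +/− values and renders the "bachelor" example in the same bracketed notation. The three features (`voiced`, `nasal`, `labial`) and the segment table are a simplified, hypothetical selection chosen for illustration, not Jakobson's or Chomsky's actual feature set. The sketch is also strictly binary, whereas lip and tongue positions may require more than two states.

```python
from itertools import product

# Hypothetical, simplified feature set for illustration only; real
# distinctive-feature systems use more features and differ between frameworks.
FEATURES = ("voiced", "nasal", "labial")

# Each segment is a bundle of binary values, one per feature.
SEGMENTS = {
    "p": {"voiced": False, "nasal": False, "labial": True},
    "b": {"voiced": True, "nasal": False, "labial": True},
    "m": {"voiced": True, "nasal": True, "labial": True},
    "t": {"voiced": False, "nasal": False, "labial": False},
    "d": {"voiced": True, "nasal": False, "labial": False},
    "n": {"voiced": True, "nasal": True, "labial": False},
}

# Semantic features for the lexical item "bachelor", in the same binary style.
BACHELOR = {"human": True, "male": True, "married": False}


def notation(bundle):
    """Render a feature bundle in bracketed [+F] / [-F] notation."""
    return ", ".join(
        f"[{'+' if value else '-'}{name.capitalize()}]" for name, value in bundle.items()
    )


def differing_features(a, b):
    """Return the features on which segments a and b take opposite values."""
    return [f for f in FEATURES if SEGMENTS[a][f] != SEGMENTS[b][f]]


def possible_bundles(features):
    """Every combination of +/- values: 2 ** len(features) bundles in total."""
    return [
        dict(zip(features, values))
        for values in product((True, False), repeat=len(features))
    ]
```

Here `notation(BACHELOR)` yields `[+Human], [+Male], [-Married]`, and `differing_features("p", "b")` yields `["voiced"]`: the two sounds differ in a single feature. With n binary features there are 2^n possible bundles (8 here), and a given language need not use all of them.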

Criticism

In recent years, the notion of an innate "universal grammar" underlying phonological variation has been called into question. In a monograph on speech sounds, ''The Sounds of the World's Languages'', Peter Ladefoged and Ian Maddieson found virtually no basis for postulating a small number of fixed, discrete, universal phonetic features. Examining 305 languages, for example, they encountered vowels positioned almost everywhere along the articulatory and acoustic continuum. Ladefoged concluded that phonological features are not determined by human nature: "Phonological features are best regarded as artifacts that linguists have devised in order to describe linguistic systems".

Self-organisation theory

Self-organisation characterises systems where macroscopic structures are spontaneously formed out of local interactions between the many components of the system. In self-organised systems, global organisational properties are not to be found at the local level. In colloquial terms, self-organisation is roughly captured by the idea of "bottom-up" (as opposed to "top-down") organisation. Examples of self-organised systems in the inorganic world range from ice crystals to spiral galaxies.

According to many phoneticians, the sounds of language arrange and rearrange themselves through self-organisation. Speech sounds have both perceptual (how one hears them) and articulatory (how one produces them) properties, all with continuous values. Speakers tend to minimise effort, favouring ease of articulation over clarity. Listeners do the opposite, favouring sounds that are easy to distinguish even if difficult to pronounce. Since speakers and listeners constantly switch roles, the syllable systems actually found in the world's languages turn out to be a compromise between acoustic distinctiveness on the one hand and articulatory ease on the other.

Agent-based computer models take the perspective of self-organisation at the level of the speech community or population. The two main paradigms are (1) the iterated learning model and (2) the language game model. Iterated learning focuses on transmission from generation to generation, typically with just one agent in each generation. In the language game model, a whole population of agents simultaneously produce, perceive and learn language, inventing novel forms when the need arises.

Several models have shown how relatively simple peer-to-peer vocal interactions, such as imitation, can spontaneously self-organise a system of sounds that is shared by the whole population and differs between populations. For example, models elaborated by Berrah et al. (1996) and de Boer (2000), and recently reformulated using Bayesian theory, showed how a group of individuals playing imitation games can self-organise repertoires of vowel sounds which share substantial properties with human vowel systems. In de Boer's model, vowels are initially generated at random, but agents learn from each other as they interact repeatedly over time. Agent A chooses a vowel from her repertoire and produces it, inevitably with some noise. Agent B hears this vowel and chooses the closest equivalent from her own repertoire. To check whether this truly matches the original, B produces the vowel ''she thinks she has heard'', whereupon A refers once again to her own repertoire to find the closest equivalent. If this matches the one she initially selected, the game is successful; otherwise, it has failed. "Through repeated interactions", according to de Boer, "vowel systems emerge that are very much like the ones found in human languages".

In a different model, the phonetician Björn Lindblom was able to predict, on self-organisational grounds, the favoured choices of vowel systems ranging from three to nine vowels on the basis of a principle of optimal perceptual differentiation.

Further models studied the role of self-organisation in the origins of phonemic coding and combinatoriality, that is, the existence of phonemes and their systematic reuse to build structured syllables. Pierre-Yves Oudeyer developed models which showed that basic neural equipment for adaptive holistic vocal imitation, directly coupling motor and perceptual representations in the brain, can spontaneously generate shared combinatorial systems of vocalisations, including phonotactic patterns, in a society of babbling individuals. These models also characterised how innate morphological and physiological constraints can interact with these self-organised mechanisms to account for both the formation of statistical regularities and the diversity of vocalisation systems.
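The imitation game described for de Boer's model can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions: vowels are points in a hypothetical two-dimensional unit square (loosely standing in for an acoustic vowel space), production noise is Gaussian, and all parameter names and values (`NOISE`, `STEP`, `NEW_VOWEL_DIST`, `MERGE_DIST`, `P_INVENT`) are illustrative inventions. De Boer's published model includes its own rules for adding, merging and discarding vowels. Here, a distance threshold and a simple merge rule stand in for those rules.

```python
import math
import random

# Hypothetical parameters; not the values used in de Boer's published model.
NOISE = 0.05          # std dev of production noise in the unit-square vowel space
STEP = 0.1            # fraction of the way a listener moves a vowel towards a signal
NEW_VOWEL_DIST = 0.2  # a failed game this far from B's vowel makes B adopt a new vowel
MERGE_DIST = 0.05     # vowels closer than this within one repertoire are merged
P_INVENT = 0.01       # chance per game that the initiator invents a random new vowel


def produce(vowel, rng):
    """Articulate a vowel with Gaussian noise, clipped to the unit square."""
    return tuple(min(1.0, max(0.0, c + rng.gauss(0.0, NOISE))) for c in vowel)


def closest(repertoire, signal):
    """Index of the repertoire vowel nearest (Euclidean) to the signal."""
    return min(range(len(repertoire)), key=lambda i: math.dist(repertoire[i], signal))


def shift(vowel, target):
    """Move a vowel a fraction STEP of the way towards a target."""
    return tuple(v + STEP * (t - v) for v, t in zip(vowel, target))


def merge_close(rep):
    """In place: drop any vowel within MERGE_DIST of a vowel already kept."""
    kept = []
    for v in rep:
        if all(math.dist(v, k) >= MERGE_DIST for k in kept):
            kept.append(v)
    rep[:] = kept


def imitation_game(a, b, rng):
    """One game between initiator a and imitator b; returns True on success."""
    if rng.random() < P_INVENT:
        a.append((rng.random(), rng.random()))
    i = rng.randrange(len(a))           # A picks a vowel
    heard = produce(a[i], rng)          # B hears A's noisy production
    j = closest(b, heard)               # B's closest equivalent
    echo = produce(b[j], rng)           # B produces what she thinks she heard
    success = closest(a, echo) == i     # does A recognise her original vowel?
    if success or math.dist(b[j], heard) < NEW_VOWEL_DIST:
        b[j] = shift(b[j], heard)       # B adjusts her vowel towards the signal
    else:
        b.append(heard)                 # B adopts the unfamiliar sound as a new vowel
    merge_close(a)
    merge_close(b)
    return success


def simulate(n_agents=10, n_games=5000, seed=0):
    """Run random pairwise games; return final repertoires and the success rate."""
    rng = random.Random(seed)
    agents = [[(rng.random(), rng.random())] for _ in range(n_agents)]
    wins = 0
    for _ in range(n_games):
        a, b = rng.sample(agents, 2)
        wins += imitation_game(a, b, rng)
    return agents, wins / n_games
```

Calling `simulate()` returns each agent's vowel list and the overall fraction of successful games. Whether a particular parameter setting yields shared, well-dispersed repertoires can be checked by inspecting the returned `agents`. The sketch makes no claim to reproduce de Boer's results.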

Gestural theory

The gestural theory states that speech was a relatively late development, evolving by degrees from a system that was originally gestural. On this view, human ancestors were unable to control their vocalisation at the time when gestures were used to communicate; as they slowly gained control over their vocalisations, spoken language began to evolve. Three types of evidence support this theory:

1. Gestural language and vocal language depend on similar neural systems. The regions of the cortex that are responsible for mouth and hand movements border each other.
2. Nonhuman primates minimise vocal signals in favour of manual, facial and other visible gestures in order to express simple concepts and communicative intentions in the wild. Some of these gestures resemble those of humans, such as the "begging posture", with the hands stretched out, which humans share with chimpanzees.
3. Mirror neurons.

Research has found strong support for the idea that spoken language and signing depend on similar neural structures. Patients who used sign language and who suffered a left-hemisphere lesion showed the same disorders in their sign language as vocal patients did in their oral language. Other researchers found that the same left-hemisphere brain regions were active during sign language as during the use of vocal or written language. Humans spontaneously use hand and facial gestures when formulating ideas to be conveyed in speech. There are also, of course, many sign languages in existence, commonly associated with deaf communities; as noted above, these are equal in complexity, sophistication and expressive power to any oral language. The main difference is that the "phonemes" are produced on the outside of the body, articulated with the hands, body and facial expression, rather than inside the body, articulated with the tongue, teeth, lips and breathing.

Many psychologists and scientists have examined the mirror system in the brain to test this theory as well as other behavioural theories. Evidence cited in support of mirror neurons as a factor in the evolution of speech includes mirror neurons in primates, the success of teaching apes to communicate gesturally, and the use of pointing and gesturing to teach young children language. Fogassi and Ferrari (2014) monitored motor cortex activity in monkeys, specifically in area F5, a premotor region regarded as homologous to the human Broca's area, where mirror neurons are located. They observed changes in electrical activity in this area both when the monkey performed hand actions and when it observed similar actions performed by someone else. Broca's area is a region in the frontal lobe responsible for language production and processing. The discovery of mirror neurons in this region, which fire when a hand action is either performed or observed, strongly supports the belief that communication was once accomplished with gestures. The same is true when young children are taught language: when an adult points at a specific object or location, mirror neurons in the child fire as though the child were performing the action, which results in long-term learning.

Criticism

Critics note that for mammals in general, sound turns out to be the best medium in which to encode information for transmission over distances at speed. Given the probability that this applied also to early humans, it is hard to see why they should have abandoned this efficient method in favour of more costly and cumbersome systems of visual gesturing – only to return to sound at a later stage. By way of explanation, it has been proposed that at a relatively late stage in human evolution, hands became so much in demand for making and using tools that the competing demands of manual gesturing became a hindrance. The transition to spoken language is said to have occurred only at that point. Since humans throughout evolution have been making and using tools, however, most scholars remain unconvinced by this argument. (For a different approach to this issue – one setting out from considerations of signal reliability and trust – see "from pantomime to speech" below).

Possible semi-aquatic adaptations

Recent insights into human evolution, more specifically human Pleistocene littoral evolution, may help explain how human speech evolved. One controversial suggestion is that certain pre-adaptations for spoken language evolved during a time when ancestral hominins lived close to river banks and lake shores rich in fatty acids and other brain-specific nutrients. Occasional wading or swimming may also have led to enhanced breath control (breath-hold diving). Independent lines of evidence suggest that "archaic" ''Homo'' spread intercontinentally along the Indian Ocean shores (they even reached overseas islands such as Flores), where they regularly dived for littoral foods such as shellfish and crayfish. These foods are extremely rich in brain-specific nutrients, which on this view explains the brain enlargement of ''Homo''. Shallow diving for seafood requires voluntary airway control, a prerequisite for spoken language. Seafood such as shellfish generally requires not biting and chewing but stone tool use and suction feeding. This finer control of the oral apparatus was arguably another biological pre-adaptation to human speech, especially for the production of consonants.

Timeline of speech evolution

Little is known about the timing of language's emergence in the human species. Unlike writing, speech leaves no material trace, making it archaeologically invisible. Lacking direct linguistic evidence, specialists in human origins have resorted to the study of anatomical features and genes arguably associated with speech production. Whilst such studies may provide information as to whether pre-modern ''Homo'' species had speech ''capacities'', it is still unknown whether they actually spoke. Although they may have communicated vocally, the anatomical and genetic data lack the resolution necessary to differentiate proto-language from speech.

Using statistical methods to estimate the time required to achieve the current spread and diversity of modern languages, Johanna Nichols, a linguist at the University of California, Berkeley, argued in 1998 that vocal languages must have begun diversifying at least 100,000 years ago. In 2012, anthropologists Charles Perreault and Sarah Mathew used phonemic diversity to suggest a date consistent with this. "Phonemic diversity" denotes the number of perceptually distinct units of sound (consonants, vowels and tones) in a language. The current worldwide pattern of phonemic diversity potentially contains the statistical signal of the expansion of modern ''Homo sapiens'' out of Africa, beginning around 60–70 thousand years ago. Some scholars argue that phonemic diversity evolves slowly and can be used as a clock to calculate how long the oldest African languages would have had to exist in order to accumulate the number of phonemes they possess today. As human populations left Africa and expanded into the rest of the world, they underwent a series of bottlenecks, points at which only a very small population survived to colonise a new continent or region. Allegedly, such a population crash led to a corresponding reduction in genetic, phenotypic and phonemic diversity.
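The alleged bottleneck effect can be made concrete with a deliberately crude toy model. This model is an assumption made here for illustration and is not taken from the studies cited: at each founder event, every phoneme in the inventory independently survives with a fixed probability p. Under that assumption, the expected inventory after k bottlenecks is N·p^k. The model ignores any regrowth of diversity between bottlenecks and any borrowing of phonemes between languages. The function names are hypothetical.

```python
import random


def expected_inventory(initial_size, n_bottlenecks, keep_prob):
    """Expected inventory size if each bottleneck keeps each phoneme with keep_prob."""
    return initial_size * keep_prob ** n_bottlenecks


def simulate_founders(initial_size, n_bottlenecks, keep_prob, seed=0):
    """One random run of the same toy model; returns the size after each step."""
    rng = random.Random(seed)
    sizes = [initial_size]
    for _ in range(n_bottlenecks):
        # Each surviving phoneme independently passes through the bottleneck.
        survivors = sum(1 for _ in range(sizes[-1]) if rng.random() < keep_prob)
        sizes.append(survivors)
    return sizes
```

Under this model, a chain of founder events can only shrink an inventory. This is exactly the gene-like behaviour that critics of the phonemic-diversity argument dispute, since real languages can also gain phonemes quickly through contact.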

African languages today have some of the largest phonemic inventories in the world, whilst the smallest inventories are found in South America and Oceania, some of the last regions of the globe to be colonised. For example, Rotokas, a language of Bougainville Island in Papua New Guinea, and Pirahã, spoken in South America, both have just 11 phonemes, whilst !Xun, a language spoken in Southern Africa, has 141 phonemes. Perreault and Mathew used a natural experiment, the colonisation of mainland Southeast Asia on the one hand and the long-isolated Andaman Islands on the other, to estimate the rate at which phonemic diversity increases through time. Using this rate, they estimated that the world's languages date back to the Middle Stone Age in Africa, sometime between 350 thousand and 150 thousand years ago. This corresponds to the speciation event which gave rise to ''Homo sapiens''.

These and similar studies have, however, been criticised by linguists on two grounds. First, they rest on a flawed analogy between genes and phonemes, since, unlike genes, phonemes are frequently transferred laterally between languages. Second, they rest on a flawed sampling of the world's languages, since both Oceania and the Americas also contain languages with very high numbers of phonemes, and Africa contains languages with very few. The critics argue that the actual distribution of phonemic diversity in the world reflects recent language contact rather than deep language history, since languages are well documented to lose or gain many phonemes over very short periods. In other words, there is no valid linguistic reason to expect genetic founder effects to influence phonemic diversity.
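In its simplest form, the rate-based dating described above reduces to a constant-rate calculation. First, calibrate a rate of diversity gain from a case with a known time depth. Then divide an observed diversity difference by that rate. The Python sketch below assumes strictly linear accumulation and uses hypothetical function names. It does not reproduce Perreault and Mathew's actual statistical analysis.

```python
def accumulation_rate(diversity_gain, elapsed_years):
    """Phonemic diversity gained per year, from a calibration case of known age."""
    if elapsed_years <= 0:
        raise ValueError("elapsed_years must be positive")
    return diversity_gain / elapsed_years


def years_to_accumulate(target_diversity, baseline_diversity, rate):
    """Time needed to grow from baseline to target diversity at a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return (target_diversity - baseline_diversity) / rate


# Hypothetical placeholder inputs, not published data.
rate = accumulation_rate(diversity_gain=2.0, elapsed_years=10_000)
estimate = years_to_accumulate(target_diversity=30.0, baseline_diversity=10.0, rate=rate)
```

With these placeholder inputs, the estimate comes out at about 100,000 years. The linearity assumption is the weak point: if languages can gain or lose many phonemes over short periods, as the critics argue, a single constant rate has little meaning.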

Notes

External links

Interactive sagittal section

Evolution of speech (anatomical and neural bases).

Ritual and the origins of language.

Decoding Chomsky