Vocal Emotion

Emotional prosody, or affective prosody, comprises the various paralinguistic aspects of language use that convey emotion. It includes an individual's tone of voice in speech, which is conveyed through changes in pitch, loudness, timbre, speech rate, and pauses. It can be isolated from semantic information, and it interacts with verbal content (e.g. sarcasm). Emotional prosody in speech is perceived or decoded slightly worse than facial expressions, but accuracy varies with the emotion. Anger and sadness are perceived most easily, followed by fear and happiness, with disgust being the most poorly perceived.


Production of vocal emotion

Studies have found that some emotions, such as fear, joy and anger, are portrayed at a higher frequency than emotions such as sadness.

* Anger: Anger can be divided into two types: "anger" and "hot anger". In comparison to neutral speech, anger is produced with a lower pitch, higher intensity, more energy (500 Hz) across the vocalization, a higher first formant (the lowest resonance of the vocal tract) and faster attack times at voice onset (the start of speech). "Hot anger", in contrast, is produced with a higher, more varied pitch, and even greater energy (2000 Hz).
* Disgust: In comparison to neutral speech, disgust is produced with a lower, downward directed pitch, with energy (500 Hz), a lower first formant, and fast attack times similar to anger. Less variation and shorter durations are also characteristics of disgust.
* Fear: Fear can be divided into two types: "panic" and "anxiety". In comparison to neutral speech, fearful emotions have a higher pitch, little variation, lower energy, and a faster speech rate with more pauses.
* Sadness: In comparison to neutral speech, sad emotions are produced with a higher pitch, less intensity but more vocal energy (2000 Hz), longer duration with more pauses, and a lower first formant.


Perception of vocal emotion

Decoding emotions in speech includes three stages: determining acoustic features, creating meaningful connections with these features, and processing the acoustic patterns in relation to the connections established. In the processing stage, connections with basic emotional knowledge are stored separately in a memory network specific to associations. These associations can be used to form a baseline for emotional expressions encountered in the future. Emotional meanings of speech are implicitly and automatically registered after the circumstances, importance and other surrounding details of an event have been analyzed. On average, listeners are able to perceive the intended emotions exhibited to them at a rate significantly better than chance (chance = approximately 10%). However, error rates are also high. This is partly because listeners are more accurate at emotional inference from particular voices and perceive some emotions better than others. Vocal expressions of anger and sadness are perceived most easily, fear and happiness are only moderately well perceived, and disgust has low perceptibility.
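
The comparison with chance can be made concrete with a little arithmetic. The sketch below assumes a forced-choice task with ten response options, which would give the roughly 10% chance level mentioned above; the option count and the response counts are hypothetical values chosen for illustration, not data from any study. It computes per-emotion accuracy and an exact one-sided binomial probability of doing at least that well by guessing.

```python
from math import comb


def chance_level(n_options: int) -> float:
    """Expected accuracy when guessing uniformly among n_options labels."""
    return 1.0 / n_options


def binomial_p_at_least(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical (correct, total) response counts per intended emotion.
# These numbers are invented for illustration only.
HYPOTHETICAL_COUNTS = {
    "anger": (80, 100),
    "sadness": (78, 100),
    "fear": (55, 100),
    "happiness": (50, 100),
    "disgust": (20, 100),
}


def summarize(counts, n_options=10):
    """Return (chance, overall accuracy, per-emotion accuracy and p-value)."""
    chance = chance_level(n_options)
    rows = {}
    for emotion, (correct, total) in counts.items():
        rows[emotion] = {
            "accuracy": correct / total,
            # Probability of at least this many hits by guessing alone.
            "p_value": binomial_p_at_least(correct, total, chance),
        }
    total_correct = sum(c for c, _ in counts.values())
    total_trials = sum(t for _, t in counts.values())
    return chance, total_correct / total_trials, rows


if __name__ == "__main__":
    chance, overall, rows = summarize(HYPOTHETICAL_COUNTS)
    print(f"chance level: {chance:.2f}, overall accuracy: {overall:.3f}")
    for emotion, row in rows.items():
        print(f"{emotion:>10}: accuracy={row['accuracy']:.2f}, p={row['p_value']:.3g}")
```

Under these assumed counts, an emotion can be recognized reliably above chance and still be misidentified most of the time, which is how better-than-chance performance and high error rates can coexist.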


Vocal emotions and the brain

Language can be split into two components: the verbal and vocal channels. The verbal channel is the semantic content carried by the speaker's chosen words, which determines the meaning of the sentence. The way a sentence is spoken, however, can change its meaning; this is the vocal channel. It conveys the emotions felt by the speaker and gives listeners a better idea of the intended meaning. Nuances in this channel are expressed through intonation, intensity, and rhythm, which combine to form prosody. Usually these channels convey the same emotion, but sometimes they differ. Sarcasm and irony are two forms of humor based on this incongruent style.

The neurological processes that integrate verbal and vocal (prosodic) components are relatively unclear. However, it is assumed that verbal content and vocal content are processed in different hemispheres of the brain. Verbal content, composed of syntactic and semantic information, is processed in the left hemisphere. Syntactic information is processed primarily in the frontal regions and a small part of the temporal lobe, while semantic information is processed primarily in the temporal regions, with a smaller part of the frontal lobes involved. In contrast, prosody is processed primarily in the same pathway as verbal content, but in the right hemisphere. Neuroimaging studies using functional magnetic resonance imaging (fMRI) provide further support for this hemispheric lateralization and temporo-frontal activation. Some studies, however, show evidence that prosody perception is not exclusively lateralized to the right hemisphere and may be more bilateral. There is some evidence that the basal ganglia may also play an important role in the perception of prosody.


Impairment of emotion recognition

Deficits in expressing and understanding prosody, caused by right hemisphere lesions, are known as aprosodias. These can manifest in different forms and in various mental illnesses or diseases. Aprosodia can also be caused by stroke and alcohol abuse. The types of aprosodia include: motor (the inability to produce vocal inflection), expressive (when brain limitations and not motor functions are the cause of this inability), and receptive (when a person cannot decipher emotional speech).

Recognizing vocal expressions of emotion has been found to become increasingly difficult with age. Older adults have slightly more difficulty than young adults in labeling vocal expressions of emotion, particularly sadness and anger, but have much greater difficulty integrating vocal emotions with the corresponding facial expressions. A possible explanation for this difficulty is that combining two sources of emotion requires greater activation of emotion areas of the brain, in which older adults show decreased volume and activity. Another possible explanation is that hearing loss could lead to a mishearing of vocal expressions. High frequency hearing loss is known to begin occurring around the age of 50, particularly in men.

Because the right hemisphere of the brain is associated with prosody, patients with right hemisphere lesions have difficulty varying speech patterns to convey emotion. Their speech may therefore sound monotonous. In addition, people with right-hemisphere damage have been found to be impaired at identifying the emotion in intoned sentences.

Difficulty in decoding both syntactic and affective prosody is also found in people with autism spectrum disorder and schizophrenia, where "patients have deficits in a large number of functional domains, including social skills and social cognition. These social impairments consist of difficulties in perceiving, understanding, anticipating and reacting to social cues that are crucial for normal social interaction." This has been determined in multiple studies, such as Hoekert et al.'s 2017 study on emotional prosody in schizophrenia, which illustrated that more research must be done to fully confirm the correlation between the illness and emotional prosody. However, people with schizophrenia have no problem deciphering non-emotional prosody.


Non-linguistic emotional prosody

Emotional states such as happiness, sadness, anger, and disgust can be determined solely on the basis of the acoustic structure of a non-linguistic speech act. These acts can be grunts, sighs, exclamations, etc. There is some research that supports the notion that these non-linguistic acts are universal, eliciting the same assumptions even from speakers of different languages. In addition, it has been shown that emotion can be expressed differently in non-linguistic vocalizations than in speech. As Laukka et al. state: Speech requires highly precise and coordinated movement of the articulators (e.g., lips, tongue, and larynx) in order to transmit linguistic information, whereas non-linguistic vocalizations are not constrained by linguistic codes and thus do not require such precise articulations. This entails that non-linguistic vocalizations can exhibit larger ranges for many acoustic features than prosodic expressions. In their study, actors were instructed to vocalize an array of different emotions without words. The study showed that listeners could identify a wide range of positive and negative emotions above chance. However, emotions like guilt and pride were less easily recognized.

In a 2015 study by Verena Kersken, Klaus Zuberbühler and Juan-Carlos Gomez, non-linguistic vocalizations of infants were presented to adults to see whether the adults could distinguish between infant vocalizations indicating requests for help, pointing to an object, or indicating an event. Infants show different prosodic elements in crying, depending on what they are crying for. They also have differing outbursts for positive and negative emotional states. The ability to decipher this information was determined to be applicable across cultures and independent of the adult's level of experience with infants.


Sex differences

Men and women differ both in how they use language and in how they understand it. Differences are known to exist in the rate of speech, the range of pitch, the duration of speech, and pitch slope (Fitzsimmons et al.). For example, "In a study of relationship of spectral and prosodic signs, it was established that the dependence of pitch and duration differed in men and women uttering the sentences in affirmative and inquisitive intonation. Tempo of speech, pitch range, and pitch steepness differ between the genders" (Nesic et al.). One such illustration is that women are more likely to speak faster, elongate the ends of words, and raise their pitch at the end of sentences.

Women and men also differ in how they neurologically process emotional prosody. In an fMRI study, men showed stronger activation in more cortical areas than female subjects when processing the meaning or manner of an emotional phrase. In the manner task, men had more activation in the bilateral middle temporal gyri. For women, the only area of significance was the right posterior cerebellar lobe. Male subjects in this study showed stronger activation in the prefrontal cortex, and on average needed a longer response time than female subjects. This result was interpreted to mean that men need to make conscious inferences about the acts and intentions of the speaker, while women may do this subconsciously. Therefore, men needed to integrate linguistic semantics and emotional intent "at a higher stage than the semantic processing stage."


Considerations

Most research on the vocal expression of emotion has been conducted using synthetic speech or portrayals of emotion by professional actors. Little research has been done with spontaneous, "natural" speech samples. These artificial speech samples have been considered to be close to natural speech, but portrayals by actors in particular may be influenced by stereotypes of emotional vocal expression and may exhibit intensified characteristics of speech, skewing listeners' perceptions. Another consideration lies in listeners' individual perceptions. Studies typically take the average of responses, but few examine individual differences in great depth. Examining those differences may provide better insight into the vocal expression of emotions.


See also

* Affect (psychology)
* Nonverbal communication
* Prosody (linguistics)

