Audio Latency

Latency refers to a short period of delay (usually measured in milliseconds) between when an audio signal enters a system and when it emerges. Potential contributors to latency in an audio system include analog-to-digital conversion, buffering, digital signal processing, transmission time, digital-to-analog conversion, and the speed of sound in the transmission medium. Latency can be a critical performance metric in professional audio, including sound reinforcement systems, foldback systems (especially those using in-ear monitors), live radio, and television. Excessive audio latency has the potential to degrade call quality in telecommunications applications. Low-latency audio in computers is important for interactivity.


Telephone calls

In all systems, latency can be said to consist of three elements: codec delay, playout delay and network delay. Latency in telephone calls is sometimes referred to as mouth-to-ear delay; the telecommunications industry also uses the term ''quality of experience'' (QoE). Voice quality is measured according to the ITU model; measurable quality of a call degrades rapidly where the mouth-to-ear delay exceeds 200 milliseconds. The mean opinion score (MOS) is also comparable in a near-linear fashion with the ITU's quality scale, defined in standards G.107, G.108 and G.109, with a quality factor ''R'' ranging from 0 to 100. An MOS of 4 ('Good') would have an ''R'' score of 80 or above; to achieve 100R requires an MOS exceeding 4.5. The ITU and 3GPP group end-user services into classes based on latency sensitivity. Similarly, the G.114 recommendation regarding mouth-to-ear delay indicates that most users are "very satisfied" as long as latency does not exceed 200 ms, with a corresponding ''R'' of 90+. Codec choice also plays an important role; the highest quality (and highest bandwidth) codecs like G.711 are usually configured to incur the least encode-decode latency, so on a network with sufficient throughput low latencies can be achieved. G.711 at a bitrate of 64 kbit/s is the encoding method predominantly used on the public switched telephone network.
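The relationship between the quality factor ''R'' and the MOS can be illustrated with the conversion published in ITU-T G.107. The sketch below is a minimal illustration, not a full E-model: the function name `r_to_mos` is hypothetical, and the clamp to a floor of 1.0 for very small ''R'' is a simplifying assumption.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model quality factor R (0..100) to an estimated MOS.

    Uses the R-to-MOS conversion given in ITU-T G.107.
    The max(1.0, ...) clamp is an assumption added here so that very
    small positive R values never yield a score below 1.
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6
    return max(1.0, mos)


if __name__ == "__main__":
    for r in (50, 70, 80, 90, 100):
        print(f"R = {r:3d}  ->  MOS ~ {r_to_mos(r):.2f}")
```

With this mapping, an ''R'' of 80 lands just above an MOS of 4, which matches the 'Good' threshold described above.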


Mobile calls

The AMR narrowband codec, used in GSM and UMTS networks, introduces latency in the encode and decode processes. As mobile operators upgrade existing ''best-effort'' networks to support multiple concurrent types of service over all-IP networks, services such as Hierarchical Quality of Service (''H-QoS'') allow per-user, per-service QoS policies to prioritise time-sensitive protocols like voice calls and other wireless backhaul traffic. Another aspect of mobile latency is the inter-network handoff: when a customer on Network A calls a Network B customer, the call must traverse two separate radio access networks, two core networks, and an interlinking Gateway Mobile Switching Centre (GMSC), which performs the physical interconnection between the two providers.


IP calls

With end-to-end QoS-managed and assured-rate connections, latency can be reduced to analogue PSTN/POTS levels. On a stable connection with sufficient bandwidth and minimal latency, VoIP systems typically have a minimum of 20 ms inherent latency. Under less ideal network conditions, a 150 ms maximum latency is sought for general consumer use. Many popular videoconferencing systems rely on data buffering and data redundancy to cope with network jitter and packet loss. Measurements have shown that mouth-to-ear delay is between 160 and 300 ms over a 500-mile distance under average US network conditions. Latency is a larger consideration when an echo is present and systems must perform echo suppression and cancellation.
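A simple way to reason about these figures is to add up the three elements of call latency (codec, playout and network delay) and compare the total with the 150 ms consumer target. The following is a rough sketch; the class name `DelayBudget` and the example values are hypothetical and are not measurements from any real system.

```python
from dataclasses import dataclass

# Maximum mouth-to-ear latency sought for general consumer use (from the text).
CONSUMER_TARGET_MS = 150.0


@dataclass
class DelayBudget:
    """One-way call latency split into its three elements, in milliseconds."""
    codec_ms: float
    playout_ms: float   # jitter (playout) buffer
    network_ms: float

    def total_ms(self) -> float:
        return self.codec_ms + self.playout_ms + self.network_ms

    def within_target(self, target_ms: float = CONSUMER_TARGET_MS) -> bool:
        return self.total_ms() <= target_ms


# Hypothetical example values, chosen only for illustration.
budget = DelayBudget(codec_ms=20.0, playout_ms=40.0, network_ms=60.0)

if __name__ == "__main__":
    print(f"total = {budget.total_ms():.0f} ms, within target: {budget.within_target()}")
```

A larger playout buffer absorbs more jitter but consumes more of the budget, which is the trade-off that buffering-based systems have to make.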


Computer audio

Latency can be a particular problem in audio platforms on computers. Supported interface optimizations reduce the delay to times that are too short for the human ear to detect. Reducing buffer sizes reduces latency. A popular optimization solution is Steinberg's ASIO, which bypasses the audio platform and connects audio signals directly to the sound card's hardware. Many professional and semi-professional audio applications use the ASIO driver, allowing users to work with audio in real time. Pro Tools HD offers a low-latency system similar to ASIO. Pro Tools 10 and 11 are also compatible with ASIO interface drivers. The Linux realtime kernel is a modified kernel that alters the standard timer frequency the Linux kernel uses and gives all processes or threads the ability to have realtime priority. This means that a time-critical process like an audio stream can get priority over another, less critical process like network activity. This is also configurable per user (for example, the processes of user "tux" could have priority over processes of user "nobody" or over the processes of several system daemons).
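The effect of buffer size on latency follows from simple arithmetic: a buffer of N sample frames at a given sampling rate holds N divided by the rate seconds of audio. The sketch below assumes that each buffer in the chain contributes one full buffer period of delay; the function name and the `num_buffers` parameter are illustrative and not part of any driver API.

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int, num_buffers: int = 1) -> float:
    """Delay contributed by `num_buffers` buffers of `frames` sample frames each."""
    return 1000.0 * frames * num_buffers / sample_rate_hz


if __name__ == "__main__":
    for frames in (64, 128, 256, 512, 1024):
        ms = buffer_latency_ms(frames, 48_000)
        print(f"{frames:5d} frames @ 48 kHz -> {ms:6.2f} ms per buffer")
```

Halving the buffer size halves this contribution, at the cost of giving the system less time to refill the buffer before it underruns.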


Digital television audio

Many modern digital television receivers, set-top boxes and AV receivers use sophisticated audio processing, which can create a delay between the time when the audio signal is received and the time when it is heard on the speakers. Since TVs also introduce delays in processing the video signal, this can result in the two signals being sufficiently synchronized to be unnoticeable by the viewer. However, if the difference between the audio and video delay is significant, the effect can be disconcerting. Some systems have a lip sync setting that allows the audio lag to be adjusted to synchronize with the video, and others may have advanced settings where some of the audio processing steps can be turned off.

Audio lag is also a significant detriment in rhythm games, where precise timing is required to succeed. Most of these games have a lag calibration setting with which the game adjusts the timing windows by a certain number of milliseconds to compensate. In these cases, the notes of a song are sent to the speakers before the game even receives the required input from the player in order to maintain the illusion of rhythm. Games that rely upon musical improvisation, such as Rock Band drums or DJ Hero, can still suffer tremendously, as the game cannot predict what the player will hit in these cases, and excessive lag will still create a noticeable delay between hitting notes and hearing them play.
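The lag calibration described above can be sketched as a shift of the timing window. This is an illustrative model only, not the logic of any particular game: the function `judge_hit`, the 50 ms window and the 80 ms lag are all assumed values.

```python
def judge_hit(note_time_ms: float,
              input_time_ms: float,
              calibration_ms: float = 0.0,
              window_ms: float = 50.0) -> bool:
    """Return True if the input falls inside the (shifted) timing window.

    A player who presses in time with delayed audio produces input late by
    roughly the audio lag, so the expected input time is moved later by
    `calibration_ms` before the comparison.
    """
    expected_ms = note_time_ms + calibration_ms
    return abs(input_time_ms - expected_ms) <= window_ms


if __name__ == "__main__":
    note, lag = 1000.0, 80.0
    pressed = note + lag  # player reacts to what they hear
    print("uncalibrated:", judge_hit(note, pressed))
    print("calibrated:  ", judge_hit(note, pressed, calibration_ms=lag))
```

This only works because the game knows in advance when each note is due; for improvised input there is no expected time to shift, which is why such games still suffer from lag.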


Broadcast audio

Audio latency can be experienced in broadcast systems where someone is contributing to a live broadcast over a satellite or similar link with high delay. The person in the main studio has to wait for the contributor at the other end of the link to react to questions. Latency in this context could be between several hundred milliseconds and a few seconds. Dealing with audio latencies as high as this takes special training in order to make the resulting combined audio output reasonably acceptable to the listeners. Wherever practical, it is important to try to keep live production audio latency low in order to keep the reactions and interchange of participants as natural as possible. A latency of 10 milliseconds or less is the target for audio circuits within professional production structures.
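To give a sense of where the "several hundred milliseconds" comes from, the sketch below estimates the propagation delay of a single satellite hop. It assumes a geostationary satellite directly overhead (altitude about 35,786 km), which the text does not specify, and it ignores encoding, buffering and ground-network delays, so real links will be slower.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0   # exact, by definition of the metre
GEO_ALTITUDE_M = 35_786_000.0        # approximate geostationary altitude (assumption)


def satellite_hop_delay_ms(altitude_m: float = GEO_ALTITUDE_M) -> float:
    """Minimum propagation delay for one hop: ground -> satellite -> ground."""
    return 1000.0 * 2.0 * altitude_m / SPEED_OF_LIGHT_M_S


if __name__ == "__main__":
    hop = satellite_hop_delay_ms()
    print(f"one hop (question reaches contributor): {hop:.0f} ms")
    print(f"two hops (answer heard in the studio):  {2 * hop:.0f} ms")
```

Under these assumptions the studio presenter waits close to half a second for propagation alone before hearing any reply.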


Live performance audio

Latency in live performance occurs naturally from the speed of sound. It takes sound about 3 milliseconds to travel 1 meter. Small amounts of latency occur between performers depending on how far they are from each other and from stage monitors, if these are used. This creates a practical limit to how far apart the artists in a group can be from one another. Stage monitoring extends that limit, as the signal travels at close to the speed of light through the cables that connect stage monitors. Performers, particularly in large spaces, will also hear reverberation, or echo of their music, as the sound that projects from the stage bounces off walls and structures and returns with latency and distortion. A primary purpose of stage monitoring is to provide artists with more primary sound so that they are not confused by the latency of these reverberations.
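The "about 3 milliseconds per meter" rule of thumb can be checked with a short calculation. The sketch assumes a speed of sound of 343 m/s (dry air at roughly 20 °C); the function name is illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees Celsius


def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to travel `distance_m` through air, in milliseconds."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S


if __name__ == "__main__":
    for d in (1, 3, 10, 20):
        print(f"{d:3d} m apart -> {acoustic_delay_ms(d):5.1f} ms")
```

Two performers 10 m apart therefore hear each other roughly 29 ms late, which illustrates why spacing on large stages becomes a practical limit without monitors.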


Live signal processing

While analog audio equipment has no appreciable latency, digital audio equipment has latency associated with two general processes: conversion from one format to another, and digital signal processing (DSP) tasks such as equalization, compression and routing. Digital conversion processes include analog-to-digital converters (ADC), digital-to-analog converters (DAC), and various changes from one digital format to another, such as from AES3, which carries low-voltage electrical signals, to ADAT, an optical transport. Any such process takes a small amount of time to accomplish; typical latencies are in the range of 0.2 to 1.5 milliseconds, depending on sampling rate, software design and hardware architecture. Different audio signal processing operations such as finite impulse response (FIR) and infinite impulse response (IIR) filters take different mathematical approaches to the same end and can have different latencies. In addition, input and output sample buffering add delay. Typical latencies range from 0.5 to ten milliseconds, with some designs having as much as 30 milliseconds of delay. (ProSoundWeb. David McNell. ''Networked Audio Transport: Looking at the methods and factors''.)
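As one concrete case of filter latency, a linear-phase FIR filter (one with symmetric coefficients) delays every frequency by the same amount: (N − 1) / 2 samples for N taps. The sketch below covers only that case and is an assumption about the filter type; IIR filters and non-symmetric FIR filters have frequency-dependent delay and are not modeled here.

```python
def linear_phase_fir_delay_ms(num_taps: int, sample_rate_hz: int) -> float:
    """Group delay of a symmetric (linear-phase) FIR filter, in milliseconds."""
    if num_taps < 1:
        raise ValueError("num_taps must be at least 1")
    delay_samples = (num_taps - 1) / 2.0
    return 1000.0 * delay_samples / sample_rate_hz


if __name__ == "__main__":
    for taps in (33, 257, 1025):
        ms = linear_phase_fir_delay_ms(taps, 48_000)
        print(f"{taps:5d} taps @ 48 kHz -> {ms:6.2f} ms")
```

Longer filters give finer frequency resolution but add proportionally more delay, which is one reason processing choices affect total system latency.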
Latency in digital audio equipment is most noticeable when a singer's voice is transmitted through their microphone, through digital audio mixing, processing and routing paths, then sent to their own ears via in-ear monitors or headphones. In this case, the singer's vocal sound is conducted to their own ear through the bones of the head, then arrives through the digital pathway some milliseconds later. In one study, listeners found latency greater than 15 ms to be noticeable. Latency for other musical activities such as playing guitar is not as critical. Ten milliseconds of latency is not as noticeable to a listener who is not hearing his or her own voice. (Whirlwind. ''Opening Pandora's Box? The "L" word - latency and digital audio systems''.)


Delayed loudspeakers

In sound reinforcement for music or speech presentation in large venues, it is optimal to deliver sufficient sound volume to the back of the venue without resorting to excessive sound volumes near the front. One way for audio engineers to achieve this is to use additional loudspeakers placed at a distance from the stage but closer to the rear of the audience. Sound travels through air at the speed of sound (around 343 meters per second, depending on air temperature and humidity). By measuring or estimating the difference in latency between the loudspeakers near the stage and the loudspeakers nearer the audience, the audio engineer can introduce an appropriate delay in the audio signal going to the latter loudspeakers, so that the wavefronts from near and far loudspeakers arrive at the same time. Because of the Haas effect, an ''additional'' 15 milliseconds can be added to the delay time of the loudspeakers nearer the audience, so that the stage's wavefront reaches them first, focusing the audience's attention on the stage rather than the local loudspeaker. The slightly later sound from delayed loudspeakers simply increases the perceived sound level without negatively affecting localization.
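The delay setting for such a loudspeaker can be estimated from its distance to the main loudspeakers plus the Haas offset. This is a rough sketch: the linear approximation for the speed of sound in dry air (331.3 + 0.606 × temperature in °C) ignores humidity, and the function and parameter names are hypothetical.

```python
def delay_speaker_setting_ms(distance_from_main_m: float,
                             temperature_c: float = 20.0,
                             haas_offset_ms: float = 15.0) -> float:
    """Delay to apply to a loudspeaker placed `distance_from_main_m` behind the mains.

    The first term aligns the wavefronts; the Haas offset then makes the
    stage sound arrive slightly earlier than the local loudspeaker.
    """
    speed_m_s = 331.3 + 0.606 * temperature_c  # approximation for dry air
    alignment_ms = 1000.0 * distance_from_main_m / speed_m_s
    return alignment_ms + haas_offset_ms


if __name__ == "__main__":
    for temp in (0.0, 20.0, 35.0):
        ms = delay_speaker_setting_ms(40.0, temperature_c=temp)
        print(f"40 m at {temp:4.1f} C -> set delay to about {ms:.1f} ms")
```

The temperature term shows why a delay set during a cool sound check may need adjusting once the venue warms up.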


See also

* Delay (audio effect)
* Group delay and phase delay


References

* International Telecommunication Union. "G.107 : The E-model: a computational model for use in transmission planning" (PDF). 2000-06-07. http://www.itu.int/rec/T-REC-G.107. Retrieved 2013-01-14.
* International Telecommunication Union. "G.108 : Application of the E-model: A planning guide" (PDF). 2000-07-28. http://www.itu.int/rec/T-REC-G.108. Retrieved 2013-01-14.
* International Telecommunication Union. "G.109 : Definition of categories of speech transmission quality - ITU" (PDF). 2000-05-11. http://www.itu.int/rec/T-REC-G.109. Retrieved 2013-01-14.
* O3b Networks and Sofrecom. "Why Latency Matters to Mobile Backhaul - O3b Networks". O3b Networks. http://www.o3bnetworks.com/media/45606/latency%2520matters.pdf. Retrieved 2013-01-11.


External links


Music Collaboration Will Never Happen Online in Real Time