Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves, which are longitudinal waves that travel through air and consist of compressions and rarefactions. The energy contained in audio signals, or sound power level, is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.
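As a small illustration of the decibel scale mentioned above, the following sketch converts ratios to decibels. The function names are hypothetical and chosen for this example only; the factor of 10 applies to power quantities and the factor of 20 to root-power quantities such as voltage.

```python
import math

def power_ratio_to_db(power, reference_power):
    """Express a ratio of two power quantities in decibels."""
    return 10.0 * math.log10(power / reference_power)

def amplitude_ratio_to_db(amplitude, reference_amplitude):
    """Express a ratio of two root-power quantities (e.g. voltages) in decibels."""
    return 20.0 * math.log10(amplitude / reference_amplitude)

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Under these definitions, a tenfold increase in amplitude and a hundredfold increase in power both correspond to 20 dB, which is why the two formulas differ by a factor of two.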

History

The motivation for audio signal processing began at the beginning of the 20th century with inventions like the telephone, phonograph, and radio that allowed for the transmission and storage of audio signals. Audio processing was necessary for early radio broadcasting, as there were many problems with studio-to-transmitter links. The theory of signal processing and its application to audio was largely developed at Bell Labs in the mid 20th century.

Claude Shannon and Harry Nyquist's early work on communication theory, sampling theory and pulse-code modulation (PCM) laid the foundations for the field. In 1957, Max Mathews became the first person to synthesize audio from a computer, giving birth to computer music. Major developments in digital audio coding and audio data compression include differential pulse-code modulation (DPCM) by C. Chapin Cutler at Bell Labs in 1950, linear predictive coding (LPC) by Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966, adaptive DPCM (ADPCM) by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973, discrete cosine transform (DCT) coding by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974, and modified discrete cosine transform (MDCT) coding by J. P. Princen, A. W. Johnson and A. B. Bradley at the University of Surrey in 1987. LPC is the basis for perceptual coding and is widely used in speech coding, while MDCT coding is widely used in modern audio coding formats such as MP3 and Advanced Audio Coding (AAC).

Types

Analog

An analog audio signal is a continuous signal represented by an electrical voltage or current that is analogous to the sound waves in the air. Analog signal processing then involves physically altering the continuous signal by changing the voltage, current or charge via electrical circuits. Historically, before the advent of widespread digital technology, analog was the only method by which to manipulate a signal. Since that time, as computers and software have become more capable and affordable, digital signal processing has become the method of choice. However, in music applications, analog technology is often still desirable because it produces nonlinear responses that are difficult to replicate with digital filters.

Digital

A digital representation expresses the audio waveform as a sequence of symbols, usually binary numbers. This permits signal processing using digital circuits such as digital signal processors, microprocessors and general-purpose computers. Most modern audio systems use a digital approach as the techniques of digital signal processing are much more powerful and efficient than analog domain signal processing.
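To make the idea of a waveform as a sequence of binary numbers concrete, here is a minimal sketch of sampling a sine wave and quantizing it to signed integer codes, in the spirit of PCM. The function names, the 8 kHz sample rate and the 8-bit depth are assumptions made for illustration, not values taken from the text.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine wave at regular time intervals."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize(samples, bits):
    """Map floats in [-1, 1] to signed integer codes of the given bit depth."""
    max_code = 2 ** (bits - 1) - 1
    # Clamp so that out-of-range input cannot overflow the code range.
    return [max(-max_code - 1, min(max_code, round(s * max_code)))
            for s in samples]

def to_binary(code, bits):
    """Render a signed code as a two's-complement bit string."""
    return format(code & (2 ** bits - 1), f"0{bits}b")

# Eight samples of a 1 kHz tone at an assumed 8 kHz sample rate, 8-bit codes.
codes = quantize(sample_sine(1000, 8000, 8), bits=8)
words = [to_binary(c, 8) for c in codes]
```

Once the signal is in this form, any processing step is ordinary arithmetic on the integer codes, which is what allows general-purpose processors to be used.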

Applications

Processing methods and application areas include storage, data compression, music information retrieval, speech processing, localization, acoustic detection, transmission, noise cancellation, acoustic fingerprinting, sound recognition, synthesis, and enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.).

Audio broadcasting

Audio signal processing is used when broadcasting audio signals in order to enhance their fidelity or optimize for bandwidth or latency. In this domain, the most important audio processing takes place just before the transmitter. The audio processor here must prevent or minimize overmodulation, compensate for non-linear transmitters (a potential issue with medium wave and shortwave broadcasting), and adjust overall loudness to the desired level.
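Two of the tasks named above, adjusting level and keeping peaks below a ceiling, can be sketched very roughly as follows. This is a toy illustration under simplifying assumptions: the function names and default values are hypothetical, and real broadcast processors use far more elaborate techniques than a plain gain change and a hard clip.

```python
def normalize_peak(samples, target_peak=0.9):
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def hard_limit(samples, ceiling=1.0):
    """Clamp every sample to [-ceiling, ceiling] so peaks cannot exceed it."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]
```

A hard clip like this guarantees the ceiling is respected but adds distortion, which is one reason practical limiters reduce gain smoothly instead.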

Active noise control

Active noise control is a technique designed to reduce unwanted sound. By creating a signal that is identical to the unwanted noise but with the opposite polarity, the two signals cancel out due to destructive interference.
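The cancellation principle can be shown with an idealized sketch in which the noise is known exactly and the anti-noise arrives perfectly aligned. Both are assumptions made for illustration; a practical system has to estimate the noise and compensate for delay. The names and the 100 Hz test tone are hypothetical.

```python
import math

def invert_polarity(samples):
    """Produce the same waveform with the opposite polarity."""
    return [-s for s in samples]

def mix(a, b):
    """Sum two signals sample by sample, as happens when they share a medium."""
    return [x + y for x, y in zip(a, b)]

# A 100 Hz tone at an assumed 8 kHz sample rate stands in for the unwanted noise.
noise = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(80)]
anti_noise = invert_polarity(noise)

# Perfect alignment: every sample cancels.
residual = mix(noise, anti_noise)

# Anti-noise arriving one sample late: cancellation is only partial.
residual_late = mix(noise[1:], anti_noise[:-1])
```

The second case hints at why timing matters: even a one-sample misalignment leaves some of the noise uncancelled.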

Audio synthesis

Audio synthesis is the electronic generation of audio signals. A musical instrument that accomplishes this is called a synthesizer. Synthesizers can either imitate sounds or generate new ones. Audio synthesis is also used to generate human speech using speech synthesis.
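One simple way to generate an audio signal from scratch is to sum sine waves, often called additive synthesis. The sketch below is only one of many synthesis approaches; its function names, the 8 kHz default sample rate and the example partials are assumptions for illustration.

```python
import math

def sine_tone(freq_hz, duration_s, sample_rate_hz=8000, amplitude=1.0):
    """Generate a sine tone as a list of float samples."""
    n_samples = int(duration_s * sample_rate_hz)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def additive(partials, duration_s, sample_rate_hz=8000):
    """Sum sine partials given as (frequency_hz, amplitude) pairs."""
    n_samples = int(duration_s * sample_rate_hz)
    out = [0.0] * n_samples
    for freq_hz, amplitude in partials:
        tone = sine_tone(freq_hz, duration_s, sample_rate_hz, amplitude)
        out = [o + t for o, t in zip(out, tone)]
    return out

# A 220 Hz fundamental with a quieter partial one octave above it.
note = additive([(220, 0.5), (440, 0.25)], duration_s=0.01)
```

Changing the relative amplitudes of the partials changes the timbre of the resulting tone while leaving its pitch the same.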

Audio effects

Audio effects alter the sound of a musical instrument or other audio source. Common effects include distortion, often used with electric guitar in electric blues and rock music; dynamic effects such as volume pedals and compressors, which affect loudness; filters such as wah-wah pedals and graphic equalizers, which modify frequency ranges; modulation effects, such as chorus, flangers and phasers; pitch effects such as pitch shifters; and time effects, such as reverb and delay, which create echoing sounds and emulate the sound of different spaces. Musicians, audio engineers and record producers use effects units during live performances or in the studio, typically with electric guitar, bass guitar, electronic keyboard or electric piano. While effects are most frequently used with electric or electronic instruments, they can be used with any audio source, such as acoustic instruments, drums, and vocals.
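As an example of a time effect, a basic feedback delay can be sketched with a circular buffer. This is a minimal illustration rather than a description of any particular effects unit; the function name and the `feedback` and `mix` parameters are hypothetical.

```python
def delay_effect(samples, delay_samples, feedback=0.5, mix=0.5):
    """Feedback delay: each echo is a quieter repeat of the one before it."""
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    buffer = [0.0] * delay_samples  # circular buffer holding past samples
    index = 0
    out = []
    for x in samples:
        delayed = buffer[index]
        # Blend the dry input with the delayed (wet) signal.
        out.append((1.0 - mix) * x + mix * delayed)
        # Feed part of the delayed signal back in to create repeating echoes.
        buffer[index] = x + feedback * delayed
        index = (index + 1) % delay_samples
    return out

# A single click followed by silence makes the echoes easy to see.
impulse = [1.0] + [0.0] * 9
echoes = delay_effect(impulse, delay_samples=3)
```

With a three-sample delay, the click reappears every three samples at half the level of the previous repeat. Longer delays and lower feedback give distinct echoes, while many short, dense repeats begin to suggest the sound of a space.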

Computer audition

Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, describes these systems as "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents." (See "Paris Smaragdis taught computers how to play more life-like music".) Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge and general sound semantics for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation.