An acoustic fingerprint is a condensed digital summary, a digital fingerprint, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in a music database. Practical uses of acoustic fingerprinting include identifying songs, melodies, tunes, or advertisements; sound effect library management; and video file identification. Media identification using acoustic fingerprints can be used to monitor the use of specific musical works and performances on radio broadcast, records, CDs, streaming media, and peer-to-peer networks. This identification has been used in copyright compliance, licensing, and other monetization schemes.

Attributes

A robust acoustic fingerprint algorithm must take into account the perceptual characteristics of the audio. If two files sound alike to the human ear, their acoustic fingerprints should match, even if their binary representations are quite different. Acoustic fingerprints are not hash functions, which are sensitive to any small changes in the data. Acoustic fingerprints are more analogous to human fingerprints, where small variations that are insignificant to the features the fingerprint uses are tolerated. One can imagine the case of a smeared human fingerprint impression that can accurately be matched to another fingerprint sample in a reference database; acoustic fingerprints work similarly.

Perceptual characteristics often exploited by audio fingerprints include average zero crossing rate, estimated tempo, average spectrum, spectral flatness, prominent tones across a set of frequency bands, and bandwidth.

Most audio compression techniques will make radical changes to the binary encoding of an audio file, without radically affecting the way it is perceived by the human ear. A robust acoustic fingerprint will allow a recording to be identified after it has gone through such compression, even if the audio quality has been reduced significantly. For use in monitoring, acoustic fingerprints should also be insensitive to analog transmission artifacts.
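The contrast between a hash function and a perceptual feature can be sketched in a few lines. The example below is only an illustration under stated assumptions: the 8000 Hz sample rate, the 440 Hz test tone, the 10% volume reduction standing in for a re-encoded copy, and the helper names `zero_crossing_rate` and `sha256_of` are all choices made here, not part of any particular fingerprinting system.

```python
import hashlib
import math
import struct


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def sha256_of(samples):
    """Cryptographic hash of the samples packed as 16-bit little-endian PCM."""
    pcm = struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in samples))
    return hashlib.sha256(pcm).hexdigest()


SAMPLE_RATE = 8000  # Hz, an arbitrary choice for this sketch

# One second of a 440 Hz tone, and a slightly quieter copy that a
# listener would judge to be the same sound (assumed stand-in for
# a re-encoded file).
original = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
quieter = [0.9 * s for s in original]

hashes_match = sha256_of(original) == sha256_of(quieter)
zcr_original = zero_crossing_rate(original)
zcr_quieter = zero_crossing_rate(quieter)

print("SHA-256 equal:", hashes_match)
print("zero crossing rates:", zcr_original, zcr_quieter)
```

The byte-level hash changes as soon as any sample changes, while the zero crossing rate depends only on the signs of the samples and so is unaffected by the volume change. A practical fingerprint would combine several such features rather than rely on one.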

Spectrogram

Generating a signature from the audio is essential for searching by sound. One common technique is creating a time-frequency graph called a spectrogram. Any piece of audio can be translated into a spectrogram. Each piece of audio is split into segments over time. In some cases, adjacent segments share a common time boundary; in other cases, adjacent segments might overlap. The result is a graph that plots three dimensions of audio: frequency vs amplitude (intensity) vs time.
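The segmentation described above can be sketched as follows. This is a minimal illustration, not a production implementation: it uses a naive discrete Fourier transform for clarity, and the frame size, hop size, Hann window, and sample rate are assumptions chosen to keep the numbers easy to follow.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the first half of the discrete Fourier transform."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def spectrogram(samples, frame_size=64, hop=32):
    """Split samples into time segments and transform each one.

    hop == frame_size gives adjacent segments that share a boundary;
    hop < frame_size gives overlapping segments.
    Returns one row per segment; row[k] is the magnitude in bin k.
    """
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window to reduce leakage between frequency bins.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
            for i, x in enumerate(frame)
        ]
        rows.append(dft_magnitudes(windowed))
    return rows


SAMPLE_RATE = 1024  # Hz, chosen so each bin of a 64-sample frame spans 16 Hz
tone = [math.sin(2 * math.pi * 128 * n / SAMPLE_RATE) for n in range(512)]

spec = spectrogram(tone)
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "segments x", len(spec[0]), "bins; loudest bin:", loudest_bin)
```

Each row is one time segment, each column one frequency bin, and each value an intensity, which gives the three dimensions mentioned above. With these assumed parameters the 128 Hz tone falls in bin 8, since each bin covers 16 Hz.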

Shazam

Shazam's algorithm picks out points where there are peaks in the spectrogram that represent higher energy content. Focusing on peaks in the audio greatly reduces the impact that background noise has on audio identification. Shazam builds its fingerprint catalog as a hash table, where the key is the frequency. It does not mark just a single point in the spectrogram; rather, it marks a pair of points: the peak intensity plus a second anchor point. The database key is therefore not a single frequency but a hash of the frequencies of both points. This leads to fewer hash collisions, improving the performance of the hash table.
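The pairing idea can be illustrated with a toy index. The sketch below is not Shazam's actual implementation: the function names, the peak test (comparing each value only with its neighbours in frequency), the `fan_out` parameter, the vote-counting lookup, and the hand-made spectrogram grids are all assumptions made for this example. Only the use of a hash table keyed on a pair of peak frequencies comes from the description above.

```python
from collections import defaultdict


def find_peaks(spec, threshold):
    """Return (time, freq_bin) points above the threshold and both frequency neighbours."""
    peaks = []
    for t, row in enumerate(spec):
        for f in range(1, len(row) - 1):
            if row[f] > threshold and row[f] > row[f - 1] and row[f] > row[f + 1]:
                peaks.append((t, f))
    return peaks


def pair_keys(peaks, fan_out=3):
    """Pair each peak with the next few peaks; yield (key, time of first point)."""
    for i, (t1, f1) in enumerate(peaks):
        for _t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            yield (f1, f2), t1  # key combines the frequencies of both points


def add_track(index, track_id, spec, threshold=1.0):
    """Store every pair key of a track in the hash table."""
    for key, t in pair_keys(find_peaks(spec, threshold)):
        index[key].append((track_id, t))


def best_match(index, spec, threshold=1.0):
    """Vote for the track that shares the most pair keys with the query."""
    votes = defaultdict(int)
    for key, _ in pair_keys(find_peaks(spec, threshold)):
        for track_id, _ in index.get(key, []):
            votes[track_id] += 1
    return max(votes, key=votes.get) if votes else None


# Hand-made toy spectrograms: rows are time segments, columns are bins.
track_a = [[0, 5, 0, 0, 0, 0], [0, 0, 0, 6, 0, 0], [0, 0, 0, 0, 7, 0], [0, 0, 4, 0, 0, 0]]
track_b = [[0, 0, 5, 0, 0, 0], [0, 6, 0, 0, 0, 0], [0, 0, 0, 0, 5, 0], [0, 0, 0, 7, 0, 0]]
# track_a again, with low-level background noise and weaker peaks.
noisy_a = [
    [0.2, 4.6, 0.3, 0.1, 0.0, 0.2],
    [0.1, 0.3, 0.2, 5.5, 0.1, 0.0],
    [0.0, 0.2, 0.1, 0.3, 6.4, 0.2],
    [0.3, 0.1, 3.7, 0.2, 0.0, 0.1],
]

index = defaultdict(list)
add_track(index, "track_a", track_a)
add_track(index, "track_b", track_b)

match = best_match(index, noisy_a)
print("best match:", match)
```

Because only the peaks are kept, the background noise in `noisy_a` does not change the extracted points. The two toy tracks share every individual peak frequency but only some frequency pairs, which shows why keying on pairs produces fewer collisions than keying on a single frequency.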

Chromaprint, AcoustID, and MusicBrainz

In the late 2000s, when commercial acoustic fingerprinting companies were creating uncertainty over proprietary algorithms, Lukáš Lalinský, a contributor to the open data service MusicBrainz, developed the open source algorithm Chromaprint and the AcoustID service, which uses it. MusicBrainz now uses this service.

External links

A Review of Algorithms for Audio Fingerprinting (P. Cano et al. In International Workshop on Multimedia Signal Processing, US Virgin Islands, December 2002)

Content-Based Retrieval of Music and Audio by Jonathan Foote, ISS, National University of Singapore.