An acoustic fingerprint is a condensed digital summary, a digital fingerprint, deterministically generated from an

audio signal An audio signal is a representation of sound, typically using either a changing level of electrical voltage for analog signals or a series of binary numbers for Digital signal (signal processing), digital signals. Audio signals have frequencies i ...

, that can be used to identify an audio sample or quickly locate similar items in a music database. Practical uses of acoustic fingerprinting include identifying

song A song is a musical composition performed by the human voice. The voice often carries the melody (a series of distinct and fixed pitches) using patterns of sound and silence. Songs have a structure, such as the common ABA form, and are usu ...

s, melodies, tunes, or

advertisements Advertising is the practice and techniques employed to bring attention to a product or service. Advertising aims to present a product or service in terms of utility, advantages, and qualities of interest to consumers. It is typically us ...

;

sound effect A sound effect (or audio effect) is an artificially created or enhanced sound, or sound process used to emphasize artistic or other content of films, television shows, live performance, animation, video games, music, or other media. In m ...

library management; and video file identification. Media identification using acoustic fingerprints can be used to monitor the use of specific musical works and performances on radio broadcast, records, CDs,

streaming media Streaming media refers to multimedia delivered through a Computer network, network for playback using a Media player (disambiguation), media player. Media is transferred in a ''stream'' of Network packet, packets from a Server (computing), ...

, and

peer-to-peer Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network, forming a peer-to-peer network of Node ...

networks. This identification has been used in copyright compliance, licensing, and other monetization schemes.

Attributes

A robust acoustic fingerprint algorithm must take into account the perceptual characteristics of the audio. If two files sound alike to the human ear, their acoustic fingerprints should match, even if their binary representations are quite different. Acoustic fingerprints are not

hash function A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a ...

s, which are sensitive to any small changes in the data. Acoustic fingerprints are more analogous to human fingerprints where small variations that are insignificant to the features the fingerprint uses are tolerated. One can imagine the case of a smeared human fingerprint impression that can accurately be matched to another fingerprint sample in a reference database; acoustic fingerprints work similarly. Perceptual characteristics often exploited by audio fingerprints include average

zero crossing A zero-crossing is a point where the sign of a mathematical function changes (e.g. from positive to negative), represented by an intercept of the axis (zero value) in the graph of the function. It is a commonly used term in electronics, mathema ...

rate, estimated

tempo In musical terminology, tempo (Italian for 'time'; plural 'tempos', or from the Italian plural), measured in beats per minute, is the speed or pace of a given musical composition, composition, and is often also an indication of the composition ...

, average

spectrum A spectrum (: spectra or spectrums) is a set of related ideas, objects, or properties whose features overlap such that they blend to form a continuum. The word ''spectrum'' was first used scientifically in optics to describe the rainbow of co ...

, spectral flatness, prominent tones across a set of

frequency band Spectral bands are regions of a given spectrum, having a specific range of wavelengths or frequencies. Most often, it refers to electromagnetic bands, regions of the electromagnetic spectrum. More generally, spectral bands may also be means in ...

s, and bandwidth. Most audio compression techniques will make radical changes to the binary encoding of an audio file, without radically affecting the way it is perceived by the human ear. A robust acoustic fingerprint will allow a recording to be identified after it has gone through such compression, even if the audio quality has been reduced significantly. For use in radio broadcast monitoring, acoustic fingerprints should also be insensitive to analog transmission artifacts.

Spectrogram

Generating a signature from the audio is essential for searching by sound. One common technique is creating a time-frequency graph called a

spectrogram A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data are represen ...

. Any piece of audio can be translated into a spectrogram. Each piece of audio is split into segments over time. In some cases, adjacent segments share a common time boundary, in other cases adjacent segments might overlap. The result is a graph that plots three dimensions of audio: frequency vs amplitude (intensity) vs time.

Shazam

Shazam's algorithm picks out points where there are peaks in the spectrogram that represent higher energy content. Focusing on peaks in the audio greatly reduces the impact that background noise has on audio identification. Shazam builds their fingerprint catalog out as a

hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...

, where the key is the frequency. They do not just mark a single point in the spectrogram, rather they mark a pair of points: the ''peak intensity'' plus a second ''anchor point''. So their database key is not just a single frequency, it is a hash of the frequencies of both points. This leads to fewer

hash collision In computer science, a hash collision or hash clash is when two distinct pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of ...

s improving the performance of the hash table.

Chromaprint, AccoustID, and MusicBrainz

When commercial acoustic fingerprinting companies were creating uncertainty over proprietary algorithms in the late 2000s, one of

open data Open data are data that are openly accessible, exploitable, editable and shareable by anyone for any purpose. Open data are generally licensed under an open license. The goals of the open data movement are similar to those of other "open(-so ...

service

MusicBrainz MusicBrainz is a MetaBrainz project that aims to create a collaborative music database that is similar to the freedb project. MusicBrainz was founded in response to the restrictions placed on the CDDB, Compact Disc Database (CDDB), a database for ...

' contributors, Lukáš Lalinský developed an open source algorithm Chromaprint and the AcoustID service which uses it. MusicBrainz now uses this service.

References

External links

A Review of Algorithms for Audio Fingerprinting (P. Cano et al. In International Workshop on Multimedia Signal Processing, US Virgin Islands, December 2002)

Content-Based Retrieval of Music and Audio by Jonathan Foote, ISS, National University of Singapore.
{{Computer audition Fingerprinting algorithms ca:Empremta digital multimèdia