An acoustic fingerprint is a condensed digital summary, a digital fingerprint, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in a music database.
Practical uses of acoustic fingerprinting include identifying songs, melodies, tunes, or advertisements; sound effect library management; and video file identification. Media identification using acoustic fingerprints can be used to monitor the use of specific musical works and performances on radio broadcasts, records, CDs, streaming media, and peer-to-peer networks. This identification has been used in copyright compliance, licensing, and other monetization schemes.
Attributes
A robust acoustic fingerprint algorithm must take into account the perceptual characteristics of the audio. If two files sound alike to the human ear, their acoustic fingerprints should match, even if their binary representations are quite different. Acoustic fingerprints are therefore not hash functions in the conventional sense, whose output changes with any small change in the data. They are more analogous to human fingerprints, where small variations that are insignificant to the features the fingerprint uses are tolerated. Just as a smeared human fingerprint impression can still be matched accurately to a sample in a reference database, an acoustic fingerprint can be matched despite minor distortion of the audio.
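The contrast with a conventional hash can be illustrated with a minimal sketch. It assumes a synthetic sine tone and uses a zero-crossing count as a stand-in for a perceptual feature; the helper names are hypothetical and the feature is far simpler than anything a real fingerprinting system would rely on.

```python
import hashlib
import math
import struct

def sine_samples(freq_hz, sample_rate=8000, n=800, gain=1.0):
    """Return n samples of a sine tone as 16-bit integers (hypothetical helper)."""
    return [int(round(gain * 10000 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
            for i in range(n)]

def sha256_of(samples):
    """Conventional hash of the raw 16-bit little-endian sample bytes."""
    raw = struct.pack("<%dh" % len(samples), *samples)
    return hashlib.sha256(raw).hexdigest()

def zero_crossing_count(samples):
    """Toy perceptual feature: number of sign changes between adjacent samples."""
    signs = [s >= 0 for s in samples]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

original = sine_samples(440.0)
quieter = sine_samples(440.0, gain=0.9)  # same tone at slightly lower volume

# The byte-level hashes differ, while the toy feature is unchanged.
hashes_equal = sha256_of(original) == sha256_of(quieter)
features_equal = zero_crossing_count(original) == zero_crossing_count(quieter)
print(hashes_equal, features_equal)
```

A small change in volume alters every sample value, so the byte-level hash changes completely, whereas a feature tied to how the signal behaves over time is unaffected.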
Perceptual characteristics often exploited by audio fingerprints include average zero crossing rate, estimated tempo, average spectrum, spectral flatness, prominent tones across a set of frequency bands, and bandwidth.
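Two of these characteristics can be computed in a few lines. The sketch below is illustrative only: it assumes short synthetic signals, uses a naive discrete Fourier transform from the standard library instead of an optimized FFT, and the function names are hypothetical.

```python
import cmath
import math
import random

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)

def power_spectrum(samples):
    """Naive O(n^2) DFT power for bins 1 .. n/2 - 1 (the DC bin is skipped)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_flatness(samples, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal signal; values closer to 1 indicate noise.
    eps avoids log(0) for empty bins (an assumption of this sketch).
    """
    power = [p + eps for p in power_spectrum(samples)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))

n = 256
rng = random.Random(0)  # fixed seed so the example is repeatable
tone = [math.sin(2 * math.pi * 16 * t / n) for t in range(n)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

print(zero_crossing_rate(tone), zero_crossing_rate(noise))
print(spectral_flatness(tone), spectral_flatness(noise))
```

The pure tone has a low zero crossing rate and a flatness close to zero, while the noise signal scores markedly higher on both, which is why such features help separate kinds of audio content.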
Most audio compression techniques will make radical changes to the binary encoding of an audio file, without radically affecting the way it is perceived by the human ear. A robust acoustic fingerprint will allow a recording to be identified after it has gone through such compression, even if the audio quality has been reduced significantly. For use in radio broadcast monitoring, acoustic fingerprints should also be insensitive to analog transmission artifacts.
Spectrogram
Generating a signature from the audio is essential for searching by sound. One common technique is creating a time-frequency graph called a spectrogram.
Any piece of audio can be translated into a spectrogram. Each piece of audio is split into segments over time. In some cases adjacent segments share a common time boundary; in other cases adjacent segments overlap. The result is a graph that plots three dimensions of audio: frequency, amplitude (intensity), and time.
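The segmentation described above can be sketched as follows. This is a minimal illustration, assuming a Hann window, a 64-sample segment, and a naive DFT from the standard library; the `spectrogram` function and its `frame_size` and `hop` parameters are hypothetical names, and a practical system would use an FFT library.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of segments, each a list of magnitudes per frequency bin.

    hop == frame_size gives adjacent segments that share only a boundary;
    hop < frame_size gives overlapping segments.
    """
    # A Hann window reduces leakage caused by cutting the signal into segments.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        segment = [samples[start + i] * window[i] for i in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / frame_size)
                      for i, x in enumerate(segment))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

sample_rate = 8000
# 0.1 s of a 1000 Hz tone followed by 0.1 s of a 2000 Hz tone.
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
signal += [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(800)]

spec = spectrogram(signal)
first, last = spec[0], spec[-1]
loudest_bin_first = max(range(len(first)), key=first.__getitem__)
loudest_bin_last = max(range(len(last)), key=last.__getitem__)

# Each bin spans sample_rate / frame_size = 125 Hz, so 1000 Hz falls in
# bin 8 and 2000 Hz in bin 16.
print(loudest_bin_first, loudest_bin_last)
```

Indexing `spec[t][k]` gives the intensity at segment `t` and frequency bin `k`, which are the three dimensions a spectrogram plots.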
Shazam
Shazam's algorithm picks out points where there are peaks in the spectrogram that represent higher energy content. Focusing on peaks in the audio greatly reduces the impact that background noise has on audio identification. Shazam builds its fingerprint catalog as a hash table, where the key is the frequency. The algorithm does not mark just a single point in the spectrogram; rather, it marks a pair of points: the "peak intensity" plus a second "anchor point". The database key is therefore not a single frequency but a hash of the frequencies of both points. This leads to fewer hash collisions, improving the performance of the hash table.
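The idea of keying a hash table on pairs of peak frequencies can be sketched as below. This is not Shazam's implementation: the peak picker (one loudest bin per segment), the `fan_out` pairing rule, the vote-counting matcher, and the toy spectrograms are all assumptions made for illustration.

```python
from collections import defaultdict

def find_peaks(spec, threshold):
    """Return (time_index, freq_bin) for the loudest bin of each segment above threshold."""
    peaks = []
    for t, frame in enumerate(spec):
        k = max(range(len(frame)), key=frame.__getitem__)
        if frame[k] >= threshold:
            peaks.append((t, k))
    return peaks

def pair_keys(peaks, fan_out=3):
    """Pair each peak with the next few peaks in time; yield ((f1, f2), t1)."""
    for i, (t1, f1) in enumerate(peaks):
        for _t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            yield (f1, f2), t1

def add_track(index, track_id, spec, threshold=1.0):
    """Store every frequency-pair key of a track in the hash table."""
    for key, t in pair_keys(find_peaks(spec, threshold)):
        index[key].append((track_id, t))

def best_match(index, spec, threshold=1.0):
    """Return the track with the most matching pair keys, or None."""
    votes = defaultdict(int)
    for key, _t in pair_keys(find_peaks(spec, threshold)):
        for track_id, _time in index.get(key, ()):
            votes[track_id] += 1
    return max(votes, key=votes.get) if votes else None

def toy_spec(bins, n_bins=8, peak=5.0, floor=0.1):
    """Hypothetical spectrogram with one loud bin per segment."""
    return [[peak if k == b else floor for k in range(n_bins)] for b in bins]

index = defaultdict(list)
add_track(index, "A", toy_spec([1, 3, 5, 2, 6, 4]))
add_track(index, "B", toy_spec([7, 0, 7, 0, 2, 2]))

# A quieter, noisier excerpt from the middle of track A.
query = toy_spec([3, 5, 2, 6], peak=4.0, floor=0.5)
print(best_match(index, query))  # A
```

Because each key combines two frequencies, far fewer unrelated entries share a key than would with a single frequency, which keeps the per-key lists short during lookup.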
Chromaprint, AcoustID, and MusicBrainz
In the late 2000s, when commercial acoustic fingerprinting companies were creating uncertainty over proprietary algorithms, Lukáš Lalinský, a contributor to the open data service MusicBrainz, developed the open source Chromaprint algorithm and the AcoustID service, which uses it. MusicBrainz now uses this service.
See also
* Automatic content recognition
* Digital video fingerprinting
* Feature extraction
* Parsons code
* Perceptual hashing
* Search by sound
* Sound recognition
External links
* A Review of Algorithms for Audio Fingerprinting (P. Cano et al., in International Workshop on Multimedia Signal Processing, US Virgin Islands, December 2002)
* Content-Based Retrieval of Music and Audio by Jonathan Foote, ISS, National University of Singapore