Acoustic Fingerprinting

An acoustic fingerprint is a condensed digital summary, a digital fingerprint, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in a music database. Practical uses of acoustic fingerprinting include identifying songs, melodies, tunes, or advertisements; sound effect library management; and video file identification. Media identification using acoustic fingerprints can be used to monitor the use of specific musical works and performances on radio broadcast, records, CDs, streaming media, and peer-to-peer networks. This identification has been used in copyright compliance, licensing, and other monetization schemes.


Attributes

A robust acoustic fingerprint algorithm must take into account the perceptual characteristics of the audio. If two files sound alike to the human ear, their acoustic fingerprints should match, even if their binary representations are quite different. Acoustic fingerprints are not hash functions, which are sensitive to any small changes in the data. Acoustic fingerprints are more analogous to human fingerprints, where small variations that are insignificant to the features the fingerprint uses are tolerated. One can imagine the case of a smeared human fingerprint impression that can accurately be matched to another fingerprint sample in a reference database; acoustic fingerprints work similarly. Perceptual characteristics often exploited by audio fingerprints include average zero crossing rate, estimated tempo, average spectrum, spectral flatness, prominent tones across a set of frequency bands, and bandwidth. Most audio compression techniques will make radical changes to the binary encoding of an audio file, without radically affecting the way it is perceived by the human ear. A robust acoustic fingerprint will allow a recording to be identified after it has gone through such compression, even if the audio quality has been reduced significantly. For use in radio broadcast monitoring, acoustic fingerprints should also be insensitive to analog transmission artifacts.
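
The contrast between a byte-level hash and a perceptual feature can be illustrated with a small sketch. It is not any product's fingerprint; it only compares one of the features listed above (zero crossing rate) with a SHA-256 digest of the same samples. The helper names, the 8000 Hz rate, the 440 Hz test tone, and the 0.9 gain are all assumptions chosen for illustration.

```python
import hashlib
import math
import struct

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def byte_hash(samples):
    """SHA-256 over a 16-bit PCM encoding of the samples."""
    pcm = struct.pack(
        f"<{len(samples)}h", *(int(round(s * 32767)) for s in samples)
    )
    return hashlib.sha256(pcm).hexdigest()

RATE = 8000  # assumed sample rate in Hz
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
quieter = [0.9 * s for s in tone]  # same sound, slightly lower volume

# The perceptual feature survives the volume change; the byte hash does not.
same_zcr = zero_crossing_rate(tone) == zero_crossing_rate(quieter)
same_hash = byte_hash(tone) == byte_hash(quieter)
```

Here `same_zcr` is true and `same_hash` is false: scaling the volume changes every encoded sample but leaves the sign pattern, and so the zero crossing rate, untouched. A practical fingerprint would combine several such features and tolerate small differences rather than require exact equality.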


Spectrogram

Generating a signature from the audio is essential for searching by sound. One common technique is creating a time-frequency graph called a spectrogram. Any piece of audio can be translated into a spectrogram. Each piece of audio is split into segments over time. In some cases, adjacent segments share a common time boundary; in other cases, adjacent segments might overlap. The result is a graph that plots three dimensions of audio: frequency vs amplitude (intensity) vs time.
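
The segmentation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production implementation: the function name, the frame size of 64, the hop of 32, the Hann window, and the naive DFT are all assumptions made here for readability. Real systems would use an FFT library.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame lists one magnitude per frequency bin.

    hop == frame_size gives adjacent segments that share a boundary;
    hop < frame_size gives overlapping segments.
    """
    # Hann window to reduce leakage at the segment edges.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        segment = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT: fine for a sketch, far too slow for real audio.
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

RATE = 8000  # assumed sample rate in Hz
signal = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(512)]
spec = spectrogram(signal)
# Index of the strongest frequency bin in each time segment.
loudest_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

With these assumed settings each bin spans 125 Hz, so the 1000 Hz test tone shows up as the strongest value in bin 8 of every segment. The nested list `spec[time][bin]` holds the three dimensions mentioned above: time index, frequency bin, and amplitude.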


Shazam

Shazam's algorithm picks out points where there are peaks in the spectrogram that represent higher energy content. Focusing on peaks in the audio greatly reduces the impact that background noise has on audio identification. Shazam builds its fingerprint catalog as a hash table, where the key is the frequency. Rather than marking a single point in the spectrogram, it marks a pair of points: the peak intensity plus a second anchor point. The database key is therefore not a single frequency but a hash of the frequencies of both points. This leads to fewer hash collisions, improving the performance of the hash table.
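
The idea of keying a hash table on pairs of peaks can be sketched as follows. This is an illustrative toy, not Shazam's actual implementation: the function names, the `FAN_OUT` constant, the sample peak lists, and the offset-voting step are assumptions. The key here also includes the time gap between the two peaks, which goes beyond what the text above states and is labeled as an assumption in the code.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # hypothetical: how many later peaks each peak is paired with

def pair_keys(peaks):
    """Yield ((f1, f2, dt), t1) for each peak paired with the next few peaks.

    peaks: list of (time_index, frequency_bin), sorted by time.
    Including dt in the key is an assumption of this sketch.
    """
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + FAN_OUT]:
            yield (f1, f2, t2 - t1), t1

def build_index(tracks):
    """Map each pair key to the (track_id, time) places where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for key, t1 in pair_keys(peaks):
            index[key].append((track_id, t1))
    return index

def best_match(index, query_peaks):
    """Vote for (track, time offset); a true match piles votes on one offset."""
    votes = Counter()
    for key, t_query in pair_keys(query_peaks):
        for track_id, t_track in index.get(key, ()):
            votes[(track_id, t_track - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), _ = votes.most_common(1)[0]
    return track_id, offset

# Made-up peak lists standing in for peaks picked from two spectrograms.
tracks = {
    "track_a": [(0, 10), (2, 14), (5, 9), (7, 20), (9, 12), (12, 17)],
    "track_b": [(0, 11), (3, 15), (4, 8), (8, 21), (10, 13), (13, 16)],
}
index = build_index(tracks)
# A query clip: track_a's peaks from time 5 onward, re-timed to start at 0.
query = [(0, 9), (2, 20), (4, 12), (7, 17)]
result = best_match(index, query)  # ("track_a", 5)
```

A key built from two frequencies is far less likely to recur by chance than a single frequency, which is why pairing reduces collisions. In this toy data every pair from the query lands on `track_a` at the same offset of 5 time steps, while `track_b` receives no votes.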


Chromaprint, AcoustID, and MusicBrainz

In the late 2000s, when commercial acoustic fingerprinting companies were creating uncertainty over proprietary algorithms, Lukáš Lalinský, a contributor to the open data service MusicBrainz, developed the open source algorithm Chromaprint and the AcoustID service, which uses it. MusicBrainz now uses this service.


See also

* Automatic content recognition
* Digital video fingerprinting
* Feature extraction
* Parsons code
* Perceptual hashing
* Search by sound
* Sound recognition


External links


A Review of Algorithms for Audio Fingerprinting (P. Cano et al. In International Workshop on Multimedia Signal Processing, US Virgin Islands, December 2002)

Content-Based Retrieval of Music and Audio by Jonathan Foote, ISS, National University of Singapore.