Time Stretching (audio)

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording. These processes are often used to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. Time stretching is often used to adjust radio commercials and the audio of television advertisements to fit exactly into the 30 or 60 seconds available. It can be used to conform longer material to a designated time slot, such as a 1-hour broadcast.


Resampling

The simplest way to change the duration or pitch of an audio recording is to change the playback speed. For a digital audio recording, this can be accomplished through sample rate conversion. With this method, the frequencies in the recording are always scaled by the same ratio as the speed, transposing the perceived pitch up or down in the process. Slowing down the recording to increase its duration also lowers the pitch, while speeding it up for a shorter duration raises the pitch, creating the so-called Chipmunk effect. When resampling audio to a notably lower pitch, a source recorded at a higher sample rate may be preferable, because slowing down the playback rate reproduces an audio signal of lower resolution and therefore reduces the perceived clarity of the sound. Conversely, when resampling audio to a notably higher pitch, it may be preferable to incorporate an interpolation filter, because frequencies that surpass the Nyquist frequency (determined by the sampling rate of the audio reproduction software or device) create usually undesired sound distortions, a phenomenon known as aliasing.
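The coupling between speed and pitch can be illustrated with a minimal sketch. It assumes NumPy and uses plain linear interpolation as a stand-in for proper sample rate conversion; the names `change_speed` and `dominant_frequency` are illustrative, and no anti-aliasing filter is applied, so it is only suitable for a test tone well below the Nyquist frequency.

```python
import numpy as np

def change_speed(signal, speed):
    """Resample by linear interpolation so playback is `speed` times faster."""
    n_out = int(round(len(signal) / speed))
    # Fractional input positions that each output sample reads from.
    read_pos = np.arange(n_out) * speed
    return np.interp(read_pos, np.arange(len(signal)), signal)

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the strongest bin in the windowed spectrum."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    return np.argmax(spectrum) * sample_rate / len(signal)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate      # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)          # 440 Hz test tone

fast = change_speed(tone, 2.0)                # half the duration
slow = change_speed(tone, 0.5)                # twice the duration

fast_hz = dominant_frequency(fast, sample_rate)   # pitch moves up an octave
slow_hz = dominant_frequency(slow, sample_rate)   # pitch moves down an octave
```

Doubling the speed halves the number of samples and moves the tone up an octave; halving the speed does the reverse.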


Frequency domain


Phase vocoder

One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff. Basic steps:
1. compute the instantaneous frequency/amplitude relationship of the signal using the STFT, which is the discrete Fourier transform of a short, overlapping and smoothly windowed block of samples;
2. apply some processing to the Fourier transform magnitudes and phases (like resampling the FFT blocks); and
3. perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks, also called overlap and add (OLA).
The phase vocoder handles sinusoid components well, but early implementations introduced considerable smearing on transient ("beat") waveforms at all non-integer compression/expansion rates, which renders the results phasey and diffuse. Recent improvements allow better quality results at all compression/expansion ratios, but a residual smearing effect still remains. The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.
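The three steps can be sketched roughly as follows. This is a textbook-style phase vocoder assuming NumPy, not any particular published implementation; the function name, the Hann window, the FFT size and the hop sizes are all illustrative choices, and no transient handling is attempted, so the smearing described above would still occur on percussive input.

```python
import numpy as np

def phase_vocoder_stretch(x, alpha, n_fft=1024, hop_a=256):
    """Stretch x by `alpha` (>1 is longer) while keeping its pitch.

    With these defaults, keep alpha below about 3 so synthesis frames overlap.
    """
    hop_s = int(round(hop_a * alpha))               # synthesis hop size
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_a
    # Centre frequency of each bin in radians per sample.
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    out = np.zeros(n_fft + hop_s * (n_frames - 1))
    norm = np.zeros_like(out)
    prev_phase = None
    synth_phase = None
    for m in range(n_frames):
        # Step 1: STFT of a windowed analysis frame.
        frame = x[m * hop_a:m * hop_a + n_fft] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Step 2: estimate instantaneous frequency, advance synthesis phase.
        if m == 0:
            synth_phase = phase.copy()
        else:
            delta = phase - prev_phase - omega * hop_a
            delta = (delta + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
            inst_freq = omega + delta / hop_a
            synth_phase = synth_phase + inst_freq * hop_s
        prev_phase = phase
        # Step 3: inverse transform and overlap-add at the synthesis hop.
        y = np.fft.irfft(mag * np.exp(1j * synth_phase), n=n_fft) * window
        start = m * hop_s
        out[start:start + n_fft] += y
        norm[start:start + n_fft] += window ** 2
    # Compensate for the summed squared windows (floored to avoid edge blow-up).
    return out / np.maximum(norm, 1e-3)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
stretched = phase_vocoder_stretch(tone, alpha=2.0)

spectrum = np.abs(np.fft.rfft(stretched * np.hanning(len(stretched))))
peak_hz = np.argmax(spectrum) * sample_rate / len(stretched)
```

For a steady test tone the output is roughly twice as long while the spectral peak stays near 440 Hz, in contrast to plain resampling.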


Sinusoidal spectral modeling

Another method for time stretching relies on a spectral model of the signal. In this method, peaks are identified in frames using the STFT of the signal, and sinusoidal "tracks" are created by connecting peaks in adjacent frames. The tracks are then re-synthesized at a new time scale. This method can yield good results on both polyphonic and percussive material, especially when the signal is separated into sub-bands. However, this method is more computationally demanding than other methods.


Time domain


SOLA

Rabiner and Schafer in 1978 put forth an alternate solution that works in the time domain: attempt to find the period (or equivalently the fundamental frequency) of a given section of the wave using some pitch detection algorithm (commonly the peak of the signal's autocorrelation, or sometimes cepstral processing), and crossfade one period into another. This is called time-domain harmonic scaling or the synchronized overlap-add method (SOLA). It performs somewhat faster than the phase vocoder on slower machines but fails when the autocorrelation mis-estimates the period of a signal with complicated harmonics (such as orchestral pieces).

Adobe Audition (formerly Cool Edit Pro) seems to solve this by looking for the period closest to a center period that the user specifies, which should be an integer multiple of the tempo, and between 30 Hz and the lowest bass frequency. This is much more limited in scope than phase vocoder based processing, but can be made much less processor intensive for real-time applications. It provides the most coherent results for single-pitched sounds like voice or musically monophonic instrument recordings.

High-end commercial audio processing packages either combine the two techniques (for example by separating the signal into sinusoid and transient waveforms), or use other techniques based on the wavelet transform, or artificial neural network processing, producing the highest-quality time stretching.
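The core time-domain idea, estimating a period from the autocorrelation peak and crossfading one period into the next, can be sketched as below. This is a simplified illustration assuming NumPy, not the Rabiner and Schafer formulation or any product's algorithm; `estimate_period`, `drop_one_period` and the lag search range are hypothetical names and choices.

```python
import numpy as np

def estimate_period(frame, min_lag, max_lag):
    """Lag (in samples) of the autocorrelation peak within [min_lag, max_lag]."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))

def drop_one_period(x, start, period):
    """Shorten x by one period, crossfading the period at `start` into the next."""
    ramp = np.linspace(0.0, 1.0, period, endpoint=False)
    first = x[start:start + period]
    second = x[start + period:start + 2 * period]
    blended = (1.0 - ramp) * first + ramp * second
    return np.concatenate([x[:start], blended, x[start + 2 * period:]])

sample_rate = 8000
t = np.arange(2048) / sample_rate
# A 200 Hz tone with two harmonics, i.e. a period of 40 samples.
voice_like = (np.sin(2 * np.pi * 200 * t)
              + 0.5 * np.sin(2 * np.pi * 400 * t)
              + 0.3 * np.sin(2 * np.pi * 600 * t))

period = estimate_period(voice_like[:1024], min_lag=20, max_lag=200)
shorter = drop_one_period(voice_like, start=512, period=period)
```

Repeating the removal (or, symmetrically, duplicating periods) at regular intervals changes the duration without changing the pitch. On a perfectly periodic signal the splice is inaudible; a wrong period estimate is what produces the failures described above.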


Frame-based approach

In order to preserve an audio signal's pitch when stretching or compressing its duration, many time-scale modification (TSM) procedures follow a frame-based approach. Given an original discrete-time audio signal, this strategy's first step is to split the signal into short ''analysis frames'' of fixed length. The analysis frames are spaced by a fixed number of samples, called the ''analysis hopsize'' H_a \in \mathbb{N}. To achieve the actual time-scale modification, the analysis frames are then temporally relocated to have a ''synthesis hopsize'' H_s \in \mathbb{N}. This frame relocation results in a modification of the signal's duration by a ''stretching factor'' of \alpha = H_s / H_a. However, simply superimposing the unmodified analysis frames typically results in undesired artifacts such as phase discontinuities or amplitude fluctuations. To prevent these kinds of artifacts, the analysis frames are adapted to form ''synthesis frames'', prior to the reconstruction of the time-scale modified output signal. The strategy of how to derive the synthesis frames from the analysis frames is a key difference among different TSM procedures.
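A minimal sketch of the frame relocation itself, assuming NumPy: it deliberately uses the unmodified (only windowed) analysis frames as synthesis frames, so it shows how the hop sizes set the stretching factor but will exhibit the phase artifacts mentioned above on tonal material. The function name, the Hann window and the choice to fix the synthesis hop and derive the analysis hop from it are illustrative assumptions.

```python
import numpy as np

def ola_stretch(x, alpha, frame_len=1024, hop_s=256):
    """Plain overlap-add TSM: relocate analysis frames from hop_a to hop_s."""
    hop_a = max(1, int(round(hop_s / alpha)))     # analysis hop size H_a
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop_a
    out = np.zeros(frame_len + hop_s * (n_frames - 1))
    norm = np.zeros_like(out)
    for m in range(n_frames):
        # Analysis frame taken every hop_a samples of the input ...
        frame = x[m * hop_a:m * hop_a + frame_len] * window
        # ... and placed every hop_s samples in the output.
        out[m * hop_s:m * hop_s + frame_len] += frame
        norm[m * hop_s:m * hop_s + frame_len] += window
    # Divide by the summed windows to even out amplitude fluctuations.
    return out / np.maximum(norm, 1e-3), hop_s / hop_a

rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
stretched, effective_alpha = ola_stretch(noise, alpha=1.5)
```

Because both hop sizes are integers, the effective factor `hop_s / hop_a` only approximates the requested one (here 256/171, about 1.497).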


Speed hearing and speed talking

For the specific case of speech, time stretching can be performed using PSOLA. Time-compressed speech is the representation of verbal text in compressed time. While one might expect speeding up to reduce comprehension, Herb Friedman says that "Experiments have shown that the brain works most efficiently if the information rate through the ears—via speech—is the 'average' reading rate, which is about 200–300 wpm (words per minute), yet the average rate of speech is in the neighborhood of 100–150 wpm." Listening to time-compressed speech is seen as the equivalent of speed reading.


Pitch scaling

These techniques can also be used to transpose an audio sample while holding speed or duration constant. This may be accomplished by time stretching and then resampling back to the original length. Alternatively, the frequency of the sinusoids in a sinusoidal model may be altered directly, and the signal reconstructed at the appropriate time scale.

Transposing can be called ''frequency scaling'' or ''pitch shifting'', depending on perspective. For example, one could move the pitch of every note up by a perfect fifth, keeping the tempo the same. One can view this transposition as "pitch shifting", "shifting" each note up 7 keys on a piano keyboard, or adding a fixed amount on the Mel scale, or adding a fixed amount in linear pitch space. One can view the same transposition as "frequency scaling", "scaling" (multiplying) the frequency of every note by 3/2.

Musical transposition preserves the ratios of the harmonic frequencies that determine the sound's timbre, unlike the ''frequency shift'' performed by amplitude modulation, which adds a fixed frequency offset to the frequency of every note. (In theory one could perform a literal ''pitch scaling'' in which the musical pitch space location is scaled, so that a higher note would be shifted by a greater interval in linear pitch space than a lower note, but that is highly unusual, and not musical.)

Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable. A process that preserves the formants and character of a voice involves analyzing the signal with a channel vocoder or LPC vocoder plus any of several pitch detection algorithms and then resynthesizing it at a different fundamental frequency. A detailed description of older analog recording techniques for pitch shifting can be found at .
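The difference between scaling and shifting frequencies can be checked with a few lines of arithmetic. The sketch below assumes NumPy and an illustrative note with a 220 Hz fundamental and two harmonics; it also compares the just-intonation fifth (3/2) with the equal-tempered fifth of 7 semitones.

```python
import numpy as np

semitones = 7                                   # a perfect fifth on a piano
ratio_equal_tempered = 2 ** (semitones / 12)    # about 1.4983
ratio_just = 3 / 2                              # the 3/2 ratio from the text

harmonics = np.array([220.0, 440.0, 660.0])     # partials in ratio 1:2:3

# Frequency scaling (transposition): every partial is multiplied.
scaled = harmonics * ratio_just                 # 330, 660, 990 Hz
# Frequency shifting: every partial gets the same offset.
shifted = harmonics + 110.0                     # 330, 550, 770 Hz

scaled_ratios = scaled / scaled[0]              # still 1 : 2 : 3
shifted_ratios = shifted / shifted[0]           # no longer integer multiples
```

Both operations move the fundamental to 330 Hz, but only scaling keeps the partials harmonic, which is why transposition preserves the harmonic structure while a fixed frequency offset does not.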


DJing

Time stretching and pitch scaling are used extensively by DJs in addition to beatmixing when playing and creating sets. In order to seamlessly blend two tracks together, the tempo of a track can be adjusted to match another track such that the beats line up. Pitch scaling is commonly used to retain the pitch of a track. Pitch scaling is also used by DJs for harmonic mixing, to transform tracks into compatible keys so that they sound pleasing when mixed together. Time stretching and pitch scaling are included in modern DJ hardware (CDJs and DJ controllers) and software (such as VirtualDJ, Mixxx, Serato and Rekordbox).


Music production

Time stretching and pitch scaling are used in digital audio workstation software for working with music loops, sound clips which can be repeated and transposed to form a song. The pitch and tempo of multiple loops are aligned to create tracks. Notable software includes Acid Pro with its "Acidized" loops feature and FL Studio.


In consumer software

Pitch-corrected audio timestretch is found in every modern web browser as part of the HTML standard for media playback. Similar controls are ubiquitous in media applications and frameworks such as GStreamer and Unity.


See also

* Beatmatching
* Dynamic tonality: real-time changes of tuning and timbre
* Pitch correction
* Scrubbing (audio)
* Nightcore


External links


* Time Stretching and Pitch Shifting Overview - A comprehensive overview of current time and pitch modification techniques by Stephan Bernsee
* Stephan Bernsee's smbPitchShift C source code - C source code for doing frequency domain pitch manipulation
* pitchshift.js from KievII - A JavaScript pitch shifter based on smbPitchShift code, from the open source KievII library
* The Phase Vocoder: A Tutorial - A good description of the phase vocoder
* New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects
* A new Approach to Transient Processing in the Phase Vocoder
* How to build a pitch shifter - Theory, equations, figures and performances of a real-time guitar pitch shifter running on a DSP chip
* ZTX Time Stretching Library - Free and commercial versions of a popular 3rd party time stretching library for iOS, Linux, Windows and Mac OS X
* Elastique by zplane - commercial cross-platform library, mainly used by DJ and DAW manufacturers
* Voice Synth from Qneo - specialized synthesizer for creative voice sculpting
* TSM toolbox - Free MATLAB implementations of various Time-Scale Modification procedures
*, a well-known algorithm for extreme (>10×) time stretching
* Bungee - open source and commercial libraries for real time audio stretching
* Rubber Band - open source library for time stretching and pitch shifting
* SoundTouch - open-source library for changing the tempo, pitch and playback rate