Spectral modeling synthesis (SMS) is an
acoustic modeling approach for speech and other signals.
SMS considers
sounds as a combination of
harmonic
A harmonic is a wave with a frequency that is a positive integer multiple of the '' fundamental frequency'', the frequency of the original periodic signal, such as a sinusoidal wave. The original signal is also called the ''1st harmonic'', ...
content and
noise
Noise is unwanted sound considered unpleasant, loud or disruptive to hearing. From a physics standpoint, there is no distinction between noise and desired sound, as both are vibrations through a medium, such as air or water. The difference aris ...
content. Harmonic components are identified based on peaks in the
frequency spectrum
The power spectrum S_(f) of a time series x(t) describes the distribution of power into frequency components composing that signal. According to Fourier analysis, any physical signal can be decomposed into a number of discrete frequencies, ...
of the signal, normally as found by the
short-time Fourier transform
The short-time Fourier transform (STFT), is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. In practice, the procedure for computing STFTs is to divi ...
. The signal that remains following removal of the spectral components, sometimes referred to as the residual, is then modeled as
white noise
In signal processing, white noise is a random signal having equal intensity at different frequencies, giving it a constant power spectral density. The term is used, with this or similar meanings, in many scientific and technical disciplines, ...
passed through a time-varying filter. The output of the model, then, are the frequencies and levels of the detected harmonic components and the
coefficients
In mathematics, a coefficient is a multiplicative factor in some term of a polynomial, a series, or an expression; it is usually a number, but may be any expression (including variables such as , and ). When the coefficients are themselves ...
of the time-varying filter.
Intuitively, the model can be applied to many types of audio signals. Speech signals, for example, include slowly changing harmonic sounds caused by vibration of the
vocal cords
In humans, vocal cords, also known as vocal folds or voice reeds, are folds of throat tissues that are key in creating sounds through vocalization. The size of vocal cords affects the pitch of voice. Open when breathing and vibrating for speech ...
plus wideband, noise-like sounds caused by the lips and mouth. Musical instruments also produce sounds containing both harmonic components and percussive, noise-like sounds when the notes are struck or changed.
See also
*
Speech coding
Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic ...
*
CELP
*
Source-filter model of speech production
*
FM synthesis
Frequency modulation synthesis (or FM synthesis) is a form of sound synthesis whereby the frequency of a waveform is changed by modulating its frequency with a modulator. The frequency of an oscillator is altered "in accordance with the amplitud ...
*
Sound synthesis
A synthesizer (also spelled synthesiser) is an electronic musical instrument that generates audio signals. Synthesizers typically create sounds by generating waveforms through methods including subtractive synthesis, additive synthesis and ...
SPEAR - Sinusoidal Partial Editing Analysis and Resynthesis
References
*
*
*
*
*
Speech recognition
{{science-software-stub