In statistical signal processing, the goal of spectral density estimation (SDE), or simply spectral estimation, is to estimate the spectral density (also known as the power spectral density) of a signal from a sequence of time samples of the signal. Intuitively speaking, the spectral density characterizes the frequency content of the signal. One purpose of estimating the spectral density is to detect any periodicities in the data by observing peaks at the frequencies corresponding to these periodicities. Some SDE techniques assume that a signal is composed of a limited (usually small) number of generating frequencies plus noise, and seek to find the location and intensity of those frequencies. Others make no assumption on the number of components and seek to estimate the whole generating spectrum.

Overview

[Figure: Comparison of periodogram and Welch methods of spectral density estimation]

Spectrum analysis, also referred to as frequency domain analysis or spectral density estimation, is the technical process of decomposing a complex signal into simpler parts. As described above, many physical processes are best described as a sum of many individual frequency components. Any process that quantifies the various amounts (e.g. amplitudes, powers, intensities) versus frequency (or phase) can be called spectrum analysis. Spectrum analysis can be performed on the entire signal. Alternatively, a signal can be broken into short segments (sometimes called ''frames''), and spectrum analysis may be applied to these individual segments.

Periodic functions (such as $\sin(t)$) are particularly well suited for this subdivision. General mathematical techniques for analyzing non-periodic functions fall into the category of Fourier analysis. The Fourier transform of a function produces a frequency spectrum that contains all of the information about the original signal, but in a different form. This means that the original function can be completely reconstructed (''synthesized'') by an inverse Fourier transform. For perfect reconstruction, the spectrum analyzer must preserve both the amplitude and the phase of each frequency component. These two pieces of information can be represented as a 2-dimensional vector, as a complex number, or as magnitude (amplitude) and phase in polar coordinates (i.e., as a phasor). A common technique in signal processing is to consider the squared amplitude, or power; in this case the resulting plot is referred to as a power spectrum.

Because of this reversibility, the Fourier transform is called a ''representation'' of the function in terms of frequency instead of time; thus, it is a frequency domain representation. Linear operations that could be performed in the time domain have counterparts that can often be performed more easily in the frequency domain. Frequency analysis also simplifies the understanding and interpretation of the effects of various time-domain operations, both linear and non-linear. For instance, only non-linear or time-variant operations can create new frequencies in the frequency spectrum.

In practice, nearly all software and electronic devices that generate frequency spectra use a discrete Fourier transform (DFT), which operates on samples of the signal and provides a mathematical approximation to the full integral solution. The DFT is almost invariably implemented by an efficient algorithm called the ''fast Fourier transform'' (FFT). The array of squared-magnitude components of a DFT is a type of power spectrum called a periodogram, which is widely used for examining the frequency characteristics of noise-free functions such as filter impulse responses and window functions. However, the periodogram does not provide processing gain when applied to noise-like signals, or even to sinusoids at low signal-to-noise ratios. In other words, the variance of its spectral estimate at a given frequency does not decrease as the number of samples used in the computation increases. This can be mitigated by averaging over time (Welch's method) or over frequency (smoothing). Welch's method is widely used for spectral density estimation (SDE). However, periodogram-based techniques introduce small biases that are unacceptable in some applications, so other alternatives are presented in the next section.
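The variance problem and the averaging remedy can be seen numerically. The following pure-Python sketch (a naive $O(N^2)$ DFT stands in for an FFT, purely for readability) computes periodograms of unit-variance white Gaussian noise at two lengths and a Bartlett average of 16 non-overlapping segments. Welch's method would additionally window and overlap the segments, which this sketch omits. The helper names, segment length, and seed are illustrative assumptions.

```python
import cmath
import math
import random


def periodogram(x):
    """|DFT|^2 / N at frequencies k/N for k = 0..N/2 (naive O(N^2) DFT for clarity)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def bartlett(x, segment_length):
    """Bartlett's method: average the periodograms of non-overlapping segments."""
    starts = range(0, len(x) - segment_length + 1, segment_length)
    spectra = [periodogram(x[i:i + segment_length]) for i in starts]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def spread(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # true two-sided PSD is flat at 1

short = periodogram(noise[:256])   # 129 bins
full = periodogram(noise)          # 513 bins: finer grid, same per-bin variance
averaged = bartlett(noise, 64)     # 33 bins, each an average of 16 periodograms

# Interior bins only: the k = 0 and k = N/2 bins follow a different distribution.
for label, spectrum in (("N=256", short), ("N=1024", full), ("Bartlett 16x64", averaged)):
    print(f"{label:>15}: mean {sum(spectrum) / len(spectrum):.2f}, "
          f"variance {spread(spectrum[1:-1]):.2f}")
```

Under these assumptions, the variance of the raw periodogram should stay near 1 at both lengths, while the averaged estimate's variance should be roughly 16 times smaller, at the cost of coarser frequency resolution (33 bins instead of 513).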

Techniques

Many other techniques for spectral estimation have been developed to mitigate the disadvantages of the basic periodogram. These techniques can generally be divided into ''non-parametric'', ''parametric'', and, more recently, semi-parametric (also called sparse) methods. The non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure. Some of the most common estimators in use for basic applications (e.g. Welch's method) are non-parametric estimators closely related to the periodogram. By contrast, the parametric approaches assume that the underlying stationary stochastic process has a certain structure that can be described using a small number of parameters (for example, using an autoregressive or moving-average model). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. When using the semi-parametric methods, the underlying process is modeled using a non-parametric framework, with the additional assumption that the number of non-zero components of the model is small (i.e., the model is sparse). Similar approaches may also be used for missing data recovery as well as signal reconstruction.

Following is a partial list of non-parametric spectral density estimation techniques:
* Periodogram, the modulus squared of the discrete Fourier transform
** Lomb–Scargle periodogram, for which data need not be equally spaced
* Bartlett's method, the average of the periodograms taken of multiple segments of the signal, which reduces the variance of the spectral density estimate
* Welch's method, a windowed version of Bartlett's method that uses overlapping segments
* Multitaper, a periodogram-based method that uses multiple tapers, or windows, to form independent estimates of the spectral density and thereby reduce the variance of the estimate
* Least-squares spectral analysis, based on least-squares fitting to known frequencies
* Non-uniform discrete Fourier transform, used when the signal samples are unevenly spaced in time
* Singular spectrum analysis, a nonparametric method that uses a singular value decomposition of the covariance matrix to estimate the spectral density
* Short-time Fourier transform
* Critical filter, a nonparametric method based on information field theory that can deal with noise, incomplete data, and instrumental response functions

Below is a partial list of parametric techniques:
* Autoregressive model (AR) estimation, which assumes that the ''n''th sample is correlated with the previous ''p'' samples
* Moving-average model (MA) estimation, which assumes that the ''n''th sample is correlated with noise terms in the previous ''p'' samples
* Autoregressive moving-average (ARMA) estimation, which generalizes the AR and MA models
* MUltiple SIgnal Classification (MUSIC), a popular superresolution method
* Maximum entropy spectral estimation, an ''all-poles'' method useful for SDE when singular spectral features, such as sharp peaks, are expected

And finally, some examples of semi-parametric techniques:
* SParse Iterative Covariance-based Estimation (SPICE) estimation, and the more generalized $(r,q)$-SPICE
* Iterative Adaptive Approach (IAA) estimation
* LASSO, similar to least-squares spectral analysis but with a sparsity-enforcing penalty
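To make the Lomb–Scargle entry concrete, here is a minimal sketch of the classical (un-normalized) Lomb–Scargle periodogram in plain Python, applied to a noisy sinusoid sampled at random instants. The function name, the test frequency of 0.23 cycles per unit time, the noise level, and the frequency grid are assumptions made for illustration; normalized variants and faster implementations exist but are not shown.

```python
import math
import random


def lomb_scargle(times, values, freqs):
    """Classical (un-normalized) Lomb-Scargle periodogram.

    times  -- sample instants, not necessarily equally spaced
    values -- observations at those instants
    freqs  -- trial frequencies in cycles per unit time (must be > 0)
    """
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    powers = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The offset tau makes the sine and cosine fits orthogonal at this frequency.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, c))
        ys = sum(a * b for a, b in zip(y, s))
        powers.append(0.5 * (yc * yc / sum(b * b for b in c)
                             + ys * ys / sum(b * b for b in s)))
    return powers


rng = random.Random(7)
times = sorted(rng.uniform(0.0, 100.0) for _ in range(200))  # uneven sampling
values = [math.sin(2 * math.pi * 0.23 * t) + rng.gauss(0.0, 0.5) for t in times]
freqs = [0.01 + 0.002 * i for i in range(246)]                 # 0.01 ... 0.50
powers = lomb_scargle(times, values, freqs)
best = freqs[max(range(len(freqs)), key=powers.__getitem__)]
print(f"strongest periodicity near f = {best:.3f}")
```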

Parametric estimation

In parametric spectral estimation, one assumes that the signal is modeled by a stationary process whose spectral density function (SDF) $S(f; a_1, \ldots, a_p)$ is a function of the frequency $f$ and of $p$ parameters $a_1, \ldots, a_p$. The estimation problem then becomes one of estimating these parameters. The most common form of parametric SDF estimate uses as a model an autoregressive model $\text{AR}(p)$ of order $p$. A signal sequence $\{Y_t\}$ obeying a zero-mean $\text{AR}(p)$ process satisfies the equation:

Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \epsilon_t,

where $\phi_1, \ldots, \phi_p$ are fixed coefficients and $\epsilon_t$ is a white noise process with zero mean and ''innovation variance'' $\sigma^2_p$. The SDF for this process is:

S(f; \phi_1, \ldots, \phi_p, \sigma^2_p) = \frac{\sigma^2_p \, \Delta t}{\left| 1 - \sum_{k=1}^{p} \phi_k e^{-2\pi i f k \Delta t} \right|^2}, \qquad |f| < f_N,

with $\Delta t$ the sampling time interval and $f_N = 1/(2\Delta t)$ the Nyquist frequency. There are a number of approaches to estimating the parameters $\phi_1, \ldots, \phi_p, \sigma^2_p$ of the $\text{AR}(p)$ process, and thus the spectral density:
* The ''Yule–Walker estimators'' are found by recursively solving the Yule–Walker equations for an $\text{AR}(p)$ process.
* The ''Burg estimators'' are found by treating the Yule–Walker equations as a form of ordinary least squares problem. The Burg estimators are generally considered superior to the Yule–Walker estimators. Burg associated these with maximum entropy spectral estimation. (Burg, J.P. (1967) "Maximum Entropy Spectral Analysis", ''Proceedings of the 37th Meeting of the Society of Exploration Geophysicists'', Oklahoma City, Oklahoma.)
* The ''forward-backward least-squares estimators'' treat the $\text{AR}(p)$ process as a regression problem and solve that problem using the forward-backward method. They are competitive with the Burg estimators.
* The ''maximum likelihood estimators'' estimate the parameters using a maximum likelihood approach. This involves a nonlinear optimization and is more complex than the first three.

Alternative parametric methods include fitting to a moving-average model (MA) and to a full autoregressive moving-average model (ARMA).
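As a sketch of the Yule–Walker approach, assuming the biased sample autocovariance and the Levinson–Durbin recursion as the recursive solver, the code below fits an $\text{AR}(2)$ model to a simulated series and evaluates the $\text{AR}(p)$ SDF formula given earlier in this section (with $\Delta t = 1$, so $f_N = 0.5$). The true coefficients $\phi_1 = 0.75$, $\phi_2 = -0.5$, the sample size, the seed, and the helper names are illustrative assumptions; the Burg, forward-backward, and maximum likelihood estimators are not shown.

```python
import cmath
import math
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariance r[0..max_lag] (divides by N, after removing the mean)."""
    n = len(x)
    mean = sum(x) / n
    y = [v - mean for v in x]
    return [sum(y[t] * y[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]


def yule_walker(x, p):
    """Yule-Walker AR(p) estimates via the Levinson-Durbin recursion.

    Returns ([phi_1, ..., phi_p], sigma2_p).
    """
    r = autocovariance(x, p)
    phi, err = [], r[0]
    for k in range(1, p + 1):
        # Reflection coefficient of the order-k model.
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa  # prediction-error (innovation) variance
    return phi, err


def ar_sdf(phi, sigma2, f, dt=1.0):
    """sigma2*dt / |1 - sum_k phi_k exp(-2 pi i f k dt)|^2, valid for |f| < 1/(2 dt)."""
    denom = 1.0 - sum(c * cmath.exp(-2j * math.pi * f * (k + 1) * dt)
                      for k, c in enumerate(phi))
    return sigma2 * dt / abs(denom) ** 2


# Simulate a zero-mean AR(2) process with unit innovation variance.
rng = random.Random(42)
true_phi = [0.75, -0.5]
y = [0.0, 0.0]
for _ in range(4200):
    y.append(true_phi[0] * y[-1] + true_phi[1] * y[-2] + rng.gauss(0.0, 1.0))
y = y[202:]  # discard the start-up transient, keeping 4000 samples

phi_hat, sigma2_hat = yule_walker(y, 2)
grid = [i / 1000 for i in range(500)]  # 0 <= f < f_N = 0.5
spectrum = [ar_sdf(phi_hat, sigma2_hat, f) for f in grid]
peak = grid[max(range(len(grid)), key=spectrum.__getitem__)]
print(phi_hat, sigma2_hat, peak)
```

For these coefficients the true SDF peaks where $\cos(2\pi f) = -\phi_1(1 - \phi_2)/(4\phi_2) = 0.5625$, i.e. near $f \approx 0.155$, so the peak of the estimated spectrum should land close to that value.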

Frequency estimation

Frequency estimation is the process of estimating the frequency, amplitude, and phase shift of a signal in the presence of noise, given assumptions about the number of components. (Hayes, Monson H., ''Statistical Digital Signal Processing and Modeling'', John Wiley & Sons, Inc., 1996.) This contrasts with the general methods above, which do not make prior assumptions about the components.

Single tone

If one only wants to estimate the single loudest frequency, one can use a pitch detection algorithm. If the dominant frequency changes over time, the problem becomes the estimation of the instantaneous frequency as defined in the time–frequency representation. Methods for instantaneous frequency estimation include those based on the Wigner–Ville distribution and higher-order ambiguity functions. If one wants to know ''all'' the (possibly complex) frequency components of a received signal (including the transmitted signal and noise), one uses a multiple-tone approach.
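The text does not name a specific pitch detection algorithm. As a simple stand-in (an assumption, not a method taken from the text), the sketch below estimates the single loudest frequency of a stationary tone in noise by locating the largest peak of a zero-padded periodogram and refining it with parabolic interpolation. The 8 kHz sampling rate, the 440 Hz test tone, the padding factor, and the function name are illustrative.

```python
import cmath
import math
import random


def dominant_frequency(x, fs, pad_factor=4):
    """Frequency (Hz) of the largest peak of a zero-padded periodogram of x.

    Zero-padding by `pad_factor` evaluates the DTFT on a finer grid; a parabola
    through the peak bin and its two neighbours refines the location further.
    """
    n = len(x)
    mean = sum(x) / n
    m = pad_factor * n
    spectrum = [
        abs(sum((x[t] - mean) * cmath.exp(-2j * math.pi * k * t / m) for t in range(n))) ** 2
        for k in range(m // 2 + 1)
    ]
    k = max(range(1, m // 2), key=spectrum.__getitem__)  # skip DC and Nyquist bins
    a, b, c = spectrum[k - 1], spectrum[k], spectrum[k + 1]
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)  # lies in [-0.5, 0.5] at a local maximum
    return (k + offset) * fs / m


fs = 8000.0  # sampling rate in Hz
rng = random.Random(3)
x = [math.sin(2 * math.pi * 440.0 * t / fs + 0.3) + rng.gauss(0.0, 0.3) for t in range(256)]
estimate = dominant_frequency(x, fs)
print(f"{estimate:.1f} Hz")
```

This approach assumes one dominant, stationary tone; when the frequency drifts over time, the instantaneous-frequency methods mentioned above are the relevant tools.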

Multiple tones

A typical model for a signal $x(n)$ consists of a sum of $p$ complex exponentials in the presence of white noise $w(n)$:

x(n) = \sum_{i=1}^{p} A_i e^{j n \omega_i} + w(n).

The power spectral density of $x(n)$ is composed of $p$ impulse functions in addition to the spectral density function due to noise.

The most common methods for frequency estimation involve identifying the noise subspace to extract these components. These methods are based on an eigendecomposition of the autocorrelation matrix into a signal subspace and a noise subspace. After these subspaces are identified, a frequency estimation function is used to find the component frequencies from the noise subspace. The most popular methods of noise-subspace-based frequency estimation are Pisarenko's method, the multiple signal classification (MUSIC) method, the eigenvector method, and the minimum norm method. In the expressions below, $\mathbf{e} = [1, e^{j\omega}, \ldots, e^{j(M-1)\omega}]^T$; $\mathbf{v}_i$ and $\lambda_i$ are the eigenvectors and eigenvalues of the $M \times M$ autocorrelation matrix, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M$ (so that $\mathbf{v}_{p+1}, \ldots, \mathbf{v}_M$ span the noise subspace); $\mathbf{v}_\text{min}$ is the eigenvector with the smallest eigenvalue; $\mathbf{P}_n$ is the projection matrix onto the noise subspace; $\mathbf{u}_1 = [1, 0, \ldots, 0]^T$; and $\lambda$ is a scalar chosen so that the first element of $\mathbf{a}$ equals 1.

; Pisarenko's method:
\hat{P}_\text{PHD}\left(e^{j\omega}\right) = \frac{1}{\left|\mathbf{e}^H \mathbf{v}_\text{min}\right|^2}
; MUSIC:
\hat{P}_\text{MU}\left(e^{j\omega}\right) = \frac{1}{\sum_{i=p+1}^{M} \left|\mathbf{e}^H \mathbf{v}_i\right|^2}
; Eigenvector method:
\hat{P}_\text{EV}\left(e^{j\omega}\right) = \frac{1}{\sum_{i=p+1}^{M} \frac{1}{\lambda_i} \left|\mathbf{e}^H \mathbf{v}_i\right|^2}
; Minimum norm method:
\hat{P}_\text{MN}\left(e^{j\omega}\right) = \frac{1}{\left|\mathbf{e}^H \mathbf{a}\right|^2}, \qquad \mathbf{a} = \lambda \mathbf{P}_n \mathbf{u}_1
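As a rough illustration of the MUSIC estimator, the sketch below builds an $M \times M$ sample autocorrelation matrix from overlapping length-$M$ snapshots, eigendecomposes it (with a small cyclic Jacobi routine, to stay within the Python standard library), and evaluates $1/\sum_{i=p+1}^{M} |\mathbf{e}^H \mathbf{v}_i|^2$ on a frequency grid. Assumptions: the data are real-valued, so each real sinusoid counts as two complex exponentials ($p = 4$ for two sinusoids); the snapshot-averaged autocorrelation estimate; and the helper names, $M = 10$, and the test signal are illustrative choices.

```python
import math
import random


def jacobi_eigh(matrix, max_sweeps=50):
    """Eigenvalues and eigenvectors of a small real symmetric matrix (cyclic Jacobi).

    Returns (values, vectors) where vectors[i][j] is component i of eigenvector j.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J and V <- V J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def music_pseudospectrum(x, num_exponentials, m, freqs):
    """MUSIC pseudospectrum 1 / sum_i |e^H v_i|^2 over the noise eigenvectors v_i."""
    snapshots = [x[i:i + m] for i in range(len(x) - m + 1)]
    r = [[sum(s[i] * s[j] for s in snapshots) / len(snapshots) for j in range(m)]
         for i in range(m)]
    values, vectors = jacobi_eigh(r)
    order = sorted(range(m), key=lambda j: values[j], reverse=True)
    # The m - p eigenvectors with the smallest eigenvalues span the noise subspace.
    noise = [[vectors[i][j] for i in range(m)] for j in order[num_exponentials:]]
    result = []
    for f in freqs:
        w = 2.0 * math.pi * f
        denom = 0.0
        for vec in noise:
            re = sum(vk * math.cos(w * k) for k, vk in enumerate(vec))
            im = sum(vk * math.sin(w * k) for k, vk in enumerate(vec))
            denom += re * re + im * im  # |e^H v|^2 for a real eigenvector v
        result.append(1.0 / denom)
    return result


rng = random.Random(1)
x = [math.sin(2 * math.pi * 0.10 * n) + 0.8 * math.sin(2 * math.pi * 0.27 * n + 1.0)
     + rng.gauss(0.0, 0.3) for n in range(400)]
freqs = [k / 1000 for k in range(1, 500)]
spectrum = music_pseudospectrum(x, num_exponentials=4, m=10, freqs=freqs)
peaks = [i for i in range(1, len(freqs) - 1)
         if spectrum[i - 1] < spectrum[i] > spectrum[i + 1]]
top_two = sorted(freqs[i] for i in sorted(peaks, key=spectrum.__getitem__)[-2:])
print(top_two)
```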

Example calculation

Suppose $x_n$, from $n = 0$ to $N - 1$, is a time series (discrete time) with zero mean. Suppose that it is a sum of a finite number of periodic components (all frequencies are positive):

\begin{align}
x_n &= \sum_k A_k \sin(2\pi\nu_k n + \phi_k) \\
    &= \sum_k A_k \left( \sin(\phi_k) \cos(2\pi\nu_k n) + \cos(\phi_k) \sin(2\pi\nu_k n) \right) \\
    &= \sum_k \left( \overbrace{a_k}^{A_k \sin(\phi_k)} \cos(2\pi\nu_k n) + \overbrace{b_k}^{A_k \cos(\phi_k)} \sin(2\pi\nu_k n) \right)
\end{align}

The variance of $x_n$ is, for a zero-mean function as above, given by:

\frac{1}{N} \sum_{n=0}^{N-1} x_n^2.

If these data were samples taken from an electrical signal, this would be its average power (power is energy per unit time, so it is analogous to variance if energy is analogous to the amplitude squared). Now, for simplicity, suppose the signal extends infinitely in time, so we pass to the limit as $N \to \infty$. If the average power is bounded, which is almost always the case in reality, then the following limit exists and is the variance of the data:

\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} x_n^2.

Again, for simplicity, we pass to continuous time and assume that the signal extends infinitely in time in both directions. Then these two formulas become

x(t) = \sum_k A_k \sin(2\pi\nu_k t + \phi_k)

and

\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)^2 \, dt.

The root mean square of $\sin$ is $1/\sqrt{2}$, so the variance of $A_k \sin(2\pi\nu_k t + \phi_k)$ is $\tfrac{1}{2} A_k^2$. Hence, the contribution to the average power of $x(t)$ coming from the component with frequency $\nu_k$ is $\tfrac{1}{2} A_k^2$. All these contributions add up to the average power of $x(t)$, because the cross terms between components with distinct frequencies average to zero over time.

Then the power as a function of frequency is concentrated at the frequencies $\nu_k$, with power $\tfrac{1}{2} A_k^2$ at $\nu_k$, and its cumulative distribution function $S(\nu)$ is:

S(\nu) = \sum_{k : \nu_k \le \nu} \frac{1}{2} A_k^2.
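A quick numerical check of this argument, carried out in discrete time with frequencies in cycles per sample (the amplitudes, frequencies, and phases below are arbitrary illustrative values): the time-averaged power of a finite sum of sinusoids should approach $\sum_k \tfrac{1}{2}A_k^2$, and $S(\nu)$ should be a step function that jumps by $\tfrac{1}{2}A_k^2$ at each $\nu_k$. The helper names are hypothetical.

```python
import math


def average_power(amplitudes, freqs, phases, n_samples):
    """Time average of x_n^2 for x_n = sum_k A_k sin(2 pi nu_k n + phi_k)."""
    total = 0.0
    for n in range(n_samples):
        x = sum(a * math.sin(2 * math.pi * nu * n + ph)
                for a, nu, ph in zip(amplitudes, freqs, phases))
        total += x * x
    return total / n_samples


def cumulative_power(amplitudes, freqs, nu):
    """S(nu): sum of A_k^2 / 2 over the components with nu_k <= nu."""
    return sum(a * a / 2 for a, f in zip(amplitudes, freqs) if f <= nu)


amps = [1.0, 0.5, 2.0]
nus = [0.05, 0.13, 0.31]  # distinct frequencies in cycles per sample, below 0.5
phis = [0.0, 1.2, -0.4]

power = average_power(amps, nus, phis, 20_000)
print(power)  # close to sum(A_k^2) / 2 = 2.625
print([cumulative_power(amps, nus, v) for v in (0.0, 0.1, 0.2, 0.5)])
```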

$S$ is a step function, monotonically non-decreasing. Its jumps occur at the frequencies of the periodic components of $x$, and the value of each jump is the power or variance of that component. The variance is the covariance of the data with itself. If we now consider the same data but with a lag of $\tau$, we can take the covariance of $x(t)$ with $x(t + \tau)$, and define this to be the autocorrelation function $c$ of the signal (or data) $x$:

c(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t) \, x(t + \tau) \, dt.

If it exists, it is an even function of $\tau$. If the average power is bounded, then $c$ exists everywhere, is finite, and is bounded in absolute value by $c(0)$, which is the average power or variance of the data. It can be shown that $c$ can be decomposed into periodic components with the same periods as $x$:

c(\tau) = \sum_k \frac{1}{2} A_k^2 \cos(2\pi\nu_k \tau).
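The decomposition of $c$ can also be checked numerically. This sketch works in discrete time with illustrative amplitudes, frequencies, and phases and a hypothetical helper name; it compares a time-averaged sample autocorrelation of a finite sum of sinusoids with $\sum_k \tfrac{1}{2}A_k^2 \cos(2\pi\nu_k\tau)$ at a few integer lags.

```python
import math


def sample_autocorrelation(x, lag):
    """Time average of x_n * x_{n+lag} over the available samples."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n


amps = [1.0, 0.5, 2.0]
nus = [0.05, 0.13, 0.31]  # cycles per sample
phis = [0.0, 1.2, -0.4]
x = [sum(a * math.sin(2 * math.pi * nu * n + ph) for a, nu, ph in zip(amps, nus, phis))
     for n in range(20_000)]

for lag in (0, 3, 10):
    # Phases drop out: only the amplitudes and frequencies enter c(tau).
    predicted = sum(a * a / 2 * math.cos(2 * math.pi * nu * lag) for a, nu in zip(amps, nus))
    print(lag, round(sample_autocorrelation(x, lag), 3), round(predicted, 3))
```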

This is in fact the spectral decomposition of $c$ over the different frequencies, and it is related to the distribution of power of $x$ over the frequencies: the amplitude of a frequency component of $c$ is its contribution to the average power of the signal. The power spectrum $S$ of this example is not continuous, so it has no derivative at its jumps, and therefore this signal does not have a power spectral density function (except in the generalized sense of Dirac impulses at the frequencies $\nu_k$). In general, the power spectrum will usually be the sum of two parts: a line spectrum, such as in this example, which is not continuous and does not have a density function, and a residue, which is absolutely continuous and does have a density function.