In the analysis of data, a correlogram is a chart of correlation statistics. For example, in time series analysis, a plot of the sample autocorrelations <math>r_h</math> versus <math>h</math> (the time lags) is an autocorrelogram. If cross-correlation is plotted, the result is called a cross-correlogram.
The correlogram is a commonly used tool for checking randomness in a data set. If the data are random, the autocorrelations should be near zero for any and all time-lag separations. If the data are non-random, then one or more of the autocorrelations will be significantly non-zero.
In addition, correlograms are used in the model identification stage for Box–Jenkins autoregressive moving average time series models. Autocorrelations should be near zero for randomness; if the analyst does not check for randomness, then the validity of many of the statistical conclusions becomes suspect. The correlogram is an excellent way of checking for such randomness.
In multivariate analysis, correlation matrices shown as color-mapped images may also be called "correlograms" or "corrgrams".
Applications
The correlogram can help provide answers to the following questions:
* Are the data random?
* Is an observation related to an adjacent observation?
* Is an observation related to an observation twice-removed? (etc.)
* Is the observed time series white noise?
* Is the observed time series sinusoidal?
* Is the observed time series autoregressive?
* What is an appropriate model for the observed time series?
* Is the model <math>Y = \text{constant} + \text{error}</math> valid and sufficient?
* Is the formula <math>s_{\bar{Y}} = s/\sqrt{N}</math> valid?
Importance
Randomness (along with fixed model, fixed variation, and fixed distribution) is one of the four assumptions that typically underlie all measurement processes. The randomness assumption is critically important for the following three reasons:
* Most standard statistical tests depend on randomness. The validity of the test conclusions is directly linked to the validity of the randomness assumption.
* Many commonly used statistical formulae depend on the randomness assumption, the most common being the formula for the standard error of the sample mean:
::<math>s_{\bar{Y}} = s/\sqrt{N}</math>
where ''s'' is the standard deviation of the data and ''N'' is the sample size. Although heavily used, the results from using this formula are of no value unless the randomness assumption holds.
* For univariate data, the default model is
::<math>Y = \text{constant} + \text{error}</math>
If the data are not random, this model is invalid, and the estimates for its parameters (such as the constant) become meaningless.
Estimation of autocorrelations
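The autocorrelation coefficient at lag ''h'' is given by
:<math>r_h = c_h / c_0</math>
where <math>c_h</math> is the autocovariance function
:<math>c_h = \frac{1}{N} \sum_{t=1}^{N-h} \left(Y_t - \bar{Y}\right)\left(Y_{t+h} - \bar{Y}\right)</math>
and <math>c_0</math> is the variance function
:<math>c_0 = \frac{1}{N} \sum_{t=1}^{N} \left(Y_t - \bar{Y}\right)^2</math>
The resulting value of <math>r_h</math> will range between −1 and +1.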
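The estimate described above, the lag-''h'' autocovariance with a 1/''N'' factor divided by the lag-0 value, can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation; the function name <code>sample_autocorrelation</code> and the toy data are assumptions made for the example.

```python
import numpy as np

def sample_autocorrelation(y, max_lag):
    """Return r_0 .. r_max_lag using the (1/N) autocovariance estimate."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = y - y.mean()              # deviations from the sample mean
    c0 = np.sum(d * d) / n        # c_0: variance with a 1/N factor
    r = np.empty(max_lag + 1)
    for h in range(max_lag + 1):
        # c_h: products of deviations h steps apart, still divided by N
        c_h = np.sum(d[: n - h] * d[h:]) / n
        r[h] = c_h / c0
    return r

# Toy data (hypothetical): a short linear ramp
y = [1.0, 2.0, 3.0, 4.0, 5.0]
r = sample_autocorrelation(y, max_lag=2)
print(r)  # r_0 is always 1; here r_1 = 0.4 and r_2 = -0.1
```

Plotting <code>r</code> against the lag index gives the autocorrelogram.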
Alternate estimate
Some sources may use the following formula for the autocovariance function:
:<math>c_h = \frac{1}{N-h} \sum_{t=1}^{N-h} \left(Y_t - \bar{Y}\right)\left(Y_{t+h} - \bar{Y}\right)</math>
Although this definition has less bias, the (1/''N'') formulation has some desirable statistical properties and is the form most commonly used in the statistics literature. See pages 20 and 49–50 in Chatfield for details.
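In contrast to the definition above, this definition allows us to compute <math>c_h</math> in a slightly more intuitive way. Consider the sample <math>Y_1, \dots, Y_N</math>, where <math>Y_i \in \mathbb{R}^n</math> for <math>i = 1, \dots, N</math>. Then, let
:<math>X = \begin{bmatrix} Y_1 - \bar{Y} & \cdots & Y_N - \bar{Y} \end{bmatrix} \in \mathbb{R}^{n \times N}</math>
We then compute the Gram matrix <math>Q = X^\top X</math>. Finally, <math>c_h</math> is computed as the sample mean of the <math>h</math>th diagonal of <math>Q</math>. For example, the <math>0</math>th diagonal (the main diagonal) of <math>Q</math> has <math>N</math> elements, and its sample mean corresponds to <math>c_0</math>. The <math>1</math>st diagonal (to the right of the main diagonal) of <math>Q</math> has <math>N-1</math> elements, and its sample mean corresponds to <math>c_1</math>, and so on.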
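The Gram-matrix description can be checked numerically. The sketch below assumes scalar observations, so the centered sample is a single row, and takes the mean of a chosen diagonal of the Gram matrix. The function name <code>autocovariance_gram</code> and the toy data are illustrative assumptions.

```python
import numpy as np

def autocovariance_gram(y, h):
    """Mean of the h-th diagonal of the Gram matrix of the centered sample."""
    y = np.asarray(y, dtype=float)
    x = (y - y.mean()).reshape(1, -1)   # 1 x N row of centered observations
    q = x.T @ x                         # N x N Gram matrix
    # The h-th diagonal holds the N - h products of deviations h steps apart
    return np.diagonal(q, offset=h).mean()

# Toy data (hypothetical): a short linear ramp
y = [1.0, 2.0, 3.0, 4.0, 5.0]
c0 = autocovariance_gram(y, 0)   # mean of 5 elements
c1 = autocovariance_gram(y, 1)   # mean of 4 elements
print(c0, c1)
```

Forming the full matrix costs memory quadratic in the sample size, so this form is mainly useful for understanding the estimate rather than for long series.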
Statistical inference with correlograms

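In the same graph one can draw upper and lower bounds for autocorrelation with significance level <math>\alpha</math>:
:<math>B = \pm z_{1-\alpha/2} \, \mathrm{SE}(r_h)</math>
with <math>r_h</math> as the estimated autocorrelation at lag <math>h</math>.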
If the autocorrelation is higher (lower) than this upper (lower) bound, the null hypothesis that there is no autocorrelation at and beyond a given lag is rejected at a significance level of <math>\alpha</math>. This test is an approximate one and assumes that the time series is Gaussian.
In the above, <math>z_{1-\alpha/2}</math> is the quantile of the normal distribution; SE is the standard error, which can be computed by Bartlett's formula for MA(''ℓ'') processes:
:<math>\mathrm{SE}(r_1) = \frac{1}{\sqrt{N}}</math>
:<math>\mathrm{SE}(r_h) = \sqrt{\frac{1 + 2\sum_{i=1}^{h-1} r_i^2}{N}}</math> for <math>h > 1.</math>
In the example plotted, we can reject the null hypothesis that there is no autocorrelation between time-points which are separated by lags up to 4. For most longer periods one cannot reject the null hypothesis of no autocorrelation.
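As a hedged sketch of the test described in this section, the following code computes Bartlett-type standard errors (1/√''N'' at lag 1, then widening with the squared autocorrelations at shorter lags) and flags the lags whose autocorrelation falls outside the bounds. The autocorrelation values, the sample size and the function name <code>bartlett_se</code> are made up for illustration; the normal quantile comes from Python's standard library.

```python
import numpy as np
from statistics import NormalDist

def bartlett_se(r, n):
    """Standard errors for lags 1..H, given r[0..H] with r[0] = 1."""
    r = np.asarray(r, dtype=float)
    sq = r[1:] ** 2
    # For lag h, sum the squared autocorrelations at lags 1 .. h-1
    partial = np.concatenate(([0.0], np.cumsum(sq)[:-1]))
    return np.sqrt((1.0 + 2.0 * partial) / n)

# Hypothetical estimated autocorrelations at lags 0..3 from N = 100 points
r = np.array([1.0, 0.5, 0.2, 0.1])
n = 100
alpha = 0.05

se = bartlett_se(r, n)
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{1 - alpha/2}
bounds = z * se                                # half-width of the band per lag
significant = np.abs(r[1:]) > bounds           # True where the null is rejected
print(bounds, significant)
```

With these illustrative numbers only the lag-1 autocorrelation lies outside its band.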
Note that there are two distinct formulas for generating the confidence bands:
1. If the correlogram is being used to test for randomness (i.e., there is no time dependence in the data), the following formula is recommended:
:<math>\pm \frac{z_{1-\alpha/2}}{\sqrt{N}}</math>
where ''N'' is the sample size, ''z'' is the quantile function of the standard normal distribution and ''α'' is the significance level. In this case, the confidence bands have fixed width that depends on the sample size.
2. Correlograms are also used in the model identification stage for fitting ARIMA models. In this case, a moving average model is assumed for the data and the following confidence bands should be generated:
:<math>\pm z_{1-\alpha/2} \sqrt{\frac{1}{N}\left(1 + 2\sum_{i=1}^{k-1} r_i^2\right)}</math>
where ''k'' is the lag. In this case, the confidence bands increase as the lag increases.
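The fixed-width band used for the randomness check can be illustrated with a deliberately non-random series. The sketch below assumes a strictly alternating sequence, which is strongly negatively correlated at lag 1, and compares its lag-1 autocorrelation with the band. The function names and the data are assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist

def randomness_band(n, alpha=0.05):
    """Half-width z_{1-alpha/2} / sqrt(N) of the fixed confidence band."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z / np.sqrt(n)

def lag_autocorrelation(y, h):
    """Sample autocorrelation r_h; the 1/N factors cancel in c_h / c_0."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return np.sum(d[: d.size - h] * d[h:]) / np.sum(d * d)

# Hypothetical data: 100 points alternating between +1 and -1
y = np.tile([1.0, -1.0], 50)

band = randomness_band(y.size)        # about 0.196 for N = 100
r1 = lag_autocorrelation(y, 1)        # -0.99 for this series
looks_random = abs(r1) <= band
print(band, r1, looks_random)
```

Because the band depends only on ''N'' and ''α'', it is drawn as a pair of horizontal lines on the correlogram.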
Software
Correlograms are available in most general purpose statistical libraries.
Correlograms:
* Python pandas: pandas.plotting.autocorrelation_plot
* R: functions acf and pacf
Corrgrams:
* Python seaborn: heatmap, pairplot
* R: corrgram
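As a brief usage sketch of the pandas function listed above, the following code draws a correlogram for a simulated random walk. The seed and the series are arbitrary assumptions, and the non-interactive Matplotlib backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened

import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot

# Hypothetical data: a random walk, which is strongly autocorrelated
rng = np.random.default_rng(0)
series = pd.Series(np.cumsum(rng.normal(size=200)))

# Returns the Matplotlib Axes holding the correlogram
ax = autocorrelation_plot(series)
```

The returned axes can be customised or saved through the usual Matplotlib calls, for example <code>ax.figure.savefig(...)</code>.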
Related techniques
* Partial autocorrelation function
* Lag plot
* Spectral plot
* Seasonal subseries plot
* Scaled Correlation
* Variogram