In the analysis of data, a correlogram is a
chart
A chart (sometimes known as a graph) is a graphical representation for data visualization, in which "the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart". A chart can represent tabu ...
of
correlation
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics ...
statistics.
For example, in
time series analysis
In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Exa ...
, a plot of the sample
autocorrelation
Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable ...
s
versus
(the time lags) is an autocorrelogram.
If
cross-correlation
In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other. This is also known as a ''sliding dot product'' or ''sliding inner-product''. It is commonly used fo ...
is plotted, the result is called a cross-correlogram.
The correlogram is a commonly used tool for checking
randomness
In common usage, randomness is the apparent or actual lack of pattern or predictability in events. A random sequence of events, symbols or steps often has no order and does not follow an intelligible pattern or combination. Individual rand ...
in a
data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...
. If random, autocorrelations should be near zero for any and all time-lag separations. If non-random, then one or more of the autocorrelations will be significantly non-zero.
In addition, correlograms are used in the
model identification
In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of this model's underlying parameters after obtaining ...
stage for
Box–Jenkins autoregressive moving average
In statistics, econometrics and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it is used to describe certain time-varying processes in nature, economics, etc. The autoregressive model spe ...
time series
In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Exa ...
models. Autocorrelations should be near-zero for randomness; if the analyst does not check for randomness, then the validity of many of the statistical conclusions becomes suspect. The correlogram is an excellent way of checking for such randomness.
In
multivariate analysis
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable.
Multivariate statistics concerns understanding the different aims and background of each of the dif ...
,
correlation matrices shown as
color-mapped images may also be called "correlograms" or "corrgrams".
Applications
The correlogram can help provide answers to the following questions:
* Are the data random?
* Is an observation related to an adjacent observation?
* Is an observation related to an observation twice-removed? (etc.)
* Is the observed time series
white noise
In signal processing, white noise is a random signal having equal intensity at different frequencies, giving it a constant power spectral density. The term is used, with this or similar meanings, in many scientific and technical disciplines, ...
?
* Is the observed time series sinusoidal?
* Is the observed time series autoregressive?
* What is an appropriate model for the observed time series?
* Is the model
::
: valid and sufficient?
* Is the formula
valid?
Importance
Randomness (along with fixed model, fixed variation, and fixed distribution) is one of the four assumptions that typically underlie all measurement processes. The randomness assumption is critically important for the following three reasons:
* Most standard
statistical test
A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.
Hypothesis testing allows us to make probabilistic statements about population parameters.
...
s depend on randomness. The validity of the test conclusions is directly linked to the validity of the randomness assumption.
* Many commonly used statistical formulae depend on the randomness assumption, the most common formula being the formula for determining the
standard error
The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the sample mean, it is called the standard error ...
of the sample mean:
::
where ''s'' is the
standard deviation
In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while ...
of the data. Although heavily used, the results from using this formula are of no value unless the randomness assumption holds.
* For univariate data, the default model is
::
If the data are not random, this model is incorrect and invalid, and the estimates for the parameters (such as the constant) become nonsensical and invalid.
Estimation of autocorrelations
The autocorrelation coefficient at lag ''h'' is given by
:
where ''c
h'' is the
autocovariance function
In probability theory and statistics, given a stochastic process, the autocovariance is a function that gives the covariance of the process with itself at pairs of time points. Autocovariance is closely related to the autocorrelation of the process ...
:
and ''c''
0 is the
variance function
In statistics, the variance function is a smooth function which depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statisti ...
:
The resulting value of ''r
h'' will range between −1 and +1.
Alternate estimate
Some sources may use the following formula for the autocovariance function:
:
Although this definition has less
bias
Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individual, a group, ...
, the (1/''N'') formulation has some desirable statistical properties and is the form most commonly used in the statistics literature. See pages 20 and 49–50 in Chatfield for details.
Statistical inference with correlograms

In the same graph one can draw upper and lower bounds for autocorrelation with significance level
:
:
with
as the estimated autocorrelation at lag
.
If the autocorrelation is higher (lower) than this upper (lower) bound, the null hypothesis that there is no autocorrelation at and beyond a given lag is rejected at a significance level of
. This test is an approximate one and assumes that the time-series is
Gaussian
Carl Friedrich Gauss (1777–1855) is the eponym of all of the topics listed below.
There are over 100 topics all named after this German mathematician and scientist, all in the fields of mathematics, physics, and astronomy. The English eponymo ...
.
In the above, ''z''
1−''α''/2 is the quantile of the
normal distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu ...
; SE is the standard error, which can be computed by
Bartlett
Bartlett may refer to:
Places
*Bartlett Bay, Canada, Arctic waterway
* Wharerata, New Zealand, also known as Bartletts
United States
* Bartlett, Illinois
** Bartlett station, a commuter railroad station
* Bartlett, Iowa
Bartlett is an uninc ...
's formula for MA(''ℓ'') processes:
:
:
for
In the example plotted, we can reject the
null hypothesis
In scientific research, the null hypothesis (often denoted ''H''0) is the claim that no difference or relationship exists between two sets of data or variables being analyzed. The null hypothesis is that any experimentally observed difference is d ...
that there is no autocorrelation between time-points which are separated by lags up to 4. For most longer periods one cannot reject the
null hypothesis
In scientific research, the null hypothesis (often denoted ''H''0) is the claim that no difference or relationship exists between two sets of data or variables being analyzed. The null hypothesis is that any experimentally observed difference is d ...
of no autocorrelation.
Note that there are two distinct formulas for generating the confidence bands:
1. If the correlogram is being used to test for randomness (i.e., there is no
time dependence
Time is the continued sequence of existence and events that occurs in an apparently irreversible succession from the past, through the present, into the future. It is a component quantity of various measurements used to sequence events, to co ...
in the data), the following formula is recommended:
:
where ''N'' is the
sample size
Sample size determination is the act of choosing the number of observations or Replication (statistics), replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make stat ...
, ''z'' is the
quantile function
In probability and statistics, the quantile function, associated with a probability distribution of a random variable, specifies the value of the random variable such that the probability of the variable being less than or equal to that value equ ...
of the
standard normal distribution and α is the
significance level
In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis (simply by chance alone). More precisely, a study's defined significance level, denoted by \alpha, is the ...
. In this case, the confidence bands have fixed width that depends on the sample size.
2. Correlograms are also used in the model identification stage for fitting
ARIMA
Arima, officially The Royal Chartered Borough of Arima is the easternmost and second largest in area of the three boroughs of Trinidad and Tobago. It is geographically adjacent to Sangre Grande and Arouca at the south central foothills of th ...
models. In this case, a
moving average model
In time series analysis, the moving-average model (MA model), also known as moving-average process, is a common approach for modeling univariate time series. The moving-average model specifies that the output variable is cross-correlated with a ...
is assumed for the data and the following confidence bands should be generated:
:
where ''k'' is the lag. In this case, the confidence bands increase as the lag increases.
Software
Correlograms are available in most general purpose statistical libraries.
Correlograms:
*
python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
pandas
Pediatric autoimmune neuropsychiatric disorders associated with streptococcal infections (PANDAS) is a controversial hypothetical diagnosis for a subset of children with rapid onset of obsessive-compulsive disorder (OCD) or tic disorders. Sy ...
:
pandas.plotting.autocorrelation_plot
*
R: functions
acf
and
pacf
Corrgrams:
*
python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
seaborn
Seaborn is a given name and a surname. Notable persons with that name include:
Persons with the given name
* Seaborn Buckalew, Jr. (1920–2017), American judge and politician
* Seaborn McDaniel Denson (1854–1936), American musician and singing ...
:
heatmap
,
pairplot
*
R:
corrgram
Related techniques
*
Partial autocorrelation function
In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, regressed the values of the time series at all shorter lags. It contrasts with the autocorre ...
*
Lag plot
In the analysis of data, a correlogram is a chart of correlation statistics.
For example, in time series analysis, a plot of the sample autocorrelations r_h\, versus h\, (the time lags) is an autocorrelogram.
If cross-correlation is plotted ...
*
Spectral plot
*
Seasonal subseries plot
Seasonal subseries plots are a graphical tool to visualize and detect seasonality in a time series. Seasonal subseries plots involves the extraction of the seasons from a time series into a subseries. Based on a selected periodicity, it is an alte ...
*
Scaled Correlation In statistics, scaled correlation is a form of a coefficient of correlation applicable to data that have a temporal component such as time series. It is the average short-term correlation. If the signals have multiple components (slow and fast), sca ...
*
Variogram
In spatial statistics the theoretical variogram 2\gamma(\mathbf_1,\mathbf_2) is a function describing the degree of spatial dependence of a spatial random field or stochastic process Z(\mathbf). The semivariogram \gamma(\mathbf_1,\mathbf_2) is hal ...
References
Further reading
*
*
*
External links
Autocorrelation Plot
{{Statistics, descriptive
Statistical charts and diagrams
Autocorrelation