Cointegration is a statistical property of a collection of time series variables. First, all of the series must be integrated of order $d$ (see order of integration). Next, if a linear combination of this collection is integrated of order less than $d$, then the collection is said to be cointegrated. Formally, if $(X, Y, Z)$ are each integrated of order $d$, and there exist coefficients $a, b, c$ (not all zero) such that $aX + bY + cZ$ is integrated of order less than $d$, then $X$, $Y$, and $Z$ are cointegrated. Cointegration has become an important property in contemporary time series analysis. Time series often have trends, either deterministic or stochastic. In an influential paper, Charles Nelson and Charles Plosser (1982) provided statistical evidence that many US macroeconomic time series (like GNP, wages, employment, etc.) have stochastic trends.
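As an illustration of the definition, the following Python sketch uses simulated data (the loadings 1, 2 and -1, the noise, and the chosen coefficients are assumptions for this example). It builds three I(1) series that share a single random-walk trend $W_t$. Any coefficients with $a + 2b - c = 0$ cancel that shared trend, so $aX + bY + cZ$ reduces to stationary noise and is I(0), which is less than $d = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500

# Shared stochastic trend: a random walk, so W is I(1).
W = np.cumsum(rng.standard_normal(T))

# Independent stationary (here i.i.d.) noise for each series.
e1, e2, e3 = rng.standard_normal((3, T))

# Each series loads on W with an assumed weight, so each one is I(1).
X = 1.0 * W + e1
Y = 2.0 * W + e2
Z = -1.0 * W + e3

# (a, b, c) = (2, -1, 0) satisfies a*1 + b*2 + c*(-1) = 0,
# so the W terms cancel and only stationary noise remains.
a, b, c = 2.0, -1.0, 0.0
combo = a * X + b * Y + c * Z

print(np.allclose(combo, a * e1 + b * e2 + c * e3))  # True
```

Any other choice satisfying the same restriction, such as $(a, b, c) = (1, 0, 1)$, works equally well; a choice that violates it leaves a multiple of $W_t$ in the combination, which stays I(1).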
Introduction
If two or more series are individually integrated (in the time series sense) but some linear combination of them has a lower order of integration, then the series are said to be cointegrated. A common example is where the individual series are first-order integrated, I(1), but some (cointegrating) vector of coefficients exists to form a stationary linear combination of them. For instance, a stock market index and the price of its associated futures contract move through time, each roughly following a random walk. Testing the hypothesis that there is a statistically significant connection between the futures price and the spot price can then be done by testing for the existence of a cointegrated combination of the two series.
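The index/futures example can be mimicked with simulated data. The sketch below assumes, purely for illustration, that the spot index is a random walk and that the futures price equals the spot plus a stationary AR(1) "basis"; it is not a model of any real market. A hand-rolled Dickey–Fuller t-statistic, `df_tstat` (regression of the first difference on a constant and the lagged level, with no augmentation lags), is applied to each series. Because the cointegrating vector (1, -1) is known here, the ordinary Dickey–Fuller critical value (about -2.86 at the 5% level, asymptotically) applies to the spread.

```python
import numpy as np

def df_tstat(series):
    """Dickey-Fuller t-statistic: regress diff(s) on a constant and lagged s (no extra lags)."""
    s = np.asarray(series, dtype=float)
    ds = np.diff(s)
    X = np.column_stack([np.ones(len(ds)), s[:-1]])
    coef, *_ = np.linalg.lstsq(X, ds, rcond=None)
    resid = ds - X @ coef
    sigma2 = resid @ resid / (len(ds) - X.shape[1])
    return coef[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

rng = np.random.default_rng(1)
T = 2000

# Assumed spot index: a random walk around 100, hence I(1).
spot = 100 + np.cumsum(rng.standard_normal(T))

# Assumed basis: a stationary AR(1) with coefficient 0.5.
basis = np.zeros(T)
for t in range(1, T):
    basis[t] = 0.5 * basis[t - 1] + 0.2 * rng.standard_normal()

futures = spot + basis  # also I(1), but cointegrated with spot

print("spot:   ", df_tstat(spot))            # unit-root range
print("futures:", df_tstat(futures))         # unit-root range
print("spread: ", df_tstat(futures - spot))  # far below -2.86
```

Each price on its own looks like a unit-root process, while their difference rejects the unit root decisively, which is exactly the cointegration pattern described above.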
History
The concept of spurious, or nonsense, regression was first introduced and analysed by Udny Yule in 1926.
Before the 1980s, many economists used linear regressions on non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlation, since standard detrending techniques can result in data that are still non-stationary. Granger's 1987 paper with Robert Engle formalized the cointegrating vector approach and coined the term.
For integrated processes, Granger and Newbold showed that detrending does not eliminate the problem of spurious correlation, and that the better alternative is to check for cointegration. Two series with trends can be cointegrated only if there is a genuine relationship between the two. Thus the standard current methodology for time series regressions is to check all the time series involved for integration. If there are I(1) series on both sides of the regression relationship, then regressions may give misleading results.
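A small simulation under assumed parameters illustrates why detrending does not help. Removing a fitted linear trend from a random walk with drift leaves a series that still behaves like a unit-root process, while the same detrending turns a trend-stationary series into stationary noise. The Dickey–Fuller helper `df_tstat` is written for this sketch (constant, no augmentation lags); for detrended data its statistic should strictly be compared with trend-adjusted Dickey–Fuller tables, so treat the numbers as indicative only.

```python
import numpy as np

def df_tstat(series):
    """Dickey-Fuller t-statistic: regress diff(s) on a constant and lagged s (no extra lags)."""
    s = np.asarray(series, dtype=float)
    ds = np.diff(s)
    X = np.column_stack([np.ones(len(ds)), s[:-1]])
    coef, *_ = np.linalg.lstsq(X, ds, rcond=None)
    resid = ds - X @ coef
    sigma2 = resid @ resid / (len(ds) - X.shape[1])
    return coef[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

def linear_detrend(series):
    """Remove an OLS-fitted constant and linear time trend."""
    T = len(series)
    D = np.column_stack([np.ones(T), np.arange(T)])
    coef, *_ = np.linalg.lstsq(D, series, rcond=None)
    return series - D @ coef

rng = np.random.default_rng(2)
T = 1000
time = np.arange(T)
random_walk = 0.05 * time + np.cumsum(rng.standard_normal(T))  # stochastic trend with drift
trend_stationary = 0.05 * time + rng.standard_normal(T)        # deterministic trend only

print(df_tstat(linear_detrend(random_walk)))       # typically stays in the unit-root range
print(df_tstat(linear_detrend(trend_stationary)))  # strongly negative: stationary noise
```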
The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one).
The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data that had been differenced. This method is biased if the non-stationary variables are cointegrated, because differencing discards the long-run relationship between their levels.
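As a hedged illustration (a textbook two-variable case assumed here, not taken from the sources above), suppose $y_t$ and $x_t$ are I(1) and cointegrated, with $y_t - \beta x_t$ stationary. By the Granger representation theorem the pair then has an error-correction form, whose simplest one-lag version is

$$
\Delta y_t = \alpha + \delta\,\Delta x_t + \gamma\,(y_{t-1} - \beta x_{t-1}) + \varepsilon_t, \qquad \gamma \neq 0 .
$$

A regression of $\Delta y_t$ on $\Delta x_t$ alone omits the term $\gamma\,(y_{t-1} - \beta x_{t-1})$. Whenever that lagged equilibrium error is correlated with $\Delta x_t$ (for example, when $x_t$ also adjusts toward equilibrium), the OLS estimate of $\delta$ suffers omitted-variable bias, and in every case the information about the long-run relationship is lost.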
For example, regressing the consumption series for any country (e.g. Fiji) against the GNP for a randomly selected dissimilar country (e.g. Afghanistan) might give a high R-squared (suggesting high explanatory power of Afghanistan's GNP for Fiji's consumption). This is called spurious regression: two integrated series that are not directly causally related may nonetheless show a significant correlation, known as a spurious correlation.
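The Granger–Newbold finding can be reproduced with a short Monte Carlo sketch (sample size, replication count and seed are arbitrary choices). It regresses one independently generated series on another and records how often the slope's conventional t-test rejects at the nominal 5% level. With independent random walks the rejection rate is far above 5%, even though the series are unrelated by construction; with independent white noise it stays near 5%. The sketch reports rejection frequencies rather than R-squared, but the two symptoms go together.

```python
import numpy as np

def slope_tstat(y, x):
    """OLS t-statistic for b in y = a + b*x + e, with classical standard errors."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    return coef[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

def random_walk(rng, T):
    return np.cumsum(rng.standard_normal(T))

def white_noise(rng, T):
    return rng.standard_normal(T)

def rejection_rate(make_series, reps=500, T=200, seed=3):
    """Share of replications in which |t| > 1.96 for two independent series."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        y = make_series(rng, T)
        x = make_series(rng, T)
        hits += int(abs(slope_tstat(y, x)) > 1.96)
    return hits / reps

rw_rate = rejection_rate(random_walk)
wn_rate = rejection_rate(white_noise)
print("independent random walks:", rw_rate)  # far above 0.05
print("independent white noise: ", wn_rate)  # close to 0.05
```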
Tests
The six main methods for testing for cointegration are:
Engle–Granger two-step method
If $x_t$ and $y_t$ are non-stationary, with order of integration $d = 1$, and are cointegrated, then a linear combination of them is stationary for some value of $\beta$. In other words:
$$
y_t - \beta x_t = u_t ,
$$
where $u_t$ is stationary.
If we knew $u_t$, we could simply test it for stationarity with something like a Dickey–Fuller test or a Phillips–Perron test and be done. But because we do not know $u_t$, we must first estimate it, generally by ordinary least squares (regressing $y_t$ on $x_t$ and an intercept), and then run the stationarity test on the estimated series, often denoted $\hat{u}_t$. Because $\beta$ is estimated, the usual Dickey–Fuller critical values no longer apply to $\hat{u}_t$ (see the Phillips–Ouliaris test below).
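A minimal Python sketch of this first step follows. The data-generating process ($y_t = 1 + 2x_t + u_t$ with AR(1) errors) is an assumption for illustration. The residual test is a plain Dickey–Fuller regression without augmentation lags and without a constant (the residuals already have mean zero because the first regression includes an intercept). The critical value of about -3.34 is the approximate asymptotic 5% value for two variables with an intercept from MacKinnon's tables. Real applications would add lags and typically rely on a library routine such as `statsmodels.tsa.stattools.coint`.

```python
import numpy as np

def eg_step1(y, x):
    """Step 1: OLS of y on an intercept and x; return (intercept, beta, residuals)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ coef
    return coef[0], coef[1], u_hat

def residual_df_tstat(u_hat):
    """DF t-statistic from regressing diff(u_hat) on lagged u_hat, no constant."""
    du, lag = np.diff(u_hat), u_hat[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    sigma2 = resid @ resid / (len(du) - 1)
    return rho / np.sqrt(sigma2 / (lag @ lag))

# Simulated, assumed data: x is a random walk, u is a stationary AR(1).
rng = np.random.default_rng(4)
T = 500
x = np.cumsum(rng.standard_normal(T))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u

alpha_hat, beta_hat, u_hat = eg_step1(y, x)
stat = residual_df_tstat(u_hat)

# Approximate asymptotic 5% Engle-Granger critical value (two variables, intercept);
# this is NOT the ordinary Dickey-Fuller value of about -2.86.
CRIT_5PCT = -3.34
print(beta_hat, stat, stat < CRIT_5PCT)
```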
A second regression is then run on the first-differenced variables from the first regression, with the lagged residuals $\hat{u}_{t-1}$ included as a regressor.
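The second step can be sketched as an error-correction regression of $\Delta y_t$ on a constant, $\Delta x_t$ and $\hat{u}_{t-1}$. The data-generating process below (short-run coefficient 0.5, adjustment coefficient -0.3, long-run $\beta = 2$) is assumed so that the estimates can be checked against known values. A negative coefficient on $\hat{u}_{t-1}$ is the expected sign, since it pulls $y_t$ back toward the long-run relationship. Standard errors and lag selection are omitted from this sketch.

```python
import numpy as np

def engle_granger_ecm(y, x):
    """Two-step sketch: levels OLS, then OLS of diff(y) on [1, diff(x), lagged u_hat].
    Returns (beta_hat, [const, delta, gamma])."""
    # Step 1: cointegrating regression in levels.
    X1 = np.column_stack([np.ones(len(x)), x])
    c1, *_ = np.linalg.lstsq(X1, y, rcond=None)
    u_hat = y - X1 @ c1
    # Step 2: regression in first differences plus the lagged residual.
    dy, dx = np.diff(y), np.diff(x)
    X2 = np.column_stack([np.ones(len(dy)), dx, u_hat[:-1]])
    c2, *_ = np.linalg.lstsq(X2, dy, rcond=None)
    return c1[1], c2

# Assumed DGP: x is a random walk and
# dy_t = 0.5*dx_t - 0.3*(y_{t-1} - 2*x_{t-1}) + e_t.
rng = np.random.default_rng(5)
T = 2000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    shock = rng.standard_normal()
    x[t] = x[t - 1] + shock
    y[t] = y[t - 1] + 0.5 * shock - 0.3 * (y[t - 1] - 2.0 * x[t - 1]) + rng.standard_normal()

beta_hat, (const, delta, gamma) = engle_granger_ecm(y, x)
print(beta_hat, delta, gamma)  # estimates near 2, 0.5 and -0.3
```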
Johansen test
The Johansen test is a test for cointegration that, unlike the Engle–Granger method, allows for more than one cointegrating relationship. However, it relies on asymptotic (large-sample) properties: if the sample size is too small, the results will not be reliable, and an autoregressive distributed lag (ARDL) approach should be used instead.
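For intuition, here is a bare-bones Python sketch of Johansen's trace statistic computed by reduced-rank regression. It assumes a VAR(1) in levels (so no lagged differences) and an unrestricted constant, and it simulates two series that share one random-walk trend with drift, so there is exactly one cointegrating relation by construction. It returns the eigenvalues and the trace statistics for $r = 0$ and $r \le 1$, which must be compared with tabulated critical values. Library implementations such as `coint_johansen` in `statsmodels.tsa.vector_ar.vecm` also handle lag order and other deterministic specifications.

```python
import numpy as np

def johansen_trace(data):
    """Trace statistics for ranks r = 0..n-1 in a VECM with no lagged differences
    and an unrestricted constant. `data` has shape (T, n)."""
    data = np.asarray(data, dtype=float)
    dX = np.diff(data, axis=0)  # Delta X_t
    X_lag = data[:-1]           # X_{t-1}
    # Concentrate out the constant by demeaning both sets of variables.
    R0 = dX - dX.mean(axis=0)
    R1 = X_lag - X_lag.mean(axis=0)
    T_eff = R0.shape[0]
    S00 = R0.T @ R0 / T_eff
    S11 = R1.T @ R1 / T_eff
    S01 = R0.T @ R1 / T_eff
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01 (squared canonical correlations).
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    eig = np.sort(np.linalg.eigvals(M).real)[::-1]
    n = data.shape[1]
    trace = np.array([-T_eff * np.sum(np.log(1.0 - eig[r:])) for r in range(n)])
    return eig, trace

# Assumed DGP: x is a random walk with drift, y = x + stationary noise.
rng = np.random.default_rng(6)
T = 1000
x = np.cumsum(0.1 + rng.standard_normal(T))
y = x + rng.standard_normal(T)

eig, trace = johansen_trace(np.column_stack([y, x]))
print(eig)    # one large eigenvalue, one near zero
print(trace)  # large for r = 0, small for r <= 1
```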
Phillips–Ouliaris cointegration test
Peter C. B. Phillips and Sam Ouliaris (1990) show that residual-based unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey–Fuller distributions under the null hypothesis of no cointegration. Because of the spurious regression phenomenon under the null hypothesis, these tests have asymptotic distributions that depend on (1) the number of deterministic trend terms and (2) the number of variables with which cointegration is being tested. These distributions are known as Phillips–Ouliaris distributions, and their critical values have been tabulated. In finite samples, a better alternative to these asymptotic critical values is to generate critical values by simulation.
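Following the suggestion to simulate critical values, the sketch below estimates the 5% critical value of a residual-based Dickey–Fuller statistic for two variables with an intercept. It simulates pairs of independent random walks, for which the null of no cointegration holds by construction. The sample size, number of replications and the absence of augmentation lags are assumptions; in practice the simulation should match the exact test specification being used.

```python
import numpy as np

def residual_df_tstat(y, x):
    """Regress y on [1, x] by OLS, then return the DF t-statistic of the
    residuals (no constant, no augmentation lags)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    sigma2 = resid @ resid / (len(du) - 1)
    return rho / np.sqrt(sigma2 / (lag @ lag))

def simulated_critical_value(T=200, reps=2000, level=0.05, seed=7):
    """Monte Carlo critical value under the null of no cointegration:
    y and x are independent random walks, so any regression between them is spurious."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for i in range(reps):
        y = np.cumsum(rng.standard_normal(T))
        x = np.cumsum(rng.standard_normal(T))
        stats[i] = residual_df_tstat(y, x)
    return np.quantile(stats, level)

cv = simulated_critical_value()
print(cv)
```

The simulated value should land well below the ordinary Dickey–Fuller 5% value of about -2.86, near the tabulated asymptotic value of roughly -3.34, which is the distortion Phillips and Ouliaris describe.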
Multicointegration
In practice, cointegration is often used for two series, but it is more generally applicable and can be used for variables integrated of higher order (to detect correlated accelerations or other second-difference effects). Multicointegration extends the cointegration technique beyond two variables, and occasionally to variables integrated at different orders.
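For variables integrated of order two, the same idea applies one level up. In the illustrative construction below (all coefficients are assumptions), $x_t$ is I(2) and $y_t = 3x_t$ plus an I(1) term is also I(2), yet $y_t - 3x_t$ is only I(1); in one common notation the pair is CI(2,1). The checks use exact identities of the construction rather than statistical tests.

```python
import numpy as np

rng = np.random.default_rng(8)
T = 1000
e, v = rng.standard_normal((2, T))

# x is I(2): doubly cumulated white noise, so its "acceleration" is e.
x = np.cumsum(np.cumsum(e))
# y inherits x's second-order trend plus an I(1) component of its own.
y = 3.0 * x + np.cumsum(v)

# x needs two differences to become stationary...
print(np.allclose(np.diff(x, n=2), e[2:]))       # True
# ...but y - 3x needs only one, so the pair is cointegrated as CI(2, 1).
print(np.allclose(np.diff(y - 3.0 * x), v[1:]))  # True
```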
Variable shifts in long time series
Tests for cointegration assume that the cointegrating vector is constant during the period of study. In reality, the long-run relationship between the underlying variables may change (the cointegrating vector can shift). Reasons for this include technological progress, economic crises, changes in people's preferences and behaviour, policy or regime changes, and organizational or institutional developments. Such shifts are especially likely when the sample period is long. To take this issue into account, tests have been introduced for cointegration with one unknown structural break, and tests for cointegration with two unknown breaks are also available.
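A minimal sketch of the one-break idea, similar in spirit to the residual-based test of Gregory and Hansen for a level shift, is given below. For every candidate break date in the middle 70% of the sample, it adds a step dummy to the cointegrating regression, computes a Dickey–Fuller statistic on the residuals, and keeps the most negative value. The simulated data (a level shift of 10 at observation 200) and the 15% trimming are assumptions. The minimum statistic must be compared with its own tabulated critical values, which are more negative than the ordinary Engle–Granger ones and are not reproduced here.

```python
import numpy as np

def df_tstat_no_const(u):
    """DF t-statistic from regressing diff(u) on lagged u, no constant."""
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    sigma2 = resid @ resid / (len(du) - 1)
    return rho / np.sqrt(sigma2 / (lag @ lag))

def min_df_over_breaks(y, x, trim=0.15):
    """For each candidate break date k, regress y on [1, 1{t >= k}, x] and
    compute the residual DF statistic. Return (smallest statistic, its date)."""
    T = len(y)
    t = np.arange(T)
    best_stat, best_k = np.inf, None
    for k in range(int(trim * T), int((1 - trim) * T)):
        X = np.column_stack([np.ones(T), (t >= k).astype(float), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        stat = df_tstat_no_const(y - X @ coef)
        if stat < best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k

# Assumed DGP: cointegrated with beta = 2, plus a level shift of 10 at t = 200.
rng = np.random.default_rng(9)
T, true_break = 400, 200
x = np.cumsum(rng.standard_normal(T))
shift = 10.0 * (np.arange(T) >= true_break)
y = 1.0 + 2.0 * x + shift + rng.standard_normal(T)

stat, k = min_df_over_breaks(y, x)
print(stat, k)  # strongly negative statistic, estimated break near 200
```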
Bayesian inference
Several Bayesian methods have been proposed to compute the posterior distribution of the number of cointegrating relationships and the cointegrating linear combinations.
See also
* Error correction model
* Granger causality
* Stationary subspace analysis