Cointegration

Cointegration is a statistical property of a collection of time series variables. First, all of the series must be integrated of order ''d'' (see order of integration). Next, if a linear combination of this collection is integrated of order less than ''d'', then the collection is said to be co-integrated. Formally, if (''X'', ''Y'', ''Z'') are each integrated of order ''d'', and there exist coefficients ''a'', ''b'', ''c'' such that ''aX'' + ''bY'' + ''cZ'' is integrated of order less than ''d'', then ''X'', ''Y'', and ''Z'' are cointegrated. Cointegration has become an important property in contemporary time series analysis. Time series often have trends, either deterministic or stochastic. In an influential paper, Charles Nelson and Charles Plosser (1982) provided statistical evidence that many US macroeconomic time series (like GNP, wages, employment, etc.) have stochastic trends.


Introduction

If two or more series are individually integrated (in the time series sense) but some linear combination of them has a lower order of integration, then the series are said to be cointegrated. A common example is where the individual series are first-order integrated (''I''(1)) but some (cointegrating) vector of coefficients exists to form a stationary linear combination of them. For instance, a stock market index and the price of its associated futures contract move through time, each roughly following a random walk. Testing the hypothesis that there is a statistically significant connection between the futures price and the spot price could now be done by testing for the existence of a cointegrated combination of the two series.
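
A minimal illustrative sketch of this idea in Python, assuming the numpy and statsmodels packages are available (the simulated "index" and "futures" series and all parameter values are invented for illustration):

```python
# Sketch: simulate two cointegrated random walks and test for cointegration.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(0)
n = 500

# A common stochastic trend (random walk) drives both series.
common_trend = np.cumsum(rng.normal(size=n))

# Each series is I(1) on its own, but a linear combination is stationary noise.
index = common_trend + rng.normal(scale=0.5, size=n)    # e.g. a stock index
futures = common_trend + rng.normal(scale=0.5, size=n)  # its futures price

# Engle-Granger-style cointegration test: a small p-value rejects
# the null hypothesis of no cointegration.
t_stat, p_value, crit_values = coint(futures, index)
print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.3f}")
```

With simulated data like this, the test would typically reject the null of no cointegration, reflecting the shared stochastic trend.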


History

The first to introduce and analyse the concept of spurious, or nonsense, regression was Udny Yule in 1926. Before the 1980s, many economists used linear regressions on non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlation, since standard detrending techniques can result in data that are still non-stationary. Granger's 1987 paper with Robert Engle formalized the cointegrating vector approach and coined the term.

For integrated processes, Granger and Newbold showed that de-trending does not work to eliminate the problem of spurious correlation, and that the superior alternative is to check for co-integration. Two series with trends can be co-integrated only if there is a genuine relationship between them. Thus the standard current methodology for time series regressions is to check all time series involved for integration. If integrated series appear on both sides of the regression relationship, then it is possible for regressions to give misleading results.

The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one). The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data which had been differenced. This method is biased if the non-stationary variables are cointegrated.

For example, regressing the consumption series for any country (e.g. Fiji) against the GNP for a randomly selected dissimilar country (e.g. Afghanistan) might give a high R-squared (suggesting high explanatory power on Fiji's consumption from Afghanistan's GNP). This is called spurious regression: two integrated series which are not directly causally related may nonetheless show a significant correlation; this phenomenon is called spurious correlation.
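
The spurious regression phenomenon is easy to reproduce with simulated data. A minimal sketch, assuming numpy and statsmodels are available (the data are purely simulated and merely stand in for the Fiji/Afghanistan example above):

```python
# Sketch: spurious regression between two independent random walks.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500

# Two unrelated I(1) series (independent random walks).
x = np.cumsum(rng.normal(size=n))
y = np.cumsum(rng.normal(size=n))

# Regressing one level series on the other often yields a high R-squared
# and a "significant" slope even though there is no genuine relationship.
model = sm.OLS(y, sm.add_constant(x)).fit()
print(f"R-squared: {model.rsquared:.3f}")
print(f"slope t-statistic: {model.tvalues[1]:.2f}")
```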


Tests

The three main methods for testing for cointegration are:


Engle–Granger two-step method

If x_t and y_t are non-stationary with order of integration ''d'' = 1, and they are cointegrated, then a linear combination of them must be stationary for some value of \beta and u_t . In other words:
: y_t - \beta x_t = u_t \,
where u_t is stationary. If we knew \beta, we could just test u_t for stationarity with something like a Dickey–Fuller test or a Phillips–Perron test and be done. But because we don't know \beta, we must estimate it first, generally by using ordinary least squares (by regressing y_t on x_t and an intercept), and then run our stationarity test on the estimated u_t series, often denoted \hat{u}_t. A second regression is then run on the first differenced variables from the first regression, with the lagged residuals \hat{u}_{t-1} included as a regressor.
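
A minimal sketch of the two steps in Python, assuming numpy and statsmodels are available (the function name and input arrays are illustrative, not a standard API):

```python
# Sketch of the Engle-Granger two-step procedure (illustrative only).
# Assumes y and x are 1-D numpy arrays of the same length.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def engle_granger_two_step(y, x):
    # Step 1: estimate beta by regressing y_t on x_t and an intercept.
    step1 = sm.OLS(y, sm.add_constant(x)).fit()
    u_hat = step1.resid  # estimated u_t = y_t - alpha - beta * x_t

    # Step 2: test the estimated residuals for stationarity with an
    # ADF-type regression. Caveat: because u_hat is estimated, standard
    # Dickey-Fuller critical values are not exact for this test.
    adf_stat, p_value, *_ = adfuller(u_hat)
    return step1.params, adf_stat, p_value
```

In practice, library routines such as statsmodels' `coint` wrap both steps and report critical values intended for residual-based cointegration testing rather than the plain Dickey–Fuller tables.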


Johansen test

The Johansen test is a test for cointegration that allows for more than one cointegrating relationship, unlike the Engle–Granger method, but this test is subject to asymptotic properties, i.e. it relies on large samples. If the sample size is too small then the results will not be reliable and one should use autoregressive distributed lag (ARDL) methods instead.
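
A minimal illustrative sketch using the `coint_johansen` routine from statsmodels (the simulated three-variable system, the deterministic-term choice, and the lag order are arbitrary assumptions for the example):

```python
# Sketch: Johansen trace test on a simulated multivariate system.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(2)
T = 400
trend = np.cumsum(rng.normal(size=T))  # one shared stochastic trend
data = np.column_stack([
    trend + rng.normal(scale=0.4, size=T),
    2 * trend + rng.normal(scale=0.4, size=T),
    np.cumsum(rng.normal(size=T)),     # an unrelated random walk
])

# det_order=0: include a constant; k_ar_diff=1: one lagged difference.
result = coint_johansen(data, det_order=0, k_ar_diff=1)

# Compare each trace statistic with its 95% critical value (column 1 of cvt).
for r, (stat, cv) in enumerate(zip(result.lr1, result.cvt[:, 1])):
    print(f"rank <= {r}: trace stat {stat:.2f}, 95% critical value {cv:.2f}")
```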


Phillips–Ouliaris cointegration test

Peter C. B. Phillips and Sam Ouliaris (1990) show that residual-based unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey–Fuller distributions under the null hypothesis of no cointegration. Because of the spurious regression phenomenon under the null hypothesis, these tests have asymptotic distributions that depend on (1) the number of deterministic trend terms and (2) the number of variables with which co-integration is being tested. These distributions are known as Phillips–Ouliaris distributions, and critical values have been tabulated. In finite samples, a superior alternative to the use of these asymptotic critical values is to generate critical values from simulations.
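
The last point, generating critical values by simulation, can be sketched roughly as follows: simulate many independent random walks under the null of no cointegration, compute the residual-based test statistic for each draw, and take empirical quantiles. This is an illustrative outline, not the Phillips–Ouliaris procedure itself; the sample size, replication count, and lag choice below are arbitrary assumptions.

```python
# Sketch: Monte Carlo critical values for a residual-based cointegration
# test under the null of no cointegration (independent random walks).
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def simulated_critical_values(n_obs=200, n_reps=1000, seed=3):
    rng = np.random.default_rng(seed)
    stats = np.empty(n_reps)
    for i in range(n_reps):
        # Under the null: two independent I(1) series, no cointegration.
        x = np.cumsum(rng.normal(size=n_obs))
        y = np.cumsum(rng.normal(size=n_obs))
        resid = sm.OLS(y, sm.add_constant(x)).fit().resid
        # Simple Dickey-Fuller regression on the residuals (no augmentation,
        # an arbitrary simplification for speed).
        stats[i] = adfuller(resid, maxlag=0, autolag=None)[0]
    # Empirical 1%, 5% and 10% quantiles of the simulated null distribution.
    return np.quantile(stats, [0.01, 0.05, 0.10])

print(simulated_critical_values())
```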


Multicointegration

In practice, cointegration is often used for two series, but it is more generally applicable and can be used for variables integrated of higher order (to detect correlated accelerations or other second-difference effects). Multicointegration extends the cointegration technique beyond two variables, and occasionally to variables integrated at different orders.


Variable shifts in long time series

Tests for cointegration assume that the cointegrating vector is constant during the period of study. In reality, it is possible that the long-run relationship between the underlying variables changes (shifts in the cointegrating vector can occur). The reasons for this might be technological progress, economic crises, changes in people's preferences and behaviour, policy or regime changes, and organizational or institutional developments. This is especially likely to be the case if the sample period is long. To take this issue into account, tests have been introduced for cointegration with one unknown structural break, and tests for cointegration with two unknown breaks are also available.


Bayesian inference

Several Bayesian methods have been proposed to compute the posterior distribution of the number of cointegrating relationships and the cointegrating linear combinations.


See also

* Error correction model
* Granger causality
* Stationary subspace analysis

