Vector autoregression (VAR) is a statistical model used to capture the relationship between multiple quantities as they change over time. VAR is a type of stochastic process model. VAR models generalize the single-variable (univariate) autoregressive model by allowing for multivariate time series. VAR models are often used in economics and the natural sciences. As in the autoregressive model, each variable has an equation modelling its evolution over time. This equation includes the variable's lagged (past) values, the lagged values of the other variables in the model, and an error term. VAR models do not require as much knowledge about the forces influencing a variable as do structural models with simultaneous equations. The only prior knowledge required is a list of variables which can be hypothesized to affect each other over time.

Specification

Definition

A VAR model describes the evolution of a set of ''k'' variables, called ''endogenous variables'', over time. Each period of time is numbered, ''t'' = 1, ..., ''T''. The variables are collected in a vector, ''y_t'', which is of length ''k''. (Equivalently, this vector might be described as a (''k'' × 1)-matrix.) The vector is modelled as a linear function of its previous values. The vector's components are referred to as ''y''_''i'',''t'', meaning the observation at time ''t'' of the ''i''th variable. For example, if the first variable in the model measures the price of wheat over time, then ''y''_1,1998 would indicate the price of wheat in the year 1998.

VAR models are characterized by their ''order'', which refers to the number of earlier time periods the model will use. Continuing the above example, a 5th-order VAR would model each year's wheat price as a linear combination of the last five years of wheat prices, together with the last five years of every other variable in the model. A ''lag'' is the value of a variable in a previous time period. So in general a ''p''th-order VAR refers to a VAR model which includes lags for the last ''p'' time periods. A ''p''th-order VAR is denoted "VAR(''p'')" and sometimes called "a VAR with ''p'' lags". A ''p''th-order VAR model is written as:

y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + e_t

The variables of the form ''y''_''t''−''i'' indicate that variable's value ''i'' time periods earlier and are called the "''i''th lag" of ''y''_''t''. The variable ''c'' is a ''k''-vector of constants serving as the intercept of the model. ''A_i'' is a time-invariant (''k'' × ''k'')-matrix and ''e''_''t'' is a ''k''-vector of error terms. The error terms must satisfy three conditions:

1. \mathrm{E}(e_t) = 0. Every error term has a mean of zero.
2. \mathrm{E}(e_t e_t') = \Omega. The contemporaneous covariance matrix of error terms is a ''k'' × ''k'' positive-semidefinite matrix denoted Ω.
3. \mathrm{E}(e_t e_{t-k}') = 0 for any non-zero ''k'' (here ''k'' denotes a time shift, not the number of variables). There is no correlation across time. In particular, there is no serial correlation in individual error terms.

The process of choosing the maximum lag ''p'' in the VAR model requires special attention because inference depends on the correctness of the selected lag order.

Order of integration of the variables

All variables have to be of the same order of integration. The following cases are distinct:
* All the variables are I(0) (stationary): this is the standard case, i.e. a VAR in levels.
* All the variables are I(''d'') (non-stationary) with ''d'' > 0:
** The variables are cointegrated: the error correction term has to be included in the VAR. The model becomes a vector error correction model (VECM), which can be seen as a restricted VAR.
** The variables are not cointegrated: the variables first have to be differenced ''d'' times, which gives a VAR in differences.

Concise matrix notation

One can stack the vectors in order to write a VAR(''p'') as a stochastic matrix difference equation, with a concise matrix notation:

Y = BZ + U

Example

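A VAR(1) in two variables can be written in matrix form (more compact notation) as

\begin{bmatrix}y_{1,t} \\ y_{2,t}\end{bmatrix} = \begin{bmatrix}c_1 \\ c_2\end{bmatrix} + \begin{bmatrix}a_{1,1}&a_{1,2} \\ a_{2,1}&a_{2,2}\end{bmatrix}\begin{bmatrix}y_{1,t-1} \\ y_{2,t-1}\end{bmatrix} + \begin{bmatrix}e_{1,t} \\ e_{2,t}\end{bmatrix},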

(in which only a single ''A'' matrix appears because this example has a maximum lag ''p'' equal to 1), or, equivalently, as the following system of two equations:

y_{1,t} = c_1 + a_{1,1}y_{1,t-1} + a_{1,2}y_{2,t-1} + e_{1,t}

y_{2,t} = c_2 + a_{2,1}y_{1,t-1} + a_{2,2}y_{2,t-1} + e_{2,t}.

Each variable in the model has one equation. The current (time ''t'') observation of each variable depends on its own lagged values as well as on the lagged values of each other variable in the VAR.
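As an illustration of the two-variable VAR(1) above, the following minimal sketch simulates the system with NumPy. The function name `simulate_var1`, the parameter values, the Gaussian errors and the zero starting value are all assumptions chosen for the example; the model itself only requires errors with zero mean, covariance Ω and no correlation across time.

```python
import numpy as np

def simulate_var1(c, A, omega, T, seed=0):
    """Simulate T observations of y_t = c + A y_{t-1} + e_t, starting from y_0 = 0."""
    rng = np.random.default_rng(seed)
    k = len(c)
    # Errors: mean zero, contemporaneous covariance omega, independent across time.
    # Normality is an assumption made here for convenience.
    e = rng.multivariate_normal(np.zeros(k), omega, size=T)
    y = np.zeros((T + 1, k))
    for t in range(1, T + 1):
        # Each variable depends on the lagged values of all variables.
        y[t] = c + A @ y[t - 1] + e[t - 1]
    return y[1:]  # drop the artificial starting value

# Hypothetical parameter values for a bivariate VAR(1).
c = np.array([0.1, -0.2])
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

y = simulate_var1(c, A, omega, T=200)
```

Row ''t'' of `y` holds both variables at one point in time, so `y[:, 0]` is the simulated path of the first variable.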

Writing VAR(''p'') as VAR(1)

A VAR with ''p'' lags can always be equivalently rewritten as a VAR with only one lag by appropriately redefining the dependent variable. The transformation amounts to stacking the lags of the VAR(''p'') variable in the new VAR(1) dependent variable and appending identities to complete the number of equations. For example, the VAR(2) model

y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + e_t

can be recast as the VAR(1) model

\begin{bmatrix}y_t \\ y_{t-1}\end{bmatrix} = \begin{bmatrix}c \\ 0\end{bmatrix} + \begin{bmatrix}A_1&A_2 \\ I&0\end{bmatrix}\begin{bmatrix}y_{t-1} \\ y_{t-2}\end{bmatrix} + \begin{bmatrix}e_t \\ 0\end{bmatrix},

where ''I'' is the identity matrix. The equivalent VAR(1) form is more convenient for analytical derivations and allows more compact statements.
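The stacking described above can be sketched in a few lines of NumPy. The helper name `companion` and the numerical values are assumptions made for illustration; the block builds the stacked coefficient matrix for any number of lags and checks on one step that it reproduces the VAR(2) recursion.

```python
import numpy as np

def companion(A_list):
    """Stack VAR(p) matrices A_1..A_p into the (k*p x k*p) matrix of the VAR(1) form."""
    p = len(A_list)
    k = A_list[0].shape[0]
    top = np.hstack(A_list)  # first block row: [A_1 A_2 ... A_p]
    if p == 1:
        return top
    # Identities that copy y_{t-1}, ..., y_{t-p+1} into the next stacked vector.
    bottom = np.hstack([np.eye(k * (p - 1)), np.zeros((k * (p - 1), k))])
    return np.vstack([top, bottom])

# Hypothetical VAR(2) coefficients.
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
c = np.array([0.1, -0.2])
F = companion([A1, A2])

# One step of the recursion with the error term set to zero.
y_lag1 = np.array([1.0, 2.0])
y_lag2 = np.array([0.5, -1.0])
direct = c + A1 @ y_lag1 + A2 @ y_lag2
stacked = np.concatenate([c, np.zeros(2)]) + F @ np.concatenate([y_lag1, y_lag2])
```

The first ''k'' entries of `stacked` equal `direct`, and the remaining entries simply carry `y_lag1` forward.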

Structural vs. reduced form

Structural VAR

A ''structural VAR with p lags'' (sometimes abbreviated SVAR) is

B_0 y_t = c_0 + B_1 y_{t-1} + B_2 y_{t-2} + \cdots + B_p y_{t-p} + \epsilon_t,

where ''c''₀ is a ''k'' × 1 vector of constants, ''B_i'' is a ''k'' × ''k'' matrix (for every ''i'' = 0, ..., ''p'') and ''ε''_''t'' is a ''k'' × 1 vector of error terms. The main diagonal terms of the ''B''₀ matrix (the coefficients on the ''i''th variable in the ''i''th equation) are scaled to 1. The error terms ''ε''_''t'' (''structural shocks'') satisfy the conditions (1) - (3) in the definition above, with the particularity that all the off-diagonal elements of the covariance matrix \mathrm{E}(\epsilon_t\epsilon_t') = \Sigma are zero. That is, the structural shocks are uncorrelated. For example, a two-variable structural VAR(1) is:

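\begin{bmatrix}1&B_{0;1,2} \\ B_{0;2,1}&1\end{bmatrix}\begin{bmatrix}y_{1,t} \\ y_{2,t}\end{bmatrix} = \begin{bmatrix}c_{0;1} \\ c_{0;2}\end{bmatrix} + \begin{bmatrix}B_{1;1,1}&B_{1;1,2} \\ B_{1;2,1}&B_{1;2,2}\end{bmatrix}\begin{bmatrix}y_{1,t-1} \\ y_{2,t-1}\end{bmatrix} + \begin{bmatrix}\epsilon_{1,t} \\ \epsilon_{2,t}\end{bmatrix},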

where

\Sigma = \mathrm{E}(\epsilon_t \epsilon_t') = \begin{bmatrix}\sigma_1^2&0 \\ 0&\sigma_2^2\end{bmatrix};

that is, the variances of the structural shocks are denoted \mathrm{var}(\epsilon_i) = \sigma_i^2 (''i'' = 1, 2) and the covariance is \mathrm{cov}(\epsilon_1,\epsilon_2) = 0. Writing the first equation explicitly and passing ''y''_2,''t'' to the right-hand side one obtains:

y_{1,t} = c_{0;1} - B_{0;1,2}y_{2,t} + B_{1;1,1}y_{1,t-1} + B_{1;1,2}y_{2,t-1} + \epsilon_{1,t}

Note that ''y''_2,''t'' can have a contemporaneous effect on ''y''_1,''t'' if ''B''_0;1,2 is not zero. This is different from the case when ''B''₀ is the identity matrix (all off-diagonal elements are zero, which is the case in the initial definition), when ''y''_2,''t'' can directly impact ''y''_1,''t''+1 and subsequent future values, but not ''y''_1,''t''.

Because of the parameter identification problem, ordinary least squares estimation of the structural VAR would yield inconsistent parameter estimates. This problem can be overcome by rewriting the VAR in reduced form.

From an economic point of view, if the joint dynamics of a set of variables can be represented by a VAR model, then the structural form is a depiction of the underlying, "structural", economic relationships. Two features of the structural form make it the preferred candidate to represent the underlying relations:

1. ''Error terms are not correlated''. The structural, economic shocks which drive the dynamics of the economic variables are assumed to be independent, which implies zero correlation between error terms as a desired property. This is helpful for separating out the effects of economically unrelated influences in the VAR. For instance, there is no reason why an oil price shock (as an example of a supply shock) should be related to a shift in consumers' preferences towards a style of clothing (as an example of a demand shock); therefore one would expect these factors to be statistically independent.
2. ''Variables can have a contemporaneous impact on other variables''. This is a desirable feature especially when using low-frequency data. For example, an indirect tax rate increase would not affect tax revenues the day the decision is announced, but one could find an effect in that quarter's data.

Reduced-form VAR

By premultiplying the structural VAR with the inverse of ''B''₀,

y_t = B_0^{-1}c_0 + B_0^{-1} B_1 y_{t-1} + B_0^{-1} B_2 y_{t-2} + \cdots + B_0^{-1} B_p y_{t-p} + B_0^{-1}\epsilon_t,

and denoting

B_0^{-1} c_0 = c,\quad B_0^{-1}B_i = A_i \text{ for } i = 1, \dots, p \text{ and } B_0^{-1}\epsilon_t = e_t,

one obtains the ''p''th-order reduced VAR

y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + e_t

Note that in the reduced form all right-hand-side variables are predetermined at time ''t''. As there are no time ''t'' endogenous variables on the right-hand side, no variable has a ''direct'' contemporaneous effect on other variables in the model. However, the error terms in the reduced VAR are composites of the structural shocks: ''e''_''t'' = ''B''₀⁻¹''ε''_''t''. Thus, the occurrence of one structural shock ''ε''_''i'',''t'' can potentially lead to the occurrence of shocks in all error terms ''e''_''j'',''t'', thus creating contemporaneous movement in all endogenous variables. Consequently, the covariance matrix of the reduced VAR

\Omega = \mathrm{E}(e_t e_t') = \mathrm{E}(B_0^{-1} \epsilon_t \epsilon_t' (B_0^{-1})') = B_0^{-1}\Sigma(B_0^{-1})'

can have non-zero off-diagonal elements, thus allowing non-zero correlation between error terms.
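The passage from structural to reduced form can be checked numerically. The sketch below uses hypothetical values for ''B''₀, ''B''₁, ''c''₀ and Σ (none of them come from the text) and shows that a diagonal Σ produces a reduced-form Ω with non-zero off-diagonal elements as soon as ''B''₀ is not the identity.

```python
import numpy as np

# Hypothetical structural parameters for a two-variable structural VAR(1).
B0 = np.array([[1.0, 0.4],
               [0.2, 1.0]])   # main diagonal scaled to 1
B1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
c0 = np.array([0.1, 0.2])
Sigma = np.diag([1.0, 0.5])   # structural shocks are uncorrelated

B0_inv = np.linalg.inv(B0)
c = B0_inv @ c0                    # reduced-form intercept
A1 = B0_inv @ B1                   # reduced-form lag matrix
Omega = B0_inv @ Sigma @ B0_inv.T  # reduced-form error covariance

print(Omega)  # off-diagonal entries are non-zero although Sigma is diagonal
```

The reverse direction is not unique: many different pairs (''B''₀, Σ) give the same Ω, which is the identification problem mentioned above.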

Estimation

Estimation of the regression parameters

Starting from the concise matrix notation:

Y = BZ + U

* The multivariate least squares (MLS) approach for estimating ''B'' yields:

\hat B = YZ'(ZZ')^{-1}.

This can be written alternatively as:

\operatorname{Vec}(\hat B) = ((ZZ')^{-1} Z \otimes I_k)\ \operatorname{Vec}(Y),

where \otimes denotes the Kronecker product and Vec the vectorization of the indicated matrix. This estimator is consistent and asymptotically efficient. It is furthermore equal to the conditional maximum likelihood estimator.

* As the explanatory variables are the same in each equation, the multivariate least squares estimator is equivalent to the ordinary least squares estimator applied to each equation separately.
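The estimator above can be sketched with NumPy. The layout of ''Y'' and ''Z'' used here is an assumption, since the text does not spell it out: ''Y'' holds the observations as columns, and each column of ''Z'' holds a 1 (for the constant) followed by the ''p'' lagged vectors, so that ''B'' = [''c'', ''A''₁, ..., ''A_p'']. The helper names `build_YZ` and `fit_mls` and the simulated data are likewise hypothetical.

```python
import numpy as np

def build_YZ(y, p):
    """y: (n_obs, k) array. Returns Y (k x T) and Z ((k*p + 1) x T), T = n_obs - p."""
    n_obs, k = y.shape
    Y = y[p:].T
    rows = [np.ones((1, n_obs - p))]           # constant term
    for i in range(1, p + 1):
        rows.append(y[p - i:n_obs - i].T)      # lag i of every variable
    return Y, np.vstack(rows)

def fit_mls(y, p):
    """Multivariate least squares: B_hat = Y Z' (Z Z')^{-1}."""
    Y, Z = build_YZ(y, p)
    B_hat = Y @ Z.T @ np.linalg.inv(Z @ Z.T)
    return B_hat, Y, Z

# Simulated data from a hypothetical bivariate VAR(1).
rng = np.random.default_rng(0)
c_true = np.array([0.1, -0.2])
A_true = np.array([[0.5, 0.1], [0.2, 0.3]])
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = c_true + A_true @ y[t - 1] + rng.normal(size=2)

B_hat, Y, Z = fit_mls(y, p=1)

# Equation-by-equation OLS: regress each row of Y on the same regressors Z'.
B_eq = np.vstack([np.linalg.lstsq(Z.T, Y[i], rcond=None)[0] for i in range(2)])
```

`B_hat` and `B_eq` agree up to floating-point error, which illustrates the equivalence stated in the second bullet. For ill-conditioned data a least-squares solver is usually preferable to forming the explicit inverse.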

Estimation of the covariance matrix of the errors

As in the standard case, the maximum likelihood estimator (MLE) of the covariance matrix differs from the ordinary least squares (OLS) estimator.

Maximum likelihood estimator:

\hat\Sigma = \frac{1}{T} \sum_{t=1}^T \hat\epsilon_t \hat\epsilon_t'

OLS estimator:

\hat\Sigma = \frac{1}{T-kp-1} \sum_{t=1}^T \hat\epsilon_t \hat\epsilon_t'

for a model with a constant, ''k'' variables and ''p'' lags. In matrix notation, this gives:

\hat\Sigma = \frac{1}{T-kp-1} (Y-\hat{B}Z)(Y-\hat{B}Z)'.

Estimation of the estimator's covariance matrix

The covariance matrix of the parameters can be estimated as

\widehat{\mbox{Cov}}(\mbox{Vec}(\hat B)) = (ZZ')^{-1} \otimes \hat\Sigma.
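The two error covariance estimators and the parameter covariance can be computed as follows. This is a sketch under stated assumptions: a VAR(1) with a constant, simulated data, the degrees-of-freedom correction ''T'' − ''kp'' − 1 for the OLS version, and a column-major Vec; the function names are hypothetical.

```python
import numpy as np

def residual_covariances(Y, Z, B_hat):
    """Return the (MLE, OLS) estimates of the error covariance matrix."""
    T = Y.shape[1]
    n_regressors = Z.shape[0]      # k*p + 1 when a constant is included
    U = Y - B_hat @ Z              # residuals, one column per period
    S = U @ U.T
    return S / T, S / (T - n_regressors)

def coef_covariance(Z, Sigma_hat):
    """Estimated covariance of Vec(B_hat): (Z Z')^{-1} kron Sigma_hat."""
    return np.kron(np.linalg.inv(Z @ Z.T), Sigma_hat)

# Simulated data from a hypothetical bivariate VAR(1), written out for p = 1.
rng = np.random.default_rng(1)
k, T = 2, 300
A = np.array([[0.5, 0.1], [0.2, 0.3]])
y = np.zeros((T + 1, k))
for t in range(1, T + 1):
    y[t] = A @ y[t - 1] + rng.normal(size=k)

Y = y[1:].T
Z = np.vstack([np.ones((1, T)), y[:-1].T])   # constant and first lag
B_hat = Y @ Z.T @ np.linalg.inv(Z @ Z.T)

Sigma_mle, Sigma_ols = residual_covariances(Y, Z, B_hat)
cov_vecB = coef_covariance(Z, Sigma_ols)
# Standard errors arranged like B_hat (Vec stacks columns, hence order="F").
std_err = np.sqrt(np.diag(cov_vecB)).reshape(B_hat.shape, order="F")
```

The two covariance estimates differ only by the scalar factor ''T'' / (''T'' − ''kp'' − 1), so the gap vanishes as the sample grows.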

Degrees of freedom

Vector autoregression models often involve the estimation of many parameters. For example, with seven variables and four lags, each matrix of coefficients for a given lag length is 7 by 7, and the vector of constants has 7 elements, so a total of 49×4 + 7 = 203 parameters are estimated, substantially lowering the degrees of freedom of the regression (the number of data points minus the number of parameters to be estimated). This can hurt the accuracy of the parameter estimates and hence of the forecasts given by the model.
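The count in the example generalizes to ''k''²''p'' + ''k'' coefficients for ''k'' variables, ''p'' lags and a constant. A small helper (the name `var_parameter_count` is hypothetical) makes it easy to see how quickly the number grows:

```python
def var_parameter_count(k, p, constant=True):
    """Number of regression coefficients in a VAR(p) with k variables."""
    return k * k * p + (k if constant else 0)

n_params = var_parameter_count(7, 4)        # 49 * 4 + 7 = 203
per_equation = n_params // 7                # 29 coefficients in each equation
doubled_lags = var_parameter_count(7, 8)    # 49 * 8 + 7 = 399
```

The error covariance matrix adds further free parameters that are not included in this count.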

Interpretation of estimated model

Impulse response

Consider the first-order case (i.e., with only one lag), with equation of evolution

y_t = A y_{t-1} + e_t,

for evolving (state) vector ''y'' and vector ''e'' of shocks. To find, say, the effect of the ''j''-th element of the vector of shocks upon the ''i''-th element of the state vector 2 periods later, which is a particular impulse response, first write the above equation of evolution one period lagged:

y_{t-1} = A y_{t-2} + e_{t-1}.

Use this in the original equation of evolution to obtain

y_t = A^2 y_{t-2} + A e_{t-1} + e_t;

then repeat using the twice lagged equation of evolution, to obtain

y_t = A^3 y_{t-3} + A^2 e_{t-2} + A e_{t-1} + e_t.

From this, the effect of the ''j''-th component of e_{t-2} upon the ''i''-th component of y_t is the ''i, j'' element of the matrix A^2.

It can be seen from this induction process that any shock will have an effect on the elements of ''y'' infinitely far forward in time, although the effect will become smaller and smaller over time assuming that the AR process is stable, that is, that all the eigenvalues of the matrix ''A'' are less than 1 in absolute value.
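The induction above says that the response after ''h'' periods is given by the matrix power ''A''^''h''. A minimal sketch with a hypothetical coefficient matrix, including the stability check on the eigenvalues:

```python
import numpy as np

def impulse_responses(A, horizon):
    """Return [A^0, A^1, ..., A^horizon]. Entry (i, j) of A^h is the effect of a
    unit shock in component j of e_t on component i of y_{t+h}."""
    out = [np.eye(A.shape[0])]
    for _ in range(horizon):
        out.append(A @ out[-1])
    return out

def is_stable(A):
    """True if all eigenvalues of A are smaller than 1 in absolute value."""
    return bool(np.all(np.abs(np.linalg.eigvals(A)) < 1))

A = np.array([[0.5, 0.1],
              [0.2, 0.3]])   # hypothetical VAR(1) coefficient matrix
irf = impulse_responses(A, horizon=10)

effect_two_periods_later = irf[2]   # equals A @ A, as in the derivation
```

For a VAR(''p'') the same computation can be applied to the stacked VAR(1) matrix, reading off the upper-left ''k'' × ''k'' block of each power. These are responses to reduced-form errors; responses to structural shocks additionally require an identified ''B''₀.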

Forecasting using an estimated VAR model

An estimated VAR model can be used for forecasting, and the quality of the forecasts can be judged, in ways that are completely analogous to the methods used in univariate autoregressive modelling.
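As in the univariate case, point forecasts can be produced by iterating the estimated equation forward with future errors set to their mean of zero. The sketch below assumes the coefficients are already estimated; the function name `forecast_var` and the numbers are hypothetical.

```python
import numpy as np

def forecast_var(c, A_list, history, steps):
    """Iterated point forecasts from a VAR(p).

    history: (n_obs, k) array with the most recent observation last.
    Returns a (steps, k) array of forecasts for the next `steps` periods.
    """
    p = len(A_list)
    path = [row for row in np.asarray(history, dtype=float)[-p:]]
    forecasts = []
    for _ in range(steps):
        # A_1 multiplies the latest value, A_2 the one before it, and so on.
        y_next = c + sum(A_list[i] @ path[-1 - i] for i in range(p))
        forecasts.append(y_next)
        path.append(y_next)   # forecasts feed back in as lagged values
    return np.array(forecasts)

# Hypothetical estimated VAR(1).
c = np.array([0.1, -0.2])
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
history = np.array([[1.0, 2.0]])

f = forecast_var(c, [A], history, steps=3)
long_run = forecast_var(c, [A], history, steps=200)[-1]
```

For a stable model the forecasts converge to the unconditional mean, which for a VAR(1) is (''I'' − ''A'')⁻¹''c''.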

Applications

Christopher Sims has advocated VAR models, criticizing the claims and performance of earlier modeling in macroeconomic econometrics. He recommended VAR models, which had previously appeared in time series statistics and in system identification, a statistical specialty in control theory. Sims advocated VAR models as providing a theory-free method to estimate economic relationships, thus being an alternative to the "incredible identification restrictions" in structural models. VAR models are also increasingly used in health research for automatic analyses of diary data or sensor data. Sio Iong Ao and R. E. Caraka found that an artificial neural network can improve its performance with the addition of a hybrid vector autoregression component.

Software

* R: The package ''vars'' includes functions for VAR models. Other R packages are listed in the CRAN Task View: Time Series Analysis.
* Python: The ''statsmodels'' package's tsa (time series analysis) module supports VARs. ''PyFlux'' has support for VARs and Bayesian VARs.
* SAS: VARMAX
* Stata: "var"
* EViews: "VAR"
* Gretl: "var"
* Matlab: "varm"
* Regression analysis of time series: "SYSTEM"
* LDT