
Spike-and-slab regression is a type of Bayesian linear regression in which a particular hierarchical prior distribution for the regression coefficients is chosen such that only a subset of the possible regressors is retained. The technique is particularly useful when the number of possible predictors is larger than the number of observations. The idea of the spike-and-slab model was originally proposed by Mitchell & Beauchamp (1988). The approach was significantly developed further by Madigan & Raftery (1994) and George & McCulloch (1997), and later refined by Ishwaran & Rao (2005).
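In a common formulation, each coefficient ''β''''i'' receives a two-component mixture prior; the notation used here (''π'' for the prior inclusion probability, ''τ''2 for the slab variance) is illustrative rather than taken from the sources cited above:

<math>\beta_i \mid \gamma_i \;\sim\; (1-\gamma_i)\,\delta_0 + \gamma_i\,\mathcal{N}(0,\tau^2), \qquad \gamma_i \;\sim\; \operatorname{Bernoulli}(\pi), \qquad i = 1,\dots,P,</math>

where ''γ''''i'' is a binary indicator of whether the ''i''-th predictor is included, <math>\delta_0</math> denotes a point mass at zero (the "spike"), and the normal component plays the role of the "slab".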


Model description

Suppose we have ''P'' possible predictors in some model. Vector ''γ'' has length ''P'' and consists of zeros and ones; it indicates whether a particular variable is included in the regression or not. If no specific prior information on the initial inclusion probabilities of particular variables is available, a Bernoulli prior distribution is a common default choice. Conditional on a predictor being in the regression, we specify a prior distribution for the model coefficient ''β'' corresponding to that variable. A common choice at this step is a normal prior with mean equal to zero and a large variance calculated from (X^TX)^{-1} (where X is the design matrix of explanatory variables of the model). A draw of ''γ'' from its prior distribution is a list of the variables included in the regression. Conditional on this set of selected variables, we take a draw from the prior distribution of the regression coefficients (if ''γ''''i'' = 1 then ''β''''i'' ≠ 0 and if ''γ''''i'' = 0 then ''β''''i'' = 0). ''βγ'' denotes the subset of ''β'' for which ''γ''''i'' = 1.
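As an illustration of this prior, the sketch below draws one (''γ'', ''βγ'') pair using a Bernoulli(0.5) inclusion prior and a Zellner-style ''g''-prior for the selected coefficients, whose covariance is proportional to (X^TX)^{-1} restricted to the selected columns; the defaults chosen here (''g'' = ''n'', ''σ''2 = 1) are illustrative assumptions, not values prescribed by the sources cited above.

<syntaxhighlight lang="python">
import numpy as np

def draw_from_prior(X, pi=0.5, g=None, sigma2=1.0, rng=None):
    """One draw of (gamma, beta) from an illustrative spike-and-slab prior.

    gamma_j ~ Bernoulli(pi) selects which predictors enter the model; the
    selected coefficients receive a Zellner-style g-prior,
        beta_gamma ~ N(0, g * sigma2 * (X_gamma' X_gamma)^{-1}),
    and the coefficients of excluded predictors are exactly zero.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    g = float(n) if g is None else g     # g = n ("unit information") is a common default

    gamma = rng.binomial(1, pi, size=p)  # inclusion indicators
    beta = np.zeros(p)
    idx = np.flatnonzero(gamma)
    if idx.size:                         # only the selected coefficients are non-zero
        X_g = X[:, idx]
        cov = g * sigma2 * np.linalg.inv(X_g.T @ X_g)
        beta[idx] = rng.multivariate_normal(np.zeros(idx.size), cov)
    return gamma, beta
</syntaxhighlight>

Larger values of ''g'' make the slab flatter, which is the sense in which the prior variance is described as "large" above.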
In the next step, we calculate the posterior probability of inclusion and the posterior distribution of the coefficients by applying standard Bayesian updating. All steps of this procedure are repeated thousands of times using a Markov chain Monte Carlo (MCMC) technique. As a result, we obtain a posterior distribution of ''γ'' (variable inclusion in the model), of ''β'' (the regression coefficient values) and of the corresponding predictions of ''y''. The model owes its name (spike-and-slab) to the shape of the two prior distributions: the "spike" is the prior probability mass placed on a coefficient being exactly zero, and the "slab" is the (typically diffuse) prior distribution for the values of the non-zero regression coefficients.

An advantage of Bayesian variable selection techniques is that they are able to make use of prior knowledge about the model. In the absence of such knowledge, reasonable default values can be used; to quote Scott and Varian (2013): "For the analyst who prefers simplicity at the cost of some reasonable assumptions, useful prior information can be reduced to an expected model size, an expected ''R''2, and a sample size ''ν'' determining the weight given to the guess at ''R''2." Some researchers suggest the following default values: ''R''2 = 0.5, ''ν'' = 0.01, and a prior inclusion probability of 0.5 (the parameter of the prior Bernoulli distribution).
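The MCMC step can be implemented in several ways. Below is a minimal NumPy sketch of one common variant, a Gibbs sampler in the spirit of George & McCulloch's stochastic search variable selection, which replaces the exact point-mass spike with a very narrow normal distribution so that both full conditionals are available in closed form; the hyperparameter defaults (''tau2'', ''c2'', ''pi'') and the assumption of a known error variance ''sigma2'' are illustrative simplifications rather than part of the original formulation.

<syntaxhighlight lang="python">
import numpy as np

def ssvs_gibbs(X, y, n_iter=5000, tau2=0.01, c2=100.0, pi=0.5, sigma2=1.0, seed=0):
    """Gibbs sampler for a continuous-spike spike-and-slab regression.

    Illustrative prior (hyperparameter values are arbitrary defaults):
        beta_j | gamma_j = 1  ~  N(0, c2 * tau2)   -- the "slab"
        beta_j | gamma_j = 0  ~  N(0, tau2)        -- a narrow "spike" approximating zero
        gamma_j               ~  Bernoulli(pi)
    The error variance sigma2 is treated as known to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma = np.ones(p, dtype=int)               # start with every predictor included
    beta_draws = np.empty((n_iter, p))
    gamma_draws = np.empty((n_iter, p), dtype=int)
    XtX, Xty = X.T @ X, X.T @ y

    for t in range(n_iter):
        # beta | gamma, y is multivariate normal with precision X'X/sigma2 + D^{-1},
        # where D is the diagonal matrix of prior variances implied by gamma.
        prior_var = np.where(gamma == 1, c2 * tau2, tau2)
        A = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / prior_var))
        beta = rng.multivariate_normal(A @ Xty / sigma2, A)

        # gamma_j | beta_j is Bernoulli: compare slab and spike densities at beta_j.
        log_odds = (np.log(pi / (1.0 - pi))
                    + 0.5 * np.log(tau2 / (c2 * tau2))
                    + 0.5 * beta**2 * (1.0 / tau2 - 1.0 / (c2 * tau2)))
        gamma = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

        beta_draws[t], gamma_draws[t] = beta, gamma

    return beta_draws, gamma_draws
</syntaxhighlight>

Posterior inclusion probabilities can then be estimated as the column means of ''gamma_draws'' after discarding an initial burn-in portion of the chain. With an exact point-mass spike, as in the description above, ''γ'' is instead updated by comparing the marginal likelihoods of the models with and without each predictor, which conjugate priors make tractable.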


See also

* Bayesian model averaging
* Bayesian structural time series
* Lasso (statistics)


References


Further reading

*{{cite book |last=Congdon |first=Peter D. |chapter=Regression Techniques using Hierarchical Priors |pages=253–315 |title=Bayesian Hierarchical Models |location=Boca Raton |publisher=CRC Press |edition=2nd |year=2020 |isbn=978-1-03-217715-1}}