A hurdle model is a class of statistical models in which a random variable is modelled in two parts: the first part models the probability of attaining the value 0, and the second part models the probability of the non-zero values. The use of hurdle models is often motivated by an excess of zeroes in the data that is not sufficiently accounted for by more standard statistical models.
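In a hurdle model, a random variable ''x'' is modelled as
:<math>\Pr(x = 0) = \theta</math>
:<math>\Pr(x = k) = (1 - \theta)\, p_{x \neq 0}(k), \quad k \neq 0</math>
where <math>\theta</math> is the probability of a zero and <math>p_{x \neq 0}</math> is a truncated probability distribution function, truncated at 0.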
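The two-part construction can be illustrated with a minimal Python sketch for count data. It assumes a zero-truncated Poisson for the non-zero part, which is one possible choice and not the only one. The names `hurdle_poisson_pmf`, `theta` (the probability of a zero) and `lam` (the Poisson rate) are hypothetical and chosen only for illustration. The non-zero probabilities are scaled by 1 - theta so that the whole distribution sums to one.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability mass of an ordinary Poisson(lam) distribution at k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hurdle_poisson_pmf(k: int, theta: float, lam: float) -> float:
    """Pmf of a hurdle model: P(x = 0) = theta, positives follow a
    zero-truncated Poisson(lam) scaled by (1 - theta)."""
    if k < 0:
        return 0.0
    if k == 0:
        # The zero probability is a free parameter, unrelated to lam.
        return theta
    # Truncation at 0: renormalise the Poisson over k >= 1.
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - theta) * truncated


if __name__ == "__main__":
    theta, lam = 0.6, 2.5  # illustrative values, not taken from any data set
    total = sum(hurdle_poisson_pmf(k, theta, lam) for k in range(60))
    print(f"P(x = 0) = {hurdle_poisson_pmf(0, theta, lam):.3f}")
    print(f"sum over k = 0..59 = {total:.6f}")
```

Because the zero probability and the non-zero distribution have separate parameters, the two parts can be fitted independently of each other.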
Hurdle models were introduced by John G. Cragg in 1971, where the non-zero values of ''x'' were modelled using a normal model, and a probit model was used to model the zeros. The probit part of the model was said to model the presence of "hurdles" that must be overcome for the values of ''x'' to attain non-zero values, hence the designation ''hurdle model''. Hurdle models were later developed for count data, with Poisson, geometric, and negative binomial models for the non-zero counts.
Relationship with zero-inflated models
Hurdle models differ from zero-inflated models in that zero-inflated models model the zeros using a two-component mixture model. With a mixture model, the probability of the variable being zero is determined by both the main distribution <math>p(x = 0)</math> and the mixture weight <math>\pi</math>. Specifically, a zero-inflated model for a random variable ''x'' is
:<math>\Pr(x = 0) = \pi + (1 - \pi)\, p(x = 0)</math>
:<math>\Pr(x = h_i) = (1 - \pi)\, p(x = h_i)</math>
where <math>\pi</math> is the mixture weight that determines the amount of zero-inflation and <math>h_i</math> ranges over the non-zero values. A zero-inflated model can only increase <math>\Pr(x = 0)</math> relative to the main distribution, but this is not a restriction in hurdle models.
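The difference in the attainable zero probabilities can be checked numerically. The sketch below assumes a Poisson main distribution for the zero-inflated model. The function names `zip_zero_prob` and `hurdle_zero_prob`, and the parameters `pi` (mixture weight), `lam` (Poisson rate) and `theta` (hurdle zero probability), are hypothetical and used only for illustration.

```python
import math


def zip_zero_prob(pi: float, lam: float) -> float:
    """P(x = 0) under a zero-inflated Poisson with mixture weight pi."""
    return pi + (1.0 - pi) * math.exp(-lam)


def hurdle_zero_prob(theta: float) -> float:
    """P(x = 0) under a hurdle model: the free parameter theta itself."""
    return theta


if __name__ == "__main__":
    lam = 1.0
    base = math.exp(-lam)  # zero probability of the plain Poisson(lam)
    print(f"plain Poisson P(x = 0) = {base:.4f}")
    for pi in (0.0, 0.25, 0.5, 1.0):
        # For any pi in [0, 1] the zero probability is at least `base`.
        print(f"zero-inflated, pi = {pi:.2f}: {zip_zero_prob(pi, lam):.4f}")
    # A hurdle model may also place less mass at zero than the plain Poisson.
    print(f"hurdle, theta = 0.10: {hurdle_zero_prob(0.10):.4f}")
```

For every mixture weight between 0 and 1 the zero-inflated zero probability stays at or above that of the plain Poisson, whereas the hurdle parameter can be set below it, which lets a hurdle model describe data with fewer zeros than the main distribution would predict.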
See also
* Zero-inflated model
* Truncated normal hurdle model