In statistics, a tobit model is any of a class of
regression models in which the observed range of the
dependent variable
is
censored in some way. The term was coined by
Arthur Goldberger in reference to
James Tobin, who developed the model in 1958 to mitigate the problem of
zero-inflated data for observations of household expenditure on
durable goods. Because Tobin's method can be easily extended to handle
truncated and other non-randomly selected samples, some authors adopt a broader definition of the tobit model that includes these cases.
Tobin's idea was to modify the
likelihood function so that it reflects the unequal
sampling probability for each observation depending on whether the
latent dependent variable fell above or below the determined threshold. For a sample that, as in Tobin's original case, was censored from below at zero, the sampling probability for each non-limit observation is simply the height of the appropriate
density function. For any limit observation, it is the cumulative distribution, i.e. the
integral below zero of the appropriate density function. The tobit likelihood function is thus a mixture of densities and cumulative distribution functions.
The likelihood function
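Below are the likelihood and log-likelihood functions for a type I tobit. This is a tobit that is censored from below at <math>y_L</math> when the latent variable <math>y_j^* \leq y_L</math>. In writing out the likelihood function, we first define an indicator function <math>I</math>:
:<math>I(y) = \begin{cases} 0 & \text{if } y \leq y_L, \\ 1 & \text{if } y > y_L. \end{cases}</math>
Next, let <math>\Phi</math> be the standard normal cumulative distribution function and <math>\varphi</math> the standard normal probability density function. Write <math>X_j</math> for the row vector of regressors of observation <math>j</math>, <math>\beta</math> for the coefficient vector and <math>\sigma</math> for the standard deviation of the latent error term. For a data set with ''N'' observations the likelihood function for a type I tobit is
:<math>\mathcal{L}(\beta, \sigma) = \prod_{j=1}^N \left( \frac{1}{\sigma} \varphi\left( \frac{y_j - X_j\beta}{\sigma} \right) \right)^{I(y_j)} \left( \Phi\left( \frac{y_L - X_j\beta}{\sigma} \right) \right)^{1 - I(y_j)}</math>
and the log-likelihood is given by
:<math>\log \mathcal{L}(\beta, \sigma) = \sum_{j=1}^N \left[ I(y_j) \log\left( \frac{1}{\sigma} \varphi\left( \frac{y_j - X_j\beta}{\sigma} \right) \right) + \left(1 - I(y_j)\right) \log \Phi\left( \frac{y_L - X_j\beta}{\sigma} \right) \right]</math>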
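As an illustration, the log-likelihood of a type I tobit can be evaluated directly from its two kinds of terms: a log-density for each non-limit observation and a log-probability for each limit observation. The following is a minimal sketch using only the Python standard library. The function and argument names (<code>tobit_loglik</code>, <code>y_L</code>, and so on) and the sample data are chosen here for illustration and are not taken from any particular package. It assumes that censored outcomes are recorded as exactly the threshold <code>y_L</code>.

```python
import math


def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y, y_L=0.0):
    """Log-likelihood of a type I tobit censored from below at y_L.

    beta  : list of coefficients
    sigma : standard deviation of the latent error (must be > 0)
    X     : list of regressor rows, one row per observation
    y     : observed outcomes; censored outcomes equal y_L
    """
    total = 0.0
    for x_j, y_j in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, x_j))  # linear index X_j * beta
        if y_j > y_L:
            # Non-limit observation: log of the normal density at y_j.
            total += math.log(norm_pdf((y_j - xb) / sigma) / sigma)
        else:
            # Limit observation: log-probability that the latent value <= y_L.
            total += math.log(norm_cdf((y_L - xb) / sigma))
    return total


# Hypothetical data: an intercept column plus one regressor.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [0.8, 0.0, 1.7]
print(tobit_loglik([0.2, 0.6], 1.0, X, y))
```

In practice this function would be passed, with its sign flipped, to a numerical optimizer to obtain maximum likelihood estimates. The sketch only evaluates the objective.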
Reparametrization
The log-likelihood as stated above is not globally concave, which complicates the
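maximum likelihood estimation. Olsen suggested the simple reparametrization <math>\beta = \delta/\gamma</math> and <math>\sigma^2 = \gamma^{-2}</math>, resulting in a transformed log-likelihood,
:<math>\log \mathcal{L}(\delta, \gamma) = \sum_{y_j > y_L} \left\{ \log \gamma + \log\left[ \varphi\left( \gamma y_j - X_j\delta \right) \right] \right\} + \sum_{y_j = y_L} \log\left[ \Phi\left( \gamma y_L - X_j\delta \right) \right]</math>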
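The reparametrized objective can be sketched in the same way. The example below is a hedged illustration rather than a reference implementation. It assumes censoring from below at a threshold <code>y_L</code>, takes the positive root <code>gamma = 1/sigma</code>, and uses hypothetical names (<code>olsen_loglik</code>, <code>to_original</code>) and sample data introduced only for this example. After maximizing over <code>delta</code> and <code>gamma</code>, the original parameters are recovered by dividing.

```python
import math


def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def olsen_loglik(delta, gamma, X, y, y_L=0.0):
    """Tobit log-likelihood in terms of delta = beta / sigma, gamma = 1 / sigma."""
    total = 0.0
    for x_j, y_j in zip(X, y):
        xd = sum(d * x for d, x in zip(delta, x_j))  # X_j * delta
        if y_j > y_L:
            # Non-limit observation: log(gamma) + log(phi(gamma * y_j - X_j * delta)).
            total += math.log(gamma) + math.log(norm_pdf(gamma * y_j - xd))
        else:
            # Limit observation: log(Phi(gamma * y_L - X_j * delta)).
            total += math.log(norm_cdf(gamma * y_L - xd))
    return total


def to_original(delta, gamma):
    """Map (delta, gamma) back to (beta, sigma), assuming gamma > 0."""
    return [d / gamma for d in delta], 1.0 / gamma


# Hypothetical data: an intercept column plus one regressor.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [0.8, 0.0, 1.7]
print(olsen_loglik([0.2, 0.6], 1.0, X, y))
print(to_original([0.5, 1.5], 0.5))
```

Because the transformation is one-to-one for positive <code>gamma</code>, the value of the transformed log-likelihood at <code>(delta, gamma)</code> equals the value of the original log-likelihood at the corresponding <code>(beta, sigma)</code>.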