HOME

TheInfoList



OR:

In
statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
, the ordered logit model or proportional odds logistic regression is an
ordinal regression In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between di ...
model—that is, a regression model for ordinal
dependent variable A variable is considered dependent if it depends on (or is hypothesized to depend on) an independent variable. Dependent variables are studied under the supposition or demand that they depend, by some law or rule (e.g., by a mathematical functio ...
s—first considered by Peter McCullagh. For example, if one question on a survey is to be answered by a choice among "poor", "fair", "good", "very good" and "excellent", and the purpose of the analysis is to see how well that response can be predicted by the responses to other questions, some of which may be quantitative, then ordered logistic regression may be used. It can be thought of as an extension of the
logistic regression In statistics, a logistic model (or logit model) is a statistical model that models the logit, log-odds of an event as a linear function (calculus), linear combination of one or more independent variables. In regression analysis, logistic regres ...
model that applies to
dichotomous A dichotomy () is a partition of a set, partition of a whole (or a set) into two parts (subsets). In other words, this couple of parts must be * jointly exhaustive: everything must belong to one part or the other, and * mutually exclusive: nothi ...
dependent variables, allowing for more than two (ordered) response categories.


The model and the proportional odds assumption

The model only applies to data that meet the ''proportional odds assumption'', the meaning of which can be exemplified as follows. Suppose there are five outcomes: "poor", "fair", "good", "very good", and "excellent". We assume that the probabilities of these outcomes are given by ''p''1(''x''), ''p''2(''x''), ''p''3(''x''), ''p''4(''x''), ''p''5(''x''), all of which are functions of some independent variable(s) ''x''. Then, for a fixed value of ''x,'' the logarithms of the
odds In probability theory, odds provide a measure of the probability of a particular outcome. Odds are commonly used in gambling and statistics. For example for an event that is 40% probable, one could say that the odds are or When gambling, o ...
(not the logarithms of the probabilities) of answering in certain ways are: : \begin \text & \log\frac, \\ pt\text & \log\frac, \\ pt\text & \log\frac, \\ pt\text & \log\frac \end The proportional odds assumption states that the numbers added to each of these logarithms to get the next are the same regardless of ''x''. In other words, the difference between the logarithm of the odds of having poor or fair health minus the logarithm of odds of having poor health is the same regardless of ''x''; similarly, the logarithm of the odds of having poor, fair, or good health minus the logarithm of odds of having poor or fair health is the same regardless of ''x''; etc. Examples of multiple-ordered response categories include bond ratings, opinion surveys with responses ranging from "strongly agree" to "strongly disagree," levels of state spending on government programs (high, medium, or low), the level of insurance coverage chosen (none, partial, or full), and employment status (not employed, employed part-time, or fully employed). Ordered logit can be derived from a latent-variable model, similar to the one from which binary logistic regression can be derived. Suppose the underlying process to be characterized is :y^ = \mathbf^ \beta + \varepsilon, \, where y^ is an unobserved dependent variable (perhaps the exact level of agreement with the statement proposed by the pollster); \mathbf is the vector of independent variables; \varepsilon is the
error term In mathematics and statistics, an error term is an additive type of error. In writing, an error term is an instance of faulty language or grammar. Common examples include: * errors and residuals in statistics, e.g. in linear regression * the error ...
, assumed to follow a standard logistic distribution; and \beta is the vector of regression coefficients which we wish to estimate. Further suppose that while we cannot observe y^, we instead can only observe the categories of response : y= \begin 0 & \text y^* \le \mu_1, \\ 1 & \text \mu_1 where the parameters \mu_i are the externally imposed endpoints of the observable categories. Then the ordered logit technique will use the observations on ''y'', which are a form of censored data on ''y*'', to fit the parameter vector \beta.


Model fitting

As with most statistical models,
maximum likelihood estimation In statistics, maximum likelihood estimation (MLE) is a method of estimation theory, estimating the Statistical parameter, parameters of an assumed probability distribution, given some observed data. This is achieved by Mathematical optimization, ...
or
Bayesian inference Bayesian inference ( or ) is a method of statistical inference in which Bayes' theorem is used to calculate a probability of a hypothesis, given prior evidence, and update it as more information becomes available. Fundamentally, Bayesian infer ...
are the most common ways of identifying the parameters. The estimated values indicate the direction and magnitude of the effect of each independent variable on the likelihood of the dependent variable falling into a higher category.


Applications

Ordered logistic regressions have been used in multiple fields, such as transportation, marketing or disaster management. In
clinical research Clinical research is a branch of medical research that involves people and aims to determine the effectiveness (efficacy) and safety of medications, devices, diagnostic products, and treatment regimens intended for improving human health. The ...
, the effect a drug may have on a patient may be modeled with ordinal regression. Independent variables may include the use or non-use of the drug, as well as control variables such as
demographics Demography () is the statistical study of human populations: their size, composition (e.g., ethnic group, age), and how they change through the interplay of fertility (births), mortality (deaths), and migration. Demographic analysis examin ...
and details from medical history. The dependent variable could be ranked on the following list: complete cure, improved symptoms, no change, worsened symptoms, or death. Another example application are Likert-type items commonly employed in survey research, where respondents rate their agreement on an ordered scale (e.g., "Strongly disagree" to "Strongly agree"). The ordered logit model provides an appropriate fit to these data, preserving the ordering of response options while making no assumptions of the interval distances between options.


See also

*
Multinomial logit In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the prob ...
*
Multinomial probit In statistics and econometrics, the multinomial probit model is a generalization of the probit model used when there are several possible categories that the dependent variable can fall into. As such, it is an alternative to the multinomial logi ...


References


Further reading

* * * * *


External links

* * {{cite web , first=Germán , last=Rodríguez , title=Ordered Logit Models , work=Princeton University , url=http://data.princeton.edu/wws509/stata/c6s5.html Logistic regression