Ordinal data is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories are not known. These data exist on an ordinal scale, one of four levels of measurement described by S. S. Stevens in 1946. The ordinal scale is distinguished from the nominal scale by having a ''ranking''. It also differs from the interval scale and ratio scale by not having category widths that represent equal increments of the underlying attribute.
Examples of ordinal data
A well-known example of ordinal data is the Likert scale. A typical Likert item offers an odd number of ordered response options, for example five options ranging from "strongly disagree" to "strongly agree".
Examples of ordinal data are often found in questionnaires: for example, the survey question "Is your general health poor, reasonable, good, or excellent?" may have those answers coded respectively as 1, 2, 3, and 4. Sometimes data on an interval scale or ratio scale are grouped onto an ordinal scale: for example, individuals whose income is known might be grouped into the income categories $0–$19,999, $20,000–$39,999, $40,000–$59,999, ..., which then might be coded as 1, 2, 3, 4, .... Other examples of ordinal data include socioeconomic status, military ranks, and letter grades for coursework.
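The grouping of income onto an ordinal scale can be sketched in a few lines of Python. This is only an illustration: the name <code>LOWER_BOUNDS</code>, the function <code>income_code</code>, and the choice of an open-ended top bracket ($60,000 and above) are assumptions, not part of any standard coding scheme.

```python
from bisect import bisect_right

# Hypothetical lower bounds of the income brackets; the last bracket
# ($60,000 and above) is assumed to be open-ended.
LOWER_BOUNDS = [0, 20_000, 40_000, 60_000]


def income_code(income):
    """Return the 1-based ordinal code of the bracket containing `income`."""
    if income < 0:
        raise ValueError("income must be non-negative")
    # bisect_right counts how many lower bounds are <= income,
    # which is exactly the 1-based bracket number.
    return bisect_right(LOWER_BOUNDS, income)


incomes = [12_500, 19_999, 20_000, 45_000, 75_000]
codes = [income_code(x) for x in incomes]
print(codes)
```

The codes preserve the order of the brackets but not the size of the income differences between them, which is what makes the grouped variable ordinal rather than interval.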
Ways to analyse ordinal data
Ordinal data analysis requires a different set of analyses than other qualitative variables. These methods incorporate the natural ordering of the variables in order to avoid loss of power.
Computing the mean of a sample of ordinal data is discouraged; other measures of central tendency, including the median or mode, are generally more appropriate.
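As a minimal sketch of this advice, the following Python snippet summarizes a small, made-up sample of answers to the health question using the median and the mode. The list <code>LEVELS</code> and the sample <code>responses</code> are illustrative assumptions. <code>statistics.median_low</code> is used because it always returns an observed value, so the result is a real category even when the sample size is even.

```python
from statistics import median_low, mode

# Assumed ordering of the categories, from lowest to highest.
LEVELS = ["poor", "reasonable", "good", "excellent"]

# Made-up sample of survey answers.
responses = ["poor", "poor", "poor", "reasonable", "good", "good", "excellent"]

# Code the answers as 1..4 according to their position in LEVELS.
codes = [LEVELS.index(r) + 1 for r in responses]

# The median and mode map back to actual categories.
median_label = LEVELS[median_low(codes) - 1]
mode_label = LEVELS[mode(codes) - 1]

print(median_label, mode_label)
```

A mean of the codes (here about 2.14) would correspond to no category and would change if the categories were coded with different, equally valid, increasing numbers.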
General
Stevens (1946) argued that, because the assumption of equal distance between categories does not hold for ordinal data, the use of means and standard deviations for description of ordinal distributions and of inferential statistics based on means and standard deviations was not appropriate. Instead, positional measures like the median and percentiles, in addition to descriptive statistics appropriate for nominal data (number of cases, mode, contingency correlation), should be used.
Nonparametric methods have been proposed as the most appropriate procedures for inferential statistics involving ordinal data (e.g., Kendall's W, Spearman's rank correlation coefficient, etc.), especially those developed for the analysis of ranked measurements.
However, the use of parametric statistics for ordinal data may be permissible with certain caveats to take advantage of the greater range of available statistical procedures.
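As an illustration of a rank-based procedure, Spearman's rank correlation coefficient can be sketched as the Pearson correlation of the ranks, with tied observations given the average of the ranks they occupy. The function names <code>average_ranks</code> and <code>spearman_rho</code> are hypothetical, and the sketch omits any significance test.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rho)
```

Because only the ranks enter the calculation, any order-preserving recoding of the categories leaves the coefficient unchanged.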
Univariate statistics
In place of means and standard deviations, univariate statistics appropriate for ordinal data include the median, other percentiles (such as quartiles and deciles), and the quartile deviation.
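A minimal sketch of these positional measures follows. It assumes the nearest-rank definition of a percentile, which always returns an observed category code rather than an interpolated value; other percentile definitions exist and can give different answers. The function name <code>nearest_rank_percentile</code> and the sample data are illustrative.

```python
from math import ceil


def nearest_rank_percentile(codes, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not codes:
        raise ValueError("codes must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(codes)
    # Smallest rank such that at least p percent of the data lie at or below it.
    rank = ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up sample of category codes on a 1..4 scale.
codes = [3, 1, 4, 3, 2, 3, 1, 2]

q1 = nearest_rank_percentile(codes, 25)
median = nearest_rank_percentile(codes, 50)
q3 = nearest_rank_percentile(codes, 75)

# Quartile deviation (semi-interquartile range), in units of category steps.
quartile_deviation = (q3 - q1) / 2
print(q1, median, q3, quartile_deviation)
```

The quartiles themselves are categories, whereas the quartile deviation is expressed in category steps and therefore depends on the categories being coded as consecutive integers.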
One-sample tests for ordinal data include the Kolmogorov-Smirnov one-sample test, the one-sample runs test, and the change-point test.
Bivariate statistics
In lieu of testing differences in means with ''t''-tests, differences in distributions of ordinal data from two independent samples can be tested with Mann-Whitney, runs, Smirnov, and signed-ranks tests. Tests for two related or matched samples include the sign test and the Wilcoxon signed ranks test. Analysis of variance with ranks and the Jonckheere test for ordered alternatives can be conducted with ordinal data in place of independent samples ANOVA. Tests for more than two related samples include the Friedman two-way analysis of variance by ranks and the Page test for ordered alternatives.
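To illustrate how such tests use only the ordering of the observations, the Mann-Whitney ''U'' statistic for two independent samples can be sketched by direct pair counting, with ties counted as one half. The function name <code>mann_whitney_u</code> and the data are assumptions for illustration; the computation of a p-value is deliberately left out.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by comparing every pair (a, b) across the samples.

    U_a counts pairs where a > b, with tied pairs contributing 0.5.
    """
    wins = sum(1 for a in sample_a for b in sample_b if a > b)
    ties = sum(1 for a in sample_a for b in sample_b if a == b)
    u_a = wins + 0.5 * ties
    # The two statistics always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Made-up ordinal codes for two independent groups.
group_a = [3, 4, 4]
group_b = [1, 2, 3]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

The pairwise loop takes time proportional to the product of the sample sizes, which is fine for a sketch; rank-sum formulas give the same statistic more efficiently for large samples.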
Correlation measures appropriate for two ordinal-scaled variables include Kendall's tau, gamma, ''r<sub>s</sub>'', and ''d<sub>yx</sub>/d<sub>xy</sub>''.
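Several of these measures are built from counts of concordant and discordant pairs. The following sketch computes Goodman and Kruskal's gamma and the tau-a variant of Kendall's tau from those counts; the function names are hypothetical, and tau-b or tau-c (which adjust for ties differently) are not shown.

```python
from itertools import combinations


def concordance_counts(x, y):
    """Count concordant and discordant pairs; pairs tied on x or y are skipped."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        # Only the sign of the product is used, so any increasing coding works.
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return concordant, discordant


def goodman_kruskal_gamma(x, y):
    """(C - D) / (C + D); undefined if every pair is tied."""
    c, d = concordance_counts(x, y)
    return (c - d) / (c + d)


def kendall_tau_a(x, y):
    """(C - D) divided by the total number of pairs, n(n - 1)/2."""
    c, d = concordance_counts(x, y)
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
counts = concordance_counts(x, y)
gamma = goodman_kruskal_gamma(x, y)
tau_a = kendall_tau_a(x, y)
print(counts, gamma, tau_a)
```

Gamma ignores tied pairs entirely, while tau-a keeps them in the denominator, so the two agree only when there are no ties.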
Regression applications
Ordinal data can be considered as a quantitative variable. In logistic regression, the equation
:<math>\operatorname{logit}[P(Y=1)] = \alpha + \beta_1 c + \beta_2 x</math>
is the model and ''c'' takes on the assigned levels of the categorical scale.
In regression analysis, outcomes (dependent variables) that are ordinal variables can be predicted using a variant of ordinal regression, such as ordered logit or ordered probit.
In multiple regression/correlation analysis, ordinal data can be accommodated using power polynomials and through normalization of scores and ranks.
Linear trends
Linear trends are also used to find associations between ordinal data and other categorical variables, normally in contingency tables. A correlation ''r'' is found between the variables, where ''r'' lies between −1 and 1. To test the trend, the test statistic
:<math>M^2 = (n-1)r^2</math>
is used, where ''n'' is the sample size.
''r'' can be found by letting <math>u_1 \le u_2 \le \dots \le u_I</math> be the row scores and <math>v_1 \le v_2 \le \dots \le v_J</math> be the column scores. Let <math>\bar{u} = \sum_i u_i p_{i+}</math> be the mean of the row scores while <math>\bar{v} = \sum_j v_j p_{+j}</math> is the mean of the column scores. Here <math>p_{i+}</math> is the marginal row probability, <math>p_{+j}</math> is the marginal column probability, and <math>p_{ij}</math> is the proportion of observations in cell <math>(i, j)</math>. ''r'' is calculated by:
:<math>r = \frac{\sum_{i,j} (u_i - \bar{u})(v_j - \bar{v})\, p_{ij}}{\sqrt{\left[\sum_i (u_i - \bar{u})^2 p_{i+}\right]\left[\sum_j (v_j - \bar{v})^2 p_{+j}\right]}}</math>
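The calculation can be sketched in Python for a table of counts. The sketch assumes that ''r'' is the correlation between the row and column scores weighted by the cell proportions, and that the trend statistic is <math>M^2 = (n-1)r^2</math>; the function name <code>linear_trend_statistic</code> and the example tables are illustrative. In practice <math>M^2</math> is usually referred to a chi-squared distribution with one degree of freedom, a step omitted here.

```python
from math import sqrt


def linear_trend_statistic(table, row_scores, col_scores):
    """Return (r, M2) for a contingency table of counts.

    table[i][j] is the count in row i, column j; the scores are the
    numbers assigned to the ordered row and column categories.
    """
    n = sum(sum(row) for row in table)
    n_cols = len(col_scores)

    # Marginal proportions p_{i+} and p_{+j}.
    row_p = [sum(row) / n for row in table]
    col_p = [sum(row[j] for row in table) / n for j in range(n_cols)]

    # Weighted means of the scores.
    u_bar = sum(u * p for u, p in zip(row_scores, row_p))
    v_bar = sum(v * p for v, p in zip(col_scores, col_p))

    cov = sum(
        (row_scores[i] - u_bar) * (col_scores[j] - v_bar) * table[i][j] / n
        for i in range(len(row_scores))
        for j in range(n_cols)
    )
    var_u = sum((u - u_bar) ** 2 * p for u, p in zip(row_scores, row_p))
    var_v = sum((v - v_bar) ** 2 * p for v, p in zip(col_scores, col_p))

    r = cov / sqrt(var_u * var_v)
    return r, (n - 1) * r ** 2


# A table with a perfect increasing trend and one with no trend at all.
r_perfect, m2_perfect = linear_trend_statistic([[10, 0], [0, 10]], [1, 2], [1, 2])
r_none, m2_none = linear_trend_statistic([[5, 5], [5, 5]], [1, 2], [1, 2])
print(r_perfect, m2_perfect, r_none, m2_none)
```

The result depends on the scores chosen for the categories, so equally spaced scores such as 1, 2, 3 are themselves an assumption that should be stated when reporting the test.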
Classification methods
Classification methods have also been developed for ordinal data. The data are divided into different categories such that each observation is similar to others. Dispersion is measured and minimized in each group to maximize classification results. The dispersion function is used in information theory.
Statistical models for ordinal data
There are several different models that can be used to describe the structure of ordinal data.
Four major classes of model are described below, each defined for a random variable <math>Y</math>, with levels indexed by <math>k = 1, 2, \dots, q</math>.
Note that in the model definitions below, the values of <math>\mu_k</math> and <math>\boldsymbol{\beta}</math> will not be the same for all the models for the same set of data, but the notation is used to compare the structure of the different models.
Proportional odds model
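The most commonly used model for ordinal data is the proportional odds model, defined by
:<math>\log\left[\frac{\Pr(Y \le k)}{\Pr(Y > k)}\right] = \log\left[\frac{\Pr(Y \le k)}{1 - \Pr(Y \le k)}\right] = \mu_k + \boldsymbol{\beta}^T \mathbf{x}</math>
where the parameters <math>\mu_k</math> describe the base distribution of the ordinal data, <math>\mathbf{x}</math> are the covariates and <math>\boldsymbol{\beta}</math> are the coefficients describing the effects of the covariates.
This model can be generalized by defining the model using <math>\mu_k + \boldsymbol{\beta}_k^T \mathbf{x}</math> instead of <math>\mu_k + \boldsymbol{\beta}^T \mathbf{x}</math>, and this would make the model suitable for nominal data (in which the categories have no natural ordering) as well as ordinal data. However, this generalization can make it much more difficult to fit the model to the data.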
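A short sketch can show how the basic proportional odds model turns parameters into category probabilities. It assumes the cumulative-logit form <math>\log[\Pr(Y \le k)/\Pr(Y > k)] = \mu_k + \boldsymbol{\beta}^T \mathbf{x}</math> for <math>k = 1, \dots, q-1</math>, with increasing cut-points <math>\mu_k</math>; the function names and parameter values are made up for illustration and nothing is being fitted to data.

```python
from math import exp


def logistic(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + exp(-z))


def proportional_odds_probs(mu, beta, x):
    """Return [P(Y=1), ..., P(Y=q)] given q-1 increasing cut-points `mu`."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities P(Y <= k) for k = 1..q-1, then P(Y <= q) = 1.
    cumulative = [logistic(m + eta) for m in mu] + [1.0]
    # Successive differences recover the individual category probabilities.
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs


# Hypothetical parameters for a four-level outcome and one covariate.
mu = [-1.0, 0.0, 1.0]
beta = [0.5]

probs_at_zero = proportional_odds_probs(mu, beta, [0.0])
probs_at_two = proportional_odds_probs(mu, beta, [2.0])
print(probs_at_zero, probs_at_two)
```

With this sign convention a positive coefficient raises every cumulative probability <math>\Pr(Y \le k)</math>, shifting mass toward the lower categories; some software uses the opposite sign.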
Baseline category logit model
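The baseline category model is defined by
:<math>\log\left[\frac{\Pr(Y = k)}{\Pr(Y = 1)}\right] = \mu_k + \boldsymbol{\beta}_k^T \mathbf{x}</math>
This model does not impose an ordering on the categories and so can be applied to nominal data as well as ordinal data.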
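As a sketch, and assuming the form <math>\log[\Pr(Y = k)/\Pr(Y = 1)] = \mu_k + \boldsymbol{\beta}_k^T \mathbf{x}</math> with category 1 as the reference, the category probabilities follow by exponentiating the linear predictors and normalizing. The function name and the parameter values below are hypothetical.

```python
from math import exp, log


def baseline_category_probs(mu, betas, x):
    """Return [P(Y=1), ..., P(Y=q)].

    `mu` and `betas` hold the parameters for categories 2..q; the
    reference category 1 has linear predictor 0 by construction.
    """
    etas = [0.0]
    for m, beta_k in zip(mu, betas):
        etas.append(m + sum(b * xi for b, xi in zip(beta_k, x)))
    weights = [exp(e) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters for a three-level outcome and one covariate.
probs = baseline_category_probs([0.4, -0.2], [[0.5], [1.0]], [2.0])

# The log odds against the reference category reproduce the linear predictors.
log_odds_2_vs_1 = log(probs[1] / probs[0])
log_odds_3_vs_1 = log(probs[2] / probs[0])
print(probs, log_odds_2_vs_1, log_odds_3_vs_1)
```

Nothing in the calculation uses the order of the categories, which is why the same form serves for nominal outcomes.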
Ordered stereotype model
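The ordered stereotype model is defined by
:<math>\log\left[\frac{\Pr(Y = k)}{\Pr(Y = 1)}\right] = \mu_k + \phi_k \boldsymbol{\beta}^T \mathbf{x}</math>
where the score parameters are constrained such that <math>0 = \phi_1 \le \phi_2 \le \dots \le \phi_q = 1</math>.
This is a more parsimonious, and more specialised, model than the baseline category logit model: <math>\phi_k \boldsymbol{\beta}</math> can be thought of as similar to <math>\boldsymbol{\beta}_k</math>.
The non-ordered stereotype model has the same form as the ordered stereotype model, but without the ordering imposed on <math>\phi_k</math>. This model can be applied to nominal data.
Note that the fitted scores, <math>\hat{\phi}_k</math>, indicate how easy it is to distinguish between the different levels of <math>Y</math>. If <math>\hat{\phi}_k \approx \hat{\phi}_{k-1}</math>, then the current set of data for the covariates <math>\mathbf{x}</math> does not provide much information to distinguish between levels <math>k</math> and <math>k-1</math>, but that does not necessarily imply that the actual values <math>k</math> and <math>k-1</math> are close together. If the values of the covariates change, then for that new data the fitted scores <math>\hat{\phi}_k</math> and <math>\hat{\phi}_{k-1}</math> might be far apart.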
Adjacent categories logit model
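The adjacent categories model is defined by
:<math>\log\left[\frac{\Pr(Y = k)}{\Pr(Y = k+1)}\right] = \mu_k + \boldsymbol{\beta}_k^T \mathbf{x}</math>
although the most common form, referred to in Agresti (2010) as the "proportional odds form", is defined by
:<math>\log\left[\frac{\Pr(Y = k)}{\Pr(Y = k+1)}\right] = \mu_k + \boldsymbol{\beta}^T \mathbf{x}</math>
This model can only be applied to ordinal data, since modelling the probabilities of shifts from one category to the next category implies that an ordering of those categories exists.
The adjacent categories logit model can be thought of as a special case of the baseline category logit model, where <math>\boldsymbol{\beta}_k = \boldsymbol{\beta}(k-1)</math>, up to the sign convention for <math>\boldsymbol{\beta}</math>. The adjacent categories logit model can also be thought of as a special case of the ordered stereotype model, where <math>\phi_k \propto k-1</math>, i.e. the distances between the <math>\phi_k</math> are defined in advance, rather than being estimated based on the data.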
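The "proportional odds form" of the adjacent categories model can be sketched in the same way as the other models. The sketch assumes <math>\log[\Pr(Y = k)/\Pr(Y = k+1)] = \mu_k + \boldsymbol{\beta}^T \mathbf{x}</math> for <math>k = 1, \dots, q-1</math>, and builds unnormalized weights backwards from the last category; the function name and parameter values are illustrative only.

```python
from math import exp, log


def adjacent_categories_probs(mu, beta, x):
    """Return [P(Y=1), ..., P(Y=q)] from q-1 intercepts `mu` and shared `beta`."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Start from weight 1 for category q and apply
    # P(Y=k) = P(Y=k+1) * exp(mu_k + eta) for k = q-1 down to 1.
    weights = [1.0]
    for m in reversed(mu):
        weights.append(weights[-1] * exp(m + eta))
    weights.reverse()
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters for a three-level outcome and one covariate.
probs = adjacent_categories_probs([0.2, -0.3], [0.5], [1.0])

# Each adjacent-category log odds reproduces its linear predictor.
log_odds_1_vs_2 = log(probs[0] / probs[1])
log_odds_2_vs_3 = log(probs[1] / probs[2])
print(probs, log_odds_1_vs_2, log_odds_2_vs_3)
```

Summing the adjacent log odds shows that the log odds of category 1 against category <math>k</math> contain <math>(k-1)\boldsymbol{\beta}^T \mathbf{x}</math>, which is the equally spaced structure that links this model to the baseline category and stereotype models.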
Comparisons between the models
The proportional odds model has a very different structure to the other three models, and also a different underlying meaning. Note that the size of the reference category in the proportional odds model varies with <math>k</math>, since <math>Y \le k</math> is compared to <math>Y > k</math>, whereas in the other models the size of the reference category remains fixed, as <math>Y = k</math> is compared to <math>Y = 1</math> or <math>Y = k+1</math>.
Different link functions
There are variants of all the models that use different link functions, such as the probit link or the complementary log-log link.
Statistical tests
Differences in ordinal data can be tested using rank tests.
Visualization and display
Ordinal data can be visualized in several different ways. Common visualizations are the bar chart and the pie chart. Tables can also be useful for displaying ordinal data and frequencies. Mosaic plots can be used to show the relationship between an ordinal variable and a nominal or ordinal variable. A bump chart (a line chart that shows the relative ranking of items from one time point to the next) is also appropriate for ordinal data.
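A frequency table for ordinal data is most readable when the rows follow the order of the categories rather than alphabetical or frequency order, which also makes cumulative proportions meaningful. The following is a minimal sketch; the category list, the sample data, and the function name <code>ordered_frequency_table</code> are assumptions for illustration.

```python
from collections import Counter

# Assumed ordering of the categories, from lowest to highest.
LEVELS = ["poor", "reasonable", "good", "excellent"]


def ordered_frequency_table(responses, levels):
    """Return rows of (level, count, cumulative proportion) in category order."""
    counts = Counter(responses)
    n = len(responses)
    rows = []
    cumulative = 0
    for level in levels:
        cumulative += counts[level]  # Counter returns 0 for unseen levels
        rows.append((level, counts[level], cumulative / n))
    return rows


# Made-up sample of survey answers.
responses = ["good", "poor", "excellent", "good", "reasonable", "good", "poor", "good"]

table = ordered_frequency_table(responses, LEVELS)
for level, count, cum_prop in table:
    # A row of '#' characters doubles as a rough text bar chart.
    print(f"{level:<12}{count:>3}  {cum_prop:6.1%}  {'#' * count}")
```

The same row order can be passed to a plotting library so that the bars of a bar chart appear from the lowest category to the highest.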
Color or grayscale gradation can be used to represent the ordered nature of the data. A single-direction scale, such as income ranges, can be represented with a bar chart where increasing (or decreasing) saturation or lightness of a single color indicates higher (or lower) income. The ordinal distribution of a variable measured on a dual-direction scale, such as a Likert scale, could also be illustrated with color in a stacked bar chart. A neutral color (white or gray) might be used for the middle (zero or neutral) point, with contrasting colors used in the opposing directions from the midpoint, where increasing saturation or darkness of the colors could indicate categories at increasing distance from the midpoint. Choropleth maps also use color or grayscale shading to display ordinal data.
Applications
The use of ordinal data can be found in most areas of research where categorical data are generated. Settings where ordinal data are often collected include the social and behavioral sciences and governmental and business settings where measurements are collected from persons by observation, testing, or questionnaires. Some common contexts for the collection of ordinal data include survey research; intelligence, aptitude, and personality testing; and decision-making.
Calculation of 'Effect Size' (Cliff's Delta ''d'') using ordinal data has been recommended as a measure of statistical dominance.
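Cliff's delta can be sketched directly from its usual definition as the proportion of cross-sample pairs in which the first observation is larger, minus the proportion in which it is smaller. The function name <code>cliffs_delta</code> and the data below are illustrative assumptions.

```python
def cliffs_delta(sample_a, sample_b):
    """Return Cliff's delta, a value between -1 and 1.

    +1 means every value in sample_a exceeds every value in sample_b,
    -1 means the reverse, and 0 means neither sample dominates.
    """
    greater = sum(1 for a in sample_a for b in sample_b if a > b)
    less = sum(1 for a in sample_a for b in sample_b if a < b)
    return (greater - less) / (len(sample_a) * len(sample_b))


# Made-up ordinal codes for two independent groups.
group_a = [3, 4, 4]
group_b = [1, 2, 3]

delta = cliffs_delta(group_a, group_b)
print(delta)
```

Only comparisons of order are used, so the statistic is unchanged by any order-preserving recoding of the categories.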
See also
* List of analyses of categorical data
* Ordinal Priority Approach
* Ordinal number
* Ordinal space
Further reading
* Agresti, Alan (2010). ''Analysis of Ordinal Categorical Data'' (2nd ed.). Hoboken, New Jersey: Wiley. ISBN 978-0470082898.