Categorical Data Analysis
   HOME

TheInfoList



OR:

This is a list of
statistical Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
procedures which can be used for the analysis of categorical data, also known as data on the
nominal scale Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scale ...
and as
categorical variable In statistics, a categorical variable (also called qualitative variable) is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or ...
s.


General tests

* Bowker's test of symmetry *
Categorical distribution In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution, multinoulli distribution) is a discrete probability distribution that describes the possible results of a random variable that can ...
, general model *
Chi-squared test A chi-squared test (also chi-square or test) is a Statistical hypothesis testing, statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine w ...
* Cochran–Armitage test for trend *
Cochran–Mantel–Haenszel statistics In statistics, the Cochran–Mantel–Haenszel test (CMH) is a test used in the analysis of stratified or matched categorical data. It allows an investigator to test the association between a binary predictor or treatment and a binary outcome su ...
*
Correspondence analysis Correspondence analysis (CA) is a multivariate statistical technique proposed by Herman Otto Hartley (Hirschfeld) and later developed by Jean-Paul Benzécri. It is conceptually similar to principal component analysis, but applies to categorical ...
*
Cronbach's alpha Cronbach's alpha (Cronbach's \alpha), also known as tau-equivalent reliability (\rho_T) or coefficient alpha (coefficient \alpha), is a reliability coefficient and a measure of the internal consistency of tests and measures. It was named after ...
* Diagnostic odds ratio *
G-test In statistics, ''G''-tests are likelihood-ratio or maximum likelihood statistical significance tests that are increasingly being used in situations where chi-squared tests were previously recommended. Formulation The general formula for ''G'' i ...
*
Generalized estimating equation In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints. Regression beta coefficient estimates ...
s *
Generalized linear models In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a ''link function'' and by ...
* Krichevsky–Trofimov estimator * Kuder–Richardson Formula 20 *
Linear discriminant analysis Linear discriminant analysis (LDA), normal discriminant analysis (NDA), canonical variates analysis (CVA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to fi ...
*
Multinomial distribution In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for each side of a ''k''-sided die rolled ''n'' times. For ''n'' statistical independence, indepen ...
*
Multinomial logit In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the prob ...
*
Multinomial probit In statistics and econometrics, the multinomial probit model is a generalization of the probit model used when there are several possible categories that the dependent variable can fall into. As such, it is an alternative to the multinomial logi ...
*
Multiple correspondence analysis In statistics, multiple correspondence analysis (MCA) is a data analysis technique for nominal categorical data, used to detect and represent underlying structures in a data set. It does this by representing data as points in a low-dimensional Eucl ...
*
Odds ratio An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of event A taking place in the presence of B, and the odds of A in the absence of B ...
*
Poisson regression In statistics, Poisson regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable ''Y'' has a Poisson distribution, and assumes the lo ...
* Powered partial least squares discriminant analysis *
Qualitative variation Qualitative descriptions or distinctions are based on some quality or characteristic rather than on some quantity or measured value. Qualitative may also refer to: *Qualitative property, a property that can be observed but not measured numericall ...
* Randomization test for goodness of fit *
Relative risk The relative risk (RR) or risk ratio is the ratio of the probability of an outcome in an exposed group to the probability of an outcome in an unexposed group. Together with risk difference and odds ratio, relative risk measures the association bet ...
* Stratified analysis * Tetrachoric correlation *
Uncertainty coefficient In statistics, the uncertainty coefficient, also called proficiency, entropy coefficient or Theil's U, is a measure of nominal Association (statistics), association. It was first introduced by Henri Theil and is based on the concept of informatio ...
*
Wald test In statistics, the Wald test (named after Abraham Wald) assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is the ...


Binomial data

*
Bernstein inequalities (probability theory) In probability theory, Bernstein inequalities give bounds on the probability that the sum of random variables deviates from its mean. In the simplest case, let ''X''1, ..., ''X'n'' be independent Bernoulli random variables taking valu ...
*
Binomial regression In statistics, binomial regression is a regression analysis technique in which the response (often referred to as ''Y'') has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial ha ...
*
Binomial proportion confidence interval In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments (Bernoulli trials). In other words, a binomial proportion conf ...
*
Chebyshev's inequality In probability theory, Chebyshev's inequality (also called the Bienaymé–Chebyshev inequality) provides an upper bound on the probability of deviation of a random variable (with finite variance) from its mean. More specifically, the probability ...
*
Chernoff bound In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function. The minimum of all such exponential bounds forms ''the'' Chernoff or Chernoff-Cramér boun ...
* Gauss's inequality *
Markov's inequality In probability theory, Markov's inequality gives an upper bound on the probability that a non-negative random variable is greater than or equal to some positive Constant (mathematics), constant. Markov's inequality is tight in the sense that for e ...
*
Rule of succession In probability theory, the rule of succession is a formula introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. The formula is still used, particularly to estimate underlying probabilities when ...
* Rule of three (medicine) * Vysochanskiï–Petunin inequality


2 × 2 tables

*
Chi-squared test A chi-squared test (also chi-square or test) is a Statistical hypothesis testing, statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine w ...
* Diagnostic odds ratio *
Fisher's exact test Fisher's exact test (also Fisher-Irwin test) is a statistical significance test used in the analysis of contingency tables. Although in practice it is employed when sample sizes are small, it is valid for all sample sizes. The test assumes that a ...
*
G-test In statistics, ''G''-tests are likelihood-ratio or maximum likelihood statistical significance tests that are increasingly being used in situations where chi-squared tests were previously recommended. Formulation The general formula for ''G'' i ...
*
Odds ratio An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of event A taking place in the presence of B, and the odds of A in the absence of B ...
*
Relative risk The relative risk (RR) or risk ratio is the ratio of the probability of an outcome in an exposed group to the probability of an outcome in an unexposed group. Together with risk difference and odds ratio, relative risk measures the association bet ...
*
McNemar's test McNemar's test is a statistical test used on paired nominal data. It is applied to 2 × 2 contingency tables with a dichotomous trait, with matched pairs of subjects, to determine whether the row and column marginal frequencies are eq ...
*
Yates's correction for continuity In statistics, Yates's correction for continuity (or Yates's chi-squared test) is a statistical test commonly used when analyzing count data organized in a contingency table, particularly when sample sizes are small. It is specifically designed f ...


Measures of association

* Aickin's α * Andres and Marzo's delta *
Bangdiwala's B Bangdiwala's B statistic was created by Shrikant Bangdiwala in 1985 and is a measure of inter-rater agreement.Bangwidala S (1985) A graphical test for observer agreement. Proc 45th Int Stats Institute Meeting, Amsterdam, 1, 307–308Bangdiwala K (1 ...
* Bennett, Alpert, and Goldstein’s S * Brennan and Prediger’s κ *
Coefficient of colligation In statistics, Yule's ''Y'', also known as the coefficient of colligation, is a measure of association between two binary variables. The measure was developed by George Udny Yule in 1912,Michel G. Soete. A new theory on the measurement of associa ...
- Yule's Y * Coefficient of consistency * Coefficient of raw agreement * Conger's Kappa * Contingency coefficient – Pearson's C *
Cramér's V In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φ''c'') is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic an ...
* Dice's coefficient *
Fleiss' kappa Fleiss' kappa (named after Joseph L. Fleiss) is a statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to a number of items or classifying items. This contrasts with o ...
*
Goodman and Kruskal's lambda In probability theory and statistics, Goodman & Kruskal's lambda (\lambda) is a measure of proportional reduction in error in cross tabulation analysis. For any sample with a nominal independent variable and dependent variable (or ones that can b ...
* Guilford’s G * Gwet's AC1 *
Hanssen–Kuipers discriminant Youden's J statistic (also called Youden's index) is a single statistic that captures the performance of a dichotomous diagnostic test. In meteorology, this statistic is referred to as Peirce Skill Score (PSS), Hanssen–Kuipers Discriminant (HKD) ...
* Heidke skill score *
Jaccard index The Jaccard index is a statistic used for gauging the similarity and diversity of sample sets. It is defined in general taking the ratio of two sizes (areas or volumes), the intersection size divided by the union size, also called intersection ...
* Janson and Vegelius' C * Kappa statistics * Klecka's tau *
Krippendorff's Alpha Krippendorff's alpha coefficient, named after academic Klaus Krippendorff, is a statistical measure of the agreement achieved when coding a set of units of analysis. Since the 1970s, ''alpha'' has been used in content analysis where textual units ...
* Kuipers performance index *
Matthews correlation coefficient In statistics, the phi coefficient, or mean square contingency coefficient, denoted by ''φ'' or ''r'φ'', is a measure of association for two binary variables. In machine learning, it is known as the Matthews correlation coefficient (MCC) an ...
*
Phi coefficient In statistics, the phi coefficient, or mean square contingency coefficient, denoted by ''φ'' or ''r'φ'', is a measure of association for two binary variables. In machine learning, it is known as the Matthews correlation coefficient (MCC) an ...
* Press' Q * Renkonen similarity index * Prevalence adjusted bias adjusted kappa * Sakoda's adjusted Pearson's C * Scott's Pi * Sørensen similarity index * Stouffer's Z * True skill statistic *
Tschuprow's T In statistics, Tschuprow's ''T'' is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive). It is closely related to Cramér's V, coinciding with it for square contingency tables. It was published by ...
*
Tversky index The Tversky index, named after Amos Tversky, is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Jaccard index. For sets ...
* Von Eye's kappa


Categorical manifest variables as latent variable

*
Latent variable model A latent variable model is a statistical model that relates a set of observable variables (also called ''manifest variables'' or ''indicators'') to a set of latent variables. Latent variable models are applied across a wide range of fields such ...
**
Item response theory In psychometrics, item response theory (IRT, also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of Test (student assessment), tests, questionnaires, and sim ...
***
Rasch model The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between the respondent's abilities, ...
** Latent class analysis


See also

*
Categorical distribution In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution, multinoulli distribution) is a discrete probability distribution that describes the possible results of a random variable that can ...
{{statistics
Categorical data In statistics, a categorical variable (also called qualitative variable) is a variable (research), variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a ...
Analysis Analysis (: analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle (38 ...