In statistics, Tschuprow's ''T'' is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive). It is closely related to Cramér's V, coinciding with it for square contingency tables.
It was published by Alexander Tschuprow (alternative spelling: Chuprov) in 1939.[Tschuprow, A. A. (1939) ''Principles of the Mathematical Theory of Correlation''; translated by M. Kantorowitsch. W. Hodge & Co.]
Definition
For an ''r'' × ''c'' contingency table with ''r'' rows and ''c'' columns, let $\pi_{ij}$ be the proportion of the population in cell $(i,j)$ and let

:$\pi_{i+} = \sum_{j=1}^{c} \pi_{ij}$ and $\pi_{+j} = \sum_{i=1}^{r} \pi_{ij}.$

Then the mean square contingency is given as

:$\phi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\pi_{ij} - \pi_{i+}\pi_{+j})^2}{\pi_{i+}\pi_{+j}},$

and Tschuprow's ''T'' as

:$T = \sqrt{\frac{\phi^2}{\sqrt{(r-1)(c-1)}}}.$
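To make the definition concrete, here is a minimal Python sketch (assuming NumPy is available; the 2 × 3 table of proportions is invented purely for illustration) that computes $\phi^2$ and ''T'' directly from a matrix of cell proportions:

import numpy as np

# Hypothetical 2 x 3 table of population cell proportions pi_ij (entries sum to 1)
pi = np.array([[0.2, 0.1, 0.2],
               [0.1, 0.3, 0.1]])

pi_row = pi.sum(axis=1, keepdims=True)   # row marginals pi_{i+}
pi_col = pi.sum(axis=0, keepdims=True)   # column marginals pi_{+j}
expected = pi_row @ pi_col               # pi_{i+} * pi_{+j}: cell proportions under independence

phi2 = ((pi - expected) ** 2 / expected).sum()   # mean square contingency
r, c = pi.shape
T = np.sqrt(phi2 / np.sqrt((r - 1) * (c - 1)))   # Tschuprow's T

print(f"phi^2 = {phi2:.4f}, T = {T:.4f}")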
Properties
''T'' equals zero if and only if independence holds in the table, i.e., if and only if $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all ''i'' and ''j''. ''T'' equals one if and only if there is perfect dependence in the table, i.e., if and only if for each ''i'' there is only one ''j'' such that $\pi_{ij} > 0$ and vice versa. Hence, it can only equal 1 for square tables. In this it differs from Cramér's V, which can be equal to 1 for any rectangular table.
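To illustrate the difference with a made-up example, consider the 2 × 3 table of proportions

:$\pi = \begin{pmatrix} 1/3 & 1/3 & 0 \\ 0 & 0 & 1/3 \end{pmatrix}.$

Each column determines its row uniquely, and a direct computation gives $\phi^2 = 1$, so Cramér's V $= \sqrt{\phi^2 / \min(r-1,\,c-1)} = 1$, while $T = \sqrt{1/\sqrt{2}} \approx 0.84$. Because the table is not square, the rows cannot also determine the columns, and ''T'' stays strictly below 1.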
Estimation
If we have a multinomial sample of size ''n'', the usual way to estimate ''T'' from the data is via the formula

:$\hat{T} = \sqrt{\frac{\sum_{i,j} \frac{(p_{ij} - p_{i+}p_{+j})^2}{p_{i+}p_{+j}}}{\sqrt{(r-1)(c-1)}}},$

where $p_{ij} = n_{ij}/n$ is the proportion of the sample in cell $(i,j)$. This is the empirical value of ''T''. With $\chi^2$ the Pearson chi-square statistic, this formula can also be written as

:$\hat{T} = \sqrt{\frac{\chi^2/n}{\sqrt{(r-1)(c-1)}}}.$
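In practice the chi-square form is the most convenient. A minimal sketch using SciPy (the table of observed counts below is invented; scipy.stats.chi2_contingency returns the Pearson statistic among its outputs):

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 3 table of observed counts n_ij
counts = np.array([[30, 15, 5],
                   [10, 25, 15]])

n = counts.sum()
chi2, _, _, _ = chi2_contingency(counts, correction=False)  # Pearson chi-square

r, c = counts.shape
T_hat = np.sqrt((chi2 / n) / np.sqrt((r - 1) * (c - 1)))    # empirical Tschuprow's T
print(f"T_hat = {T_hat:.4f}")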
See also
Other measures of correlation for nominal data:
* Cramér's V
* Phi coefficient
* Uncertainty coefficient
* Lambda coefficient
Other related articles:
* Effect size
References
* Liebetrau, A. (1983). ''Measures of Association'' (Quantitative Applications in the Social Sciences). Sage Publications.