The Theil index is a statistic primarily used to measure economic inequality and other economic phenomena, though it has also been used to measure racial segregation. The Theil index T_T is the same as redundancy in information theory, which is the maximum possible entropy of the data minus the observed entropy. It is a special case of the generalized entropy index. It can be viewed as a measure of redundancy, lack of diversity, isolation, segregation, inequality, non-randomness, and compressibility. It was proposed by the Dutch econometrician Henri Theil (1924-2000) at the Erasmus University Rotterdam.

Henri Theil himself said (1967): "The (Theil) index can be interpreted as the expected information content of the indirect message which transforms the population shares as prior probabilities into the income shares as posterior probabilities." Amartya Sen noted, "But the fact remains that the Theil index is an arbitrary formula, and the average of the logarithms of the reciprocals of income shares weighted by income is not a measure that is exactly overflowing with intuitive sense."


Formula

For a population of N "agents" each with characteristic x, the situation may be represented by the list x_i (i = 1,...,N) where x_i is the characteristic of agent i. For example, if the characteristic is income, then x_i is the income of agent i.

The Theil T index is defined as

: T_T = T_{\alpha=1} = \frac{1}{N}\sum_{i=1}^N \frac{x_i}{\mu} \ln\left(\frac{x_i}{\mu}\right)

and the Theil L index is defined as

: T_L = T_{\alpha=0} = \frac{1}{N}\sum_{i=1}^N \ln\left(\frac{\mu}{x_i}\right)

where \mu is the mean income:

: \mu = \frac{1}{N}\sum_{i=1}^N x_i

Theil-L is an income distribution's dis-entropy per person, measured with respect to maximum entropy (which is achieved with complete equality). In an alternative interpretation, Theil-L is the natural logarithm of the geometric mean of the ratio (mean income)/(income i) over all the incomes. The related Atkinson(1) is just 1 minus the geometric mean of (income i)/(mean income) over the income distribution. Because a transfer between a larger income and a smaller one changes the smaller income's ratio more than it changes the larger income's ratio, this index satisfies the transfer principle.

Equivalently, if the situation is characterized by a discrete distribution function f_k (k = 0,...,W) where f_k is the fraction of the population with income k and W = N\mu is the total income, then \sum_{k=0}^W f_k = 1 and the Theil index is:

: T_T = \sum_{k=0}^W f_k \, \frac{k}{\mu} \ln\left(\frac{k}{\mu}\right)

where \mu is again the mean income:

: \mu = \sum_{k=0}^W k f_k

Note that in this case income k is an integer and k = 1 represents the smallest increment of income possible (e.g., cents).
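The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample incomes, and the convention that a zero income contributes nothing to Theil T are assumptions made here, not part of the original text.

```python
import math
from collections import Counter


def theil_t(incomes):
    """Theil T: (1/N) * sum((x_i/mu) * ln(x_i/mu))."""
    n = len(incomes)
    mu = sum(incomes) / n
    total = 0.0
    for x in incomes:
        if x > 0:  # assumed convention: a zero income contributes 0
            ratio = x / mu
            total += ratio * math.log(ratio)
    return total / n


def theil_l(incomes):
    """Theil L: (1/N) * sum(ln(mu/x_i)); needs strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum(math.log(mu / x) for x in incomes) / n


def theil_t_from_fractions(fractions):
    """Theil T from a mapping {income k: population fraction f_k}."""
    mu = sum(k * f for k, f in fractions.items())
    return sum(
        f * (k / mu) * math.log(k / mu)
        for k, f in fractions.items()
        if k > 0 and f > 0
    )


incomes = [10, 20, 20, 50]  # hypothetical incomes, e.g. in cents
fractions = {k: c / len(incomes) for k, c in Counter(incomes).items()}

t_list = theil_t(incomes)
t_frac = theil_t_from_fractions(fractions)
l_list = theil_l(incomes)
print(t_list, t_frac, l_list)
```

The per-agent form and the discrete-distribution form should agree up to floating-point rounding, since the second only groups agents with identical incomes.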
Similarly, if the situation is characterized by a continuous distribution function f(k) (supported from 0 to infinity) where f(k) dk is the fraction of the population with income k to k + dk, then the Theil index is:

: T_T = \int_0^\infty f(k) \frac{k}{\mu} \ln\left(\frac{k}{\mu}\right) dk

where the mean is:

: \mu = \int_0^\infty k f(k) \, dk

[Table: Theil indices for some common continuous probability distributions]

If everyone has the same income, then T_T equals 0. If one person has all the income, then T_T gives the result \ln N, which is maximum inequality. Dividing T_T by \ln N can normalize the equation to range from 0 to 1, but then the independence axiom is violated: T[x \cup x] \ne T[x], so the normalized index does not qualify as a measure of inequality.

The Theil index measures an entropic "distance" the population is away from the egalitarian state of everyone having the same income. The numerical result is in terms of negative entropy, so that a higher number indicates more order, further away from complete equality. Formulating the index to represent negative entropy instead of entropy allows it to be a measure of inequality rather than equality.
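The two extreme cases and the effect of normalizing by \ln N can be checked numerically. The sketch below is illustrative only: the helper name, the sample populations, and the treatment of zero incomes as contributing 0 are assumptions.

```python
import math


def theil_t(incomes):
    """Theil T; a zero income is assumed to contribute 0 to the sum."""
    n = len(incomes)
    mu = sum(incomes) / n
    total = 0.0
    for x in incomes:
        if x > 0:
            ratio = x / mu
            total += ratio * math.log(ratio)
    return total / n


equal = [5.0, 5.0, 5.0, 5.0]         # everyone has the same income
one_has_all = [0.0, 0.0, 0.0, 20.0]  # one person holds everything

t_equal = theil_t(equal)              # expected: 0
t_extreme = theil_t(one_has_all)      # expected: ln(4)

# Replicating the population (x "union" x) leaves T_T unchanged,
# but changes T_T / ln(N) because N doubles.
x = [1.0, 2.0, 3.0, 10.0]
doubled = x + x
t_x, t_doubled = theil_t(x), theil_t(doubled)
norm_x = t_x / math.log(len(x))
norm_doubled = t_doubled / math.log(len(doubled))
print(t_equal, t_extreme, t_x, t_doubled, norm_x, norm_doubled)
```

The unnormalized index is the same for `x` and `doubled`, while the normalized values differ, which is the violation described above.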


Relation to Atkinson Index

The Theil index can be transformed into an Atkinson index, which has a range between 0 and 1 (0% and 100%), where 0 indicates perfect equality and 1 (100%) indicates maximum inequality. (See Generalized entropy index for the transformation.)


Derivation from entropy

The Theil index is derived from Shannon's measure of information entropy S, where entropy is a measure of randomness in a given set of information. In information theory, physics, and the Theil index, the general form of entropy is

: S = k \sum_{i=1}^N \left( p_i \log_a \left( \frac{1}{p_i} \right) \right) = -k \sum_{i=1}^N \left( p_i \log_a \left( p_i \right) \right)

where
* i is an individual item from the set (such as an individual member from a population, or an individual byte from a computer file).
* p_i is the probability of finding i from a random sample from the set.
* k is a constant. When this equation is used in physics, k typically represents the Boltzmann constant. In information theory or statistics, k is typically equal to 1 (as in the Theil index).
* \log_a \left( x \right) is a logarithm with a base equal to a. In information theory, when information is given in binary digits, the binary logarithm is used (with a equal to 2). In physics and also in computation of the Theil index, the natural logarithm is used (with a equal to e).

When looking at the distribution of income in a population, p_i is equal to the ratio of a particular individual's income to the total income of the entire population. This gives the observed entropy S_\text{Theil} of a population:

: S_\text{Theil} = \sum_{i=1}^N \left( \frac{x_i}{N \bar{x}} \ln \left( \frac{N \bar{x}}{x_i} \right) \right)

where
* x_i is the income of a particular individual.
* N \bar{x} is the total income of the entire population, with N being the number of individuals in the population and \bar{x} ("x bar") being the average income of the population.
* \ln \left( x \right) is the natural logarithm of x: \log_e \left( x \right).

The Theil index T_T measures how far the observed entropy (S_\text{Theil}, which represents how randomly income is distributed) is from the highest possible entropy, S_\text{max} = \ln \left( N \right). The maximum is reached when the income of every individual is equal to the average income, since then \frac{x_i}{N \bar{x}} = \frac{1}{N} for every i and

: S_\text{Theil} = \sum_{i=1}^N \left( \frac{1}{N} \ln \left( N \right) \right) = \ln \left( N \right)

This represents income being maximally distributed amongst individuals in the population, a distribution analogous to the most likely outcome of an infinite number of random coin tosses: an equal distribution of heads and tails. Therefore, the Theil index is the theoretical maximum entropy (which would be reached if the incomes of every individual were equal) minus the observed entropy:

: T_T = S_\text{max} - S_\text{Theil} = \ln \left( N \right) - S_\text{Theil}

When x is in units of population/species, S_\text{Theil} is a measure of biodiversity and is called the Shannon index. If the Theil index is used with x = population/species, it is a measure of inequality of population among a set of species, or "bio-isolation" as opposed to "wealth isolation".

The Theil index measures what is called redundancy in information theory (see "Redundancy, Entropy and Inequality Measures", http://www.poorcity.richcity.org). It is the left-over "information space" that was not utilized to convey information, which reduces the effectiveness of the price signal. The Theil index is a measure of the redundancy of income (or other measure of wealth) in some individuals. Redundancy in some individuals implies scarcity in others. A high Theil index indicates that the total income is not distributed evenly among individuals, in the same way that an uncompressed text file does not have a similar number of byte locations assigned to each of the available unique byte characters.
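As a numerical check of the derivation, the following sketch computes the observed entropy from income shares and compares \ln N minus that entropy with the direct Theil T formula. The function names and sample incomes are hypothetical, and strictly positive incomes are assumed.

```python
import math


def observed_entropy(incomes):
    """S_Theil = sum(p_i * ln(1/p_i)) with p_i = x_i / total income."""
    total = sum(incomes)
    return sum((x / total) * math.log(total / x) for x in incomes)


def theil_t_via_entropy(incomes):
    """T_T = S_max - S_Theil, where S_max = ln(N)."""
    return math.log(len(incomes)) - observed_entropy(incomes)


def theil_t_direct(incomes):
    """T_T = (1/N) * sum((x_i/mu) * ln(x_i/mu))."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes) / n


incomes = [12.0, 30.0, 45.0, 113.0]  # hypothetical, strictly positive
via_entropy = theil_t_via_entropy(incomes)
direct = theil_t_direct(incomes)

# With equal incomes the observed entropy reaches its maximum, ln(N).
s_equal = observed_entropy([8.0] * 5)
print(via_entropy, direct, s_equal)
```

Both routes should give the same value up to rounding, because substituting p_i = x_i / (N \bar{x}) into \ln N - S_\text{Theil} reproduces the direct formula term by term.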


Decomposability

According to the World Bank,
"The best-known entropy measures are Theil’s T (T_T) and Theil’s L (T_L), both of which allow one to decompose inequality into the part that is due to inequality within areas (e.g. urban, rural) and the part that is due to differences between areas (e.g. the rural-urban income gap). Typically at least three-quarters of inequality in a country is due to within-group inequality, and the remaining quarter to between-group differences."

If the population is divided into m subgroups and
* s_i is the income share of group i,
* N is the total population and N_i is the population of group i,
* T_i is the Theil index for that subgroup,
* \overline{x}_i is the average income in group i, and
* \mu is the average income of the population,
then Theil's T index is

: T_T = \sum_{i=1}^m s_i T_i + \sum_{i=1}^m s_i \ln\frac{\overline{x}_i}{\mu}

for

: s_i = \frac{N_i}{N}\frac{\overline{x}_i}{\mu}

For example, inequality within the United States is the average inequality within each state, weighted by state income, plus the inequality between states.

Note: The accompanying image shows not the Theil index in each area of the United States, but the contribution of each area to the Theil index for the U.S. The Theil index is never negative, although individual contributions to it may be negative or positive.

The decomposition of the Theil index, which identifies the share attributable to the between-region component, is a helpful tool for the positive analysis of regional inequality, as it suggests the relative importance of the spatial dimension of inequality.
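The decomposition can be verified on a small example. In this sketch the group names, incomes, and function names are all hypothetical; it computes the within-group and between-group terms from the formula above and compares their sum with the Theil T index of the pooled population.

```python
import math


def theil_t(incomes):
    """Theil T for strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes) / n


def decompose(groups):
    """Return (within, between) components of Theil T.

    groups: mapping {group name: list of positive incomes}.
    """
    pooled = [x for g in groups.values() for x in g]
    n = len(pooled)
    mu = sum(pooled) / n
    within = 0.0
    between = 0.0
    for g in groups.values():
        mean_i = sum(g) / len(g)
        s_i = (len(g) / n) * (mean_i / mu)  # income share of the group
        within += s_i * theil_t(g)
        between += s_i * math.log(mean_i / mu)
    return within, between


groups = {  # hypothetical data
    "urban": [30.0, 40.0, 50.0, 80.0],
    "rural": [10.0, 15.0, 20.0],
}
within, between = decompose(groups)
total = theil_t([x for g in groups.values() for x in g])
print(within, between, total)
```

The sum `within + between` should match `total` up to rounding. Individual terms s_i \ln(\overline{x}_i / \mu) can be negative (for groups below the overall mean), even though the between-group sum is not.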


Theil's ''T'' versus Theil's ''L''

Both Theil's T and Theil's L are decomposable. The difference between them lies in the part of the outcome distribution to which each is most sensitive. Indexes of inequality in the generalized entropy (GE) family are more sensitive to differences in income shares among the poor or among the rich depending on a parameter that defines the GE index. The smaller the parameter value for GE, the more sensitive it is to differences at the bottom of the distribution.
* GE(0) = Theil's L, which is more sensitive to differences at the lower end of the distribution. It is also referred to as the mean log deviation measure.
* GE(1) = Theil's T, which is more sensitive to differences at the top of the distribution.

Decomposability is a property of the Theil index that the more popular Gini coefficient does not offer. The Gini coefficient is more intuitive to many people since it is based on the Lorenz curve. However, it is not easily decomposable like the Theil index.
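One way to see the different sensitivities is to shrink a single income towards zero: Theil's T stays bounded by \ln N, while Theil's L grows without bound. The sketch below is only an illustration of that one limiting case; the function names and the sample data are assumptions.

```python
import math


def theil_t(incomes):
    """Theil T (GE(1)) for strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes) / n


def theil_l(incomes):
    """Theil L (GE(0), mean log deviation) for strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum(math.log(mu / x) for x in incomes) / n


# Three people with income 1 and one person whose income shrinks.
results = []
for lowest in (1.0, 1e-3, 1e-6, 1e-9):
    incomes = [lowest, 1.0, 1.0, 1.0]
    results.append((lowest, theil_t(incomes), theil_l(incomes)))

for lowest, t, l in results:
    print(f"lowest={lowest:g}  T={t:.4f}  L={l:.4f}")

t_last, l_last = results[-1][1], results[-1][2]
```

As the lowest income shrinks, T approaches a finite limit while L keeps increasing, which is consistent with GE(0) weighting the bottom of the distribution more heavily.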


Applications

In addition to a multitude of economic applications, the Theil index has been applied to assess the performance of irrigation systems and the distribution of software metrics.


Application in OECD

The Theil index is used for measuring regional inequalities by the OECD (the Organisation for Economic Co-operation and Development), where the Theil index is defined as

: T = \frac{1}{N}\sum_{i=1}^N \frac{y_i}{\bar{y}} \ln\left(\frac{y_i}{\bar{y}}\right)

where N is the number of regions in the OECD, y_i is the variable of interest in the i-th region (e.g. life expectancy, household income, homicide rate) and \bar{y} is the mean of the given variable of interest across all regions. The interpretation is as follows: the Theil index ranges between 0 and \infty, where zero represents an equal distribution and any other (higher) value represents a higher level of disproportion.

Note: The index allocates an equal weight to each region irrespective of its extent; hence differences in the values of the index among countries could be partially due to differences in the average size of regions in each of the countries.


See also

* Generalized entropy index
* Atkinson index
* Gini coefficient
* Hoover index
* Income inequality metrics
* Suits index
* Wealth condensation
* Diversity index


External links

* Software:
** Free Online Calculator: computes the Gini coefficient, plots the Lorenz curve, and computes many other measures of concentration for any dataset
** Free Calculator and downloadable scripts (Python and Lua) for Atkinson, Gini, and Hoover inequalities
** Users of the R data analysis software can install the "ineq" package, which allows for computation of a variety of inequality indices including Gini, Atkinson, and Theil.
** MATLAB Inequality Package (archived 2008-10-04 at https://web.archive.org/web/20081004090028/http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=19968), including code for computing Gini, Atkinson, and Theil indexes and for plotting the Lorenz curve. Many examples are available.