Top-coded

In econometrics and statistics, a top-coded data observation is one whose value lies above an upper bound and has therefore been censored. Survey data are often top-coded before release to the public to preserve the anonymity of respondents. For example, if a survey answer reported a respondent with self-identified wealth of $79 billion, it would not be anonymous, because people would know there is a good chance the respondent was Bill Gates. Top-coding may also be applied to prevent possibly erroneous outliers from being published. Bottom-coding is analogous, e.g. if amounts below zero are reported as zero. Top-coding also occurs for data recorded in groups, e.g. if ages are reported in these groups: 0-20, 21-50, 51-99, 100-and-up. Here we only know how many people have ages of 100 or above, not their distribution. Producers of survey data sometimes release the average of the censored amounts to help users impute unbiased estimates of the top group.
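The effect of top-coding on a simple mean, and the way a released average of the censored amounts can repair it, can be sketched in a few lines of Python. The wealth figures, the cap of 30,000, and the helper names `top_code` and `impute_with_censored_mean` are all hypothetical and chosen only for illustration; the sketch also assumes that no uncensored value equals the cap exactly, whereas real public-use files usually carry an explicit top-code flag.

```python
from statistics import mean

def top_code(values, cap):
    """Replace every value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]

def impute_with_censored_mean(coded, cap, censored_mean):
    """Replace each top-coded entry with the released mean of the censored values.

    Assumption: any entry equal to `cap` is a top-coded record.
    """
    return [censored_mean if v >= cap else v for v in coded]

# Hypothetical wealth values; the last two exceed the cap.
wealth = [12_000, 18_000, 25_000, 40_000, 90_000]
cap = 30_000

coded = top_code(wealth, cap)

# What a data producer might release alongside the top-coded file.
censored_mean = mean(v for v in wealth if v > cap)

imputed = impute_with_censored_mean(coded, cap, censored_mean)

print(mean(wealth))   # 37000 (true mean, unknown to the data user)
print(mean(coded))    # 23000 (biased downward by top-coding)
print(mean(imputed))  # 37000 (mean restored by the imputation)
```

Substituting the censored mean recovers the overall mean, but every imputed record carries the same value, so the spread within the top group remains unknown.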


Example: Top-coding of income at $30,000

Top-coding is a general problem for analysis of public use data sets. Top-coding in the Current Population Survey (CPS) makes it hard to estimate measures of income inequality, since the shape of the distribution of high incomes is hidden. To help overcome this problem, the CPS provides the mean value of top-coded values. The practice of top-coding, or capping the reported maximum value on tax returns to protect the earner's anonymity, also complicates the analysis of the distribution of wealth in the United States.
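To illustrate why a hidden upper tail matters for inequality measures, the following sketch computes a Gini coefficient on a small made-up income list, on its top-coded version, and on a version where the top-coded entries are replaced by the mean of the censored values. The incomes (in thousands), the cap of 100, and the function name `gini` are assumptions for this example and are not taken from the CPS.

```python
def gini(values):
    """Gini coefficient from the rank-weighted sum of sorted values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n,
    with x sorted in ascending order and i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical incomes in thousands; the last two exceed the cap.
incomes = [10, 20, 30, 40, 150, 250]
cap = 100

coded = [min(v, cap) for v in incomes]

# Mean of the censored values, as a data producer might release it.
top = [v for v in incomes if v > cap]
top_mean = sum(top) / len(top)

# Assumption: entries equal to the cap are the top-coded records.
imputed = [top_mean if v >= cap else v for v in coded]

gini_true = gini(incomes)
gini_coded = gini(coded)
gini_imputed = gini(imputed)

print(round(gini_true, 3), round(gini_coded, 3), round(gini_imputed, 3))
```

In this toy data the top-coded Gini is the smallest of the three, and the mean-imputed Gini sits between it and the true value: the released mean restores total income but not the dispersion among the top earners, which is one motivation for the multiple imputation approach listed under Further reading.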


Implications for ordinary least squares estimation

* If the lower bound of the top-coded group is used as a regressor value (30000 in the example above), OLS is biased and inconsistent, since the regressor's highest values are reported with a systematic error.
* The top-coded observations can be omitted from the regression entirely. Provided there are no systematic differences between the omitted group and the included groups, OLS is consistent and unbiased.
* The Tobit procedure models the censoring explicitly. When the top-coded variable is the dependent variable and the model's distributional assumptions hold, it gives consistent estimates.
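The first two cases above can be checked with a small simulation. This is a sketch under stated assumptions, not a general proof: the regressor is drawn uniformly on 0 to 60 (thousands), the true relationship is linear with slope 2 and normal noise, and the cap is 30, echoing the 30000 threshold in the example. The helper `ols_slope` and all parameter values are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(0)  # fixed seed so the run is reproducible

CAP = 30.0         # top-code threshold, in thousands (assumed)
TRUE_SLOPE = 2.0   # slope of the simulated relationship (assumed)

# Simulated regressor and outcome: y = 1 + 2x + noise.
x = [random.uniform(0, 60) for _ in range(20_000)]
y = [1.0 + TRUE_SLOPE * xi + random.gauss(0, 5) for xi in x]

# Case 1: the cap is used as the regressor value for top-coded records.
x_coded = [min(xi, CAP) for xi in x]
slope_capped = ols_slope(x_coded, y)

# Case 2: top-coded records are dropped before estimation.
kept = [(xc, yi) for xc, yi in zip(x_coded, y) if xc < CAP]
slope_omitted = ols_slope([k[0] for k in kept], [k[1] for k in kept])

print(f"cap used as regressor: {slope_capped:.2f}")
print(f"top-coded omitted:     {slope_omitted:.2f}")
```

Under this design, using the cap as the regressor value pushes the estimated slope well above 2, because the outcomes of top-coded records reflect regressor values larger than the cap. Dropping those records selects on the regressor only, so the slope estimated on the remaining sample stays close to 2.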


See also

* Tobit model
* Heckit model
* Truncated data
* Censoring (statistics)


Further reading

* Jenkins, S. P., Burkhauser, R. V., Feng, S., & Larrimore, J. (2009). Measuring inequality using censored data: a multiple imputation approach. ISER Working Paper Series 2009-04, Institute for Social and Economic Research.

