The granularity-related inconsistency of means (GRIM) test is a simple statistical test used to identify inconsistencies in the analysis of data sets. The test relies on the fact that, given a dataset containing ''N'' integer values, the arithmetic mean (commonly called simply the average) can take only a limited set of values: it must always be expressible as a fraction with an integer numerator and denominator ''N''. If a reported mean does not fit this description, there must be an error somewhere. The preferred term for such errors is "inconsistencies", to emphasise that their origin is typically unknown when they are first discovered. GRIM inconsistencies can result from inadvertent data-entry or typographical errors, or from scientific fraud. The GRIM test is most useful in fields such as psychology, where researchers typically use small samples and measurements are often integers. The GRIM test was proposed by Nick Brown and James Heathers in 2016, following increased awareness of the replication crisis in some fields of science.
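The condition can be written compactly. In the notation below (introduced here for illustration, not taken from Brown and Heathers: $x_i$ are the integer data, $S$ their total, $m$ the reported mean and $d$ the number of decimal places reported), the true mean is a multiple of $1/N$, and a reported mean is consistent only if some integer total rounds to it. Using $\le$ treats an exact tie as compatible with rounding either up or down.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{S}{N}, \qquad S \in \mathbb{Z}

\exists\, S \in \mathbb{Z} :\ \left| \frac{S}{N} - m \right| \le \tfrac{1}{2}\cdot 10^{-d}
\iff
\left\lceil N\left(m - \tfrac{1}{2}\cdot 10^{-d}\right) \right\rceil
\le
\left\lfloor N\left(m + \tfrac{1}{2}\cdot 10^{-d}\right) \right\rfloor
```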
Procedure
The GRIM test is straightforward to perform. For each reported mean in a paper, the sample size (''N'') is found, and all fractions with denominator ''N'' are calculated. The mean is then checked against this list, bearing in mind that values may be rounded inconsistently: depending on the context, a mean of 1.125 may be reported as 1.12 or 1.13. If the mean, allowing for rounding, matches none of the values in this list, it is flagged as mathematically impossible.
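A minimal Python sketch of this check follows, assuming the underlying data are integers and that the reported mean is available as the printed string (so its number of decimal places is known). Rather than listing every fraction with denominator ''N'', it asks whether any integer total ''S'' gives a value ''S''/''N'' that rounds to the reported figure, which is equivalent to searching the list. Ties are accepted in either direction, matching the 1.125 example. The function name `grim_consistent` is hypothetical and not taken from any published tool.

```python
import math
from decimal import Decimal
from fractions import Fraction


def grim_consistent(reported: str, n: int) -> bool:
    """Return True if some integer total S makes S/n round to `reported`.

    `reported` is the mean as printed (a string, so trailing zeros and the
    number of decimal places are preserved). Exact ties are accepted in
    either direction, so 1.125 counts as consistent with both 1.12 and 1.13.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    d = Decimal(reported)
    decimals = max(0, -d.as_tuple().exponent)   # digits after the decimal point
    mean = Fraction(d)                          # exact rational value of the printed mean
    half_step = Fraction(1, 2 * 10 ** decimals) # half a unit in the last reported digit
    # Any true mean S/n that rounds to `reported` lies in [mean - half_step, mean + half_step],
    # so we need an integer S between n*(mean - half_step) and n*(mean + half_step).
    lo_total = math.ceil((mean - half_step) * n)
    hi_total = math.floor((mean + half_step) * n)
    return lo_total <= hi_total


print(grim_consistent("3.48", 20))  # False
print(grim_consistent("1.12", 8))   # True (9/8 = 1.125 rounded down)
print(grim_consistent("1.13", 8))   # True (9/8 = 1.125 rounded up)
```

Using `Decimal` and `Fraction` keeps the arithmetic exact, so binary floating-point error cannot push a borderline mean to the wrong side of the rounding window.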
Example
Consider an experiment in which a fair die is rolled 20 times. Each roll produces one whole number between 1 and 6, and the hypothesized mean value is 3.5. The results of the rolls are then averaged, and the mean is reported as 3.48. This is close to the expected value and appears to support the hypothesis. However, a GRIM test reveals that the reported mean is mathematically impossible: the result of dividing any whole number by 20, written to 2 decimal places, must be of the form X.X0 or X.X5; it is impossible to divide any integer by 20 and produce a result with an "8" in the second decimal place.
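The arithmetic can be confirmed by brute force. The sketch below lists every mean that 20 rolls of a six-sided die could produce (totals from 20 to 120) and checks that none of them prints as 3.48. Since 3.48 × 20 = 69.6 is not a whole number, the two nearest achievable values are 3.45 (a total of 69) and 3.50 (a total of 70).

```python
from fractions import Fraction

N_ROLLS = 20  # sample size in the dice example

# Every possible total of 20 rolls of a six-sided die lies between 20 and 120.
possible_means = [Fraction(total, N_ROLLS) for total in range(N_ROLLS, 6 * N_ROLLS + 1)]


def two_dp(x: Fraction) -> str:
    """Format a mean to two decimal places, as it would appear in a paper."""
    # total/20 is always a multiple of 0.05, so formatting involves no genuine rounding.
    return f"{float(x):.2f}"


printed = {two_dp(m) for m in possible_means}

print(sorted({p[-1] for p in printed}))  # ['0', '5']: only X.X0 or X.X5 can occur
print("3.48" in printed)                 # False: the reported mean is impossible
print([two_dp(m) for m in possible_means if Fraction(69, 20) <= m <= Fraction(70, 20)])
# ['3.45', '3.50']
```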
Interpretation and limitations
Even if data fail the GRIM test, this is not automatically a sign of manipulation. Errors in the mean can arise innocently from a mistake by the tester, typographical errors, calculation or programming mistakes, or improper reporting of the sample size. However, a failure can also be a sign that some data have been improperly excluded or that the mean has been illegitimately fudged to make the results appear more significant. The location of failures can indicate the underlying cause: an isolated impossible mean may be caused by a simple error, multiple impossible values in the same row of a table indicate a poor response rate, and multiple impossible values in the same column indicate that the stated sample size is incorrect. Multiple errors scattered throughout a table can be a sign of deeper problems, and other statistical tests can be used to analyze the suspect data.
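The row/column heuristic can be partly automated. The following sketch assumes a hypothetical table layout in which each column is one group with its own stated sample size and each row is one measure reported for every group; the means in the example are invented purely for illustration. It flags every GRIM-inconsistent cell, then lists rows and columns containing two or more failures, which under this layout point respectively to a reduced response rate for that measure and to a misstated sample size for that group. The names `grim_ok` and `locate_failures` are hypothetical.

```python
import math
from collections import Counter
from decimal import Decimal
from fractions import Fraction


def grim_ok(reported: str, n: int) -> bool:
    """True if some integer total S gives S/n that rounds to `reported`."""
    d = Decimal(reported)
    half = Fraction(1, 2 * 10 ** max(0, -d.as_tuple().exponent))  # half a unit in the last place
    m = Fraction(d)
    return math.ceil((m - half) * n) <= math.floor((m + half) * n)


def locate_failures(table: list[list[str]], col_ns: list[int]):
    """Return (failing cells, rows with >= 2 failures, columns with >= 2 failures).

    Hypothetical layout: table[i][j] is the mean of measure i in group j,
    and col_ns[j] is the sample size stated for group j.
    """
    cells = [(i, j)
             for i, row in enumerate(table)
             for j, value in enumerate(row)
             if not grim_ok(value, col_ns[j])]
    per_row = Counter(i for i, _ in cells)
    per_col = Counter(j for _, j in cells)
    suspect_rows = sorted(i for i, k in per_row.items() if k >= 2)  # possible poor response rate
    suspect_cols = sorted(j for j, k in per_col.items() if k >= 2)  # possible wrong sample size
    return cells, suspect_rows, suspect_cols


# Invented example: two groups, each reported as N = 20.
table = [["3.45", "2.10"],
         ["3.48", "2.13"],
         ["1.05", "4.47"]]
cells, rows, cols = locate_failures(table, [20, 20])
print(cells)  # [(1, 0), (1, 1), (2, 1)]
print(rows)   # [1]
print(cols)   # [1]
```

The counts only suggest where to look; as the text notes, confirming the cause still requires the raw data or further analysis.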
The GRIM test works best with data sets in which the sample size is relatively small, the number of subcomponents in composite measures is also small, and the mean is reported to multiple decimal places.
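Why these conditions matter can be seen by counting how many decimal endings are actually reachable. The sketch below assumes that each participant's composite score is the average of ''k'' integer items, so the overall mean is a multiple of 1/(''N''·''k''); the function name `reachable_share` is hypothetical. Once ''N''·''k'' reaches 10 to the power of the number of reported decimals, every printed ending is reachable and the test can no longer flag any value.

```python
from fractions import Fraction


def reachable_share(n: int, items: int = 1, decimals: int = 2) -> float:
    """Share of the 10**decimals possible decimal endings that some mean can produce.

    Assumes each of n participants contributes the average of `items` integer
    responses, so the sample mean is always a multiple of 1 / (n * items).
    """
    denom = n * items
    grid = 10 ** decimals
    # Round each achievable fractional part s/denom to the reporting precision
    # and collect the distinct endings (mod grid folds 1.00 back onto 0.00).
    endings = {round(Fraction(s, denom) * grid) % grid for s in range(denom)}
    return len(endings) / grid


for n, items in [(20, 1), (20, 3), (100, 1)]:
    print(n, items, reachable_share(n, items))
# 20 1 0.2
# 20 3 0.6
# 100 1 1.0
```

With 20 participants and a single item, only a fifth of two-decimal endings are possible, so a random error is likely to be caught; with 100 participants, or with three items per score, that protection shrinks or vanishes. Reporting more decimal places raises the grid and restores some power.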
In some cases, a valid mean may appear to fail the test if the input data are not discretized as expected: for example, if people are asked how many slices of pizza they ate at a buffet, some may respond with a fraction such as "three and a half" instead of a whole number.
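One way to accommodate such responses, sketched here as an assumption rather than as part of the published procedure, is to run the check in a smaller response unit: if answers can be recorded in halves, totals are multiples of 1/2 and the mean is a multiple of 1/(2''N''). The function `grim_consistent_unit` is hypothetical. With `unit=Fraction(1, 2)`, a mean of 2.45 from 10 people fails the integer check but is consistent with a total of 24.5 slices.

```python
import math
from decimal import Decimal
from fractions import Fraction


def grim_consistent_unit(reported: str, n: int, unit: Fraction = Fraction(1)) -> bool:
    """GRIM-style check when each response is a multiple of `unit`.

    With unit=1 this is the ordinary integer check; unit=1/2 allows half answers.
    The total is then j * unit for some integer j, and the mean is j * unit / n.
    """
    d = Decimal(reported)
    half = Fraction(1, 2 * 10 ** max(0, -d.as_tuple().exponent))  # rounding tolerance
    mean = Fraction(d)
    # Smallest and largest j whose mean j*unit/n still rounds to `reported`.
    lo = math.ceil((mean - half) * n / unit)
    hi = math.floor((mean + half) * n / unit)
    return lo <= hi


print(grim_consistent_unit("2.45", 10))                  # False: no integer total works
print(grim_consistent_unit("2.45", 10, Fraction(1, 2)))  # True: a total of 24.5 works
```

Relaxing the unit this way also weakens the test, because halving the unit doubles the number of reachable means.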
Applications
Brown and Heathers applied the test to 260 articles published in ''Psychological Science'', ''Journal of Experimental Psychology: General'', and ''Journal of Personality and Social Psychology''. Of these articles, 71 were amenable to GRIM test analysis; 36 of these contained at least one impossible value and 16 contained multiple impossible values.
GRIM testing also played a significant role in uncovering errors in publications by Cornell University's Food and Brand Lab under Brian Wansink. GRIM testing revealed that a series of articles on the effect of price on consumption at an all-you-can-eat pizza buffet contained many impossible means; deeper analysis of the raw data revealed that in many cases, sample sizes were incorrectly stated and values incorrectly calculated.
External links
Online GRIM test calculator