The granularity-related inconsistency of means (GRIM) test is a simple statistical test used to identify inconsistencies in the analysis of data sets. The test relies on the fact that, given a dataset containing ''N'' integer values, the arithmetic mean (commonly called simply the average) is restricted to a few possible values: it must always be expressible as a fraction with an integer numerator and a denominator ''N''. If the reported mean does not fit this description, there must be an error somewhere; the preferred term for such errors is "inconsistencies", to emphasise that their origin is, on first discovery, typically unknown. GRIM inconsistencies can result from inadvertent data-entry or typographical errors or from scientific fraud. The GRIM test is most useful in fields such as psychology, where researchers typically use small groups and measurements are often integers. The GRIM test was proposed by Nick Brown and James Heathers in 2016, following increased awareness of the replication crisis in some fields of science.

Procedure

The GRIM test is straightforward to perform. For each reported mean in a paper, the sample size (''N'') is found, and all fractions with denominator ''N'' are calculated. The mean is then checked against this list, allowing for the fact that values may be rounded inconsistently: depending on the context, a mean of 1.125 may be reported as 1.12 or 1.13. If the mean is not in this list, it is highlighted as mathematically impossible.
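
The procedure can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `grim_consistent` and its parameters are hypothetical, the mean is passed as a string so that its reported digits are kept exactly, and instead of listing every fraction with denominator ''N'' it only tries the integer totals adjacent to mean × ''N'', since no other total can round to the reported value.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int, decimals: int = 2) -> bool:
    """Return True if some integer total divided by n rounds to reported_mean.

    Hypothetical helper for illustration; assumes every observation is an integer.
    """
    target = Decimal(reported_mean)
    quantum = Decimal(1).scaleb(-decimals)  # 0.01 when decimals == 2
    # Only totals next to reported_mean * n can round to the reported value.
    centre = int(target * n)
    for total in range(centre - 1, centre + 3):
        mean = Decimal(total) / Decimal(n)
        # Accept both rounding directions for exact halves (1.125 -> 1.12 or 1.13).
        for mode in (ROUND_HALF_UP, ROUND_HALF_DOWN):
            if mean.quantize(quantum, rounding=mode) == target:
                return True
    return False


print(grim_consistent("1.12", 8))  # 9 / 8 = 1.125, rounded down
print(grim_consistent("1.13", 8))  # 9 / 8 = 1.125, rounded up
print(grim_consistent("1.14", 8))  # no total out of 8 gives 1.14
```

Accepting both rounding directions makes the check lenient by design: a mean is flagged only when no integer total could produce it under either convention.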

Example

Consider an experiment in which a fair die is rolled 20 times. Each roll will produce one whole number between 1 and 6, and the hypothesized mean value is 3.5. The results of the rolls are then averaged together, and the mean is reported as 3.48. This is close to the expected value, and appears to support the hypothesis. However, a GRIM test reveals that the reported mean is mathematically impossible: the result of dividing any whole number by 20, written to 2 decimal places, must be of the form X.X0 or X.X5; it is impossible to divide any integer by 20 and produce a result with an "8" in the second decimal place.
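
The die example is small enough to enumerate in full. The sketch below (variable names are illustrative) lists every mean that 20 rolls can produce, using exact fractions so that no floating-point rounding is involved, and confirms that the second decimal place is always 0 or 5.

```python
from fractions import Fraction

N_ROLLS = 20

# Totals of 20 rolls of a six-sided die range from 20 (all ones) to 120 (all sixes).
possible_means = [Fraction(total, N_ROLLS) for total in range(N_ROLLS, 6 * N_ROLLS + 1)]

# Every mean is a multiple of 1/20 = 0.05, so two decimal places represent it exactly.
second_decimals = {int(mean * 100) % 10 for mean in possible_means}

reported = Fraction("3.48")
nearest = min(possible_means, key=lambda m: abs(m - reported))

print(sorted(second_decimals))     # [0, 5]
print(reported in possible_means)  # False
print(float(nearest))              # 3.5
```

The nearest achievable means are 3.45 and 3.50, so a reported 3.48 cannot be explained by rounding alone.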

Interpretation and limitations

Even if the data fails the GRIM test, this is not automatically a sign of manipulation. Errors in the mean can come about innocently as a result of an error on the part of the tester, typographical errors, calculation and programming mistakes, or improper reporting of the sample size. However, it can be a sign that some data has been improperly excluded or that the mean has been illegitimately fudged in order to make the results appear more significant. The location of failures can be indicative of the underlying cause: an isolated impossible mean may be caused by an error, multiple impossible values in the same row of a table indicate a poor response rate, and multiple impossible values in the same column indicate the given sample size is incorrect. Multiple errors scattered throughout a table can be a sign of deeper problems, and other statistical tests can be used to analyze the suspect data.

The GRIM test works best with data sets in which the sample size is relatively small, the number of subcomponents in composite measures is also small, and the mean is reported to multiple decimal places. In some cases, a valid mean may appear to fail the test if the input data is not discretized as expected. For example, if people are asked how many slices of pizza they ate at a buffet, some may respond with a fraction such as "three and a half" instead of a whole number as expected.
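
The effect of sample size, subcomponents and decimal places can be illustrated by counting how many decimal endings are reachable at all. The sketch below is a rough illustration under stated assumptions, not part of the published test: the function `consistent_share` is hypothetical, a composite measure is assumed to be the plain average of `items` integer responses (so the effective denominator is `n * items`), and only round-half-up is considered.

```python
def consistent_share(n: int, items: int = 1, decimals: int = 2) -> float:
    """Share of the 10**decimals decimal endings that some integer total can produce.

    Hypothetical helper; a share of 1.0 means the GRIM test cannot flag anything.
    """
    effective_n = n * items  # assumption: composite = average of `items` integer responses
    scale = 10 ** decimals
    reachable = set()
    for total in range(effective_n):
        # Round total / effective_n to `decimals` places (half up) in integer arithmetic.
        rounded = (2 * total * scale + effective_n) // (2 * effective_n)
        reachable.add(rounded % scale)
    return len(reachable) / scale


print(consistent_share(20))           # 0.2: only 20 of 100 endings are possible
print(consistent_share(20, items=5))  # 1.0: every ending is possible
print(consistent_share(20, decimals=3))
```

Under these assumptions the test loses its power once the effective denominator reaches the number of distinct decimal endings, which is why small samples, few subcomponents and more reported decimals all help.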

Applications

Brown and Heathers applied the test to 260 articles published in ''Psychological Science'', ''Journal of Experimental Psychology: General'', and ''Journal of Personality and Social Psychology''. Of these articles, 71 were amenable to GRIM test analysis; 36 of these contained at least one impossible value and 16 contained multiple impossible values. GRIM testing also played a significant role in uncovering errors in publications by Cornell University's Food and Brand Lab under Brian Wansink. GRIM testing revealed that a series of articles on the effect of price on consumption at an all-you-can-eat pizza buffet contained many impossible means; deeper analysis of the raw data revealed that in many cases, sample sizes were incorrectly stated and values incorrectly calculated.


