The granularity-related inconsistency of means (GRIM) test is a simple statistical test used to identify inconsistencies in the analysis of data sets. The test relies on the fact that, given a dataset containing ''N'' integer values, the arithmetic mean (commonly called simply the average) is restricted to a few possible values: it must always be expressible as a fraction with an integer numerator and a denominator ''N''. If the reported mean does not fit this description, there must be an error somewhere; the preferred term for such errors is "inconsistencies", to emphasise that their origin is, on first discovery, typically unknown. GRIM inconsistencies can result from inadvertent data-entry or typographical errors or from scientific fraud. The GRIM test is most useful in fields such as psychology where researchers typically use small groups and measurements are often integers. The GRIM test was proposed by Nick Brown and James Heathers in 2016, following increased awareness of the replication crisis in some fields of science.

Procedure

The GRIM test is straightforward to perform. For each reported mean in a paper, the sample size (''N'') is found, and all fractions with denominator ''N'' are calculated. The mean is then checked against this list, bearing in mind that values may be rounded inconsistently: depending on the context, a mean of 1.125 may be reported as 1.12 or 1.13. If the mean is not in this list, it is highlighted as mathematically impossible.
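The procedure can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' reference implementation: the function name `grim_consistent` is hypothetical, the reported mean is passed as a string so that trailing zeros (and hence the number of reported decimal places) are preserved, and a mean is accepted if either rounding direction for exact ties reproduces it.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if some integer total divided by n rounds to reported_mean.

    Hypothetical helper for illustration; assumes every underlying
    observation is an integer and n is the stated sample size.
    """
    target = Decimal(reported_mean)
    # Number of decimal places is inferred from the string, e.g. "3.50" -> 2.
    decimals = max(0, -target.as_tuple().exponent)
    quantum = Decimal(1).scaleb(-decimals)  # e.g. Decimal("0.01")
    half = quantum / 2

    # Any total that could round to the reported mean lies in this interval.
    lo = int(((target - half) * n).to_integral_value(rounding=ROUND_FLOOR))
    hi = int(((target + half) * n).to_integral_value(rounding=ROUND_CEILING))

    for total in range(lo, hi + 1):
        exact = Decimal(total) / n
        # Ties such as 1.125 may have been rounded up or down by the authors.
        for mode in (ROUND_HALF_UP, ROUND_HALF_DOWN):
            if exact.quantize(quantum, rounding=mode) == target:
                return True
    return False
```

With ''N'' = 8, for instance, `grim_consistent("1.12", 8)` and `grim_consistent("1.13", 8)` both return `True` because 9/8 = 1.125 can be rounded either way, while `grim_consistent("1.14", 8)` returns `False`.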

Example

Consider an experiment in which a fair die is rolled 20 times. Each roll will produce one whole number between 1 and 6, and the hypothesized mean value is 3.5. The results of the rolls are then averaged together, and the mean is reported as 3.48. This is close to the expected value, and appears to support the hypothesis. However, a GRIM test reveals that the reported mean is mathematically impossible: the result of dividing any whole number by 20, written to 2 decimal places, must be of the form X.X0 or X.X5; it is impossible to divide any integer by 20 and produce a result with an "8" in the second decimal place.
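The arithmetic behind this example can be written out explicitly. A mean reported as 3.48 to two decimal places stands for a true value between 3.475 and 3.485, so the sum of the 20 rolls would have to lie between 69.5 and 69.7:

```latex
\begin{align*}
3.48 \times 20 &= 69.6 \quad \text{(not an integer)} \\
\frac{69}{20} &= 3.45, \qquad \frac{70}{20} = 3.50 \\
[\,3.475 \times 20,\; 3.485 \times 20\,] &= [\,69.5,\; 69.7\,] \quad \text{(contains no integer)}
\end{align*}
```

Since the sum of 20 whole-number rolls must itself be a whole number, the closest attainable means are 3.45 and 3.50, and neither rounds to 3.48.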

Interpretation and limitations

Even if the data fails the GRIM test, this is not automatically a sign of manipulation. Errors in the mean can come about innocently as a result of an error on the part of the tester, typographical errors, calculation and programming mistakes, or improper reporting of the sample size. However, it can be a sign that some data has been improperly excluded or that the mean has been illegitimately fudged in order to make the results appear more significant. The location of failures can be indicative of the underlying cause: an isolated impossible mean may be caused by an error, multiple impossible values in the same row of a table indicate a poor response rate, and multiple impossible values in the same column indicate the given sample size is incorrect. Multiple errors scattered throughout a table can be a sign of deeper problems, and other statistical tests can be used to analyze the suspect data.

The GRIM test works best with data sets in which: the sample size is relatively small, the number of subcomponents in composite measures is also small, and the mean is reported to multiple decimal places. In some cases, a valid mean may appear to fail the test if the input data is not discretized as expected; for example, if people are asked how many slices of pizza they ate at a buffet, some people may respond with a fraction such as "three and a half" instead of a whole number as expected.
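Why small samples, few subcomponents and more decimal places matter can be illustrated by counting how many of the possible reported values are attainable at all. The sketch below is an illustration under stated assumptions, not part of the published method: the function name `attainable_fraction` is hypothetical, a composite measure is assumed to be the mean of `items` integer-valued items (so the mean rests on `n * items` integers), and exact ties are counted as attainable under either rounding direction.

```python
def attainable_fraction(n: int, decimals: int = 2, items: int = 1) -> float:
    """Share of possible fractional parts (e.g. .00 to .99) that a mean can take.

    Hypothetical helper: n is the sample size, items the assumed number of
    integer-valued subcomponents averaged into a composite score.
    """
    scale = 10 ** decimals
    granularity = n * items  # count of integer values behind the mean
    reachable = set()
    # Totals 0 .. granularity-1 cover every mean in the unit interval [0, 1).
    for k in range(granularity):
        q, r = divmod(k * scale, granularity)
        if 2 * r <= granularity:  # rounds down (ties allowed to round down)
            reachable.add(q % scale)
        if 2 * r >= granularity:  # rounds up (ties allowed to round up)
            reachable.add((q + 1) % scale)
    return len(reachable) / scale
```

Under these assumptions, `attainable_fraction(20)` is 0.2, so most arbitrary two-decimal means would be flagged, whereas `attainable_fraction(100)` and `attainable_fraction(20, items=5)` are both 1.0: every two-decimal value is attainable and the test can no longer detect anything.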

Applications

Brown and Heathers applied the test to 260 articles published in ''Psychological Science'', ''Journal of Experimental Psychology: General'', and ''Journal of Personality and Social Psychology''. Of these articles, 71 were amenable to GRIM test analysis; 36 of these contained at least one impossible value and 16 contained multiple impossible values. GRIM testing also played a significant role in uncovering errors in publications by Cornell University's Food and Brand Lab under Brian Wansink. GRIM testing revealed that a series of articles on the effect of price on consumption at an all-you-can-eat pizza buffet contained many impossible means; deeper analysis of the raw data revealed that in many cases, sample sizes were incorrectly stated and values incorrectly calculated.

External links

Online GRIM test calculator