Statistical conclusion validity is the degree to which conclusions about the relationship among variables based on the data are correct or "reasonable". The concept began as being solely about whether the statistical conclusion about the relationship of the variables was correct, but there is now a movement toward "reasonable" conclusions that draw on quantitative, statistical, and qualitative data. Fundamentally, two types of errors can occur: type I (finding a difference or correlation when none exists) and type II (finding no difference or correlation when one exists). Statistical conclusion validity concerns the qualities of the study that make these types of errors more likely. Establishing it involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.

Common threats

The most common threats to statistical conclusion validity are:

Low statistical power

Power is the probability of correctly rejecting the null hypothesis when it is false; it is the complement of the type II error rate (power = 1 − β). Experiments with low power have a higher probability of incorrectly failing to reject the null hypothesis, that is, of committing a type II error and concluding that there is no detectable effect when there is one (i.e., when there is real covariation between the cause and effect). Low power occurs when the sample size of the study is too small given other factors (small effect sizes, large group variability, unreliable measures, etc.).
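
As a rough illustration of how sample size drives power, the sketch below uses the normal approximation for a two-sided, two-sample comparison of means with equal group sizes. The function name `approx_power` and the example effect size d = 0.5 (a standardized mean difference) are illustrative assumptions; an exact calculation would use the noncentral t distribution instead.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of a standardized mean
    difference d, using the normal (z) approximation and equal group sizes."""
    z = NormalDist()  # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)
    # expected value of the test statistic when the true effect is d
    shift = d * (n_per_group / 2) ** 0.5
    # probability that the statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (10, 20, 64, 100):
    print(n, round(approx_power(0.5, n), 3))
```

Under this approximation, an effect of d = 0.5 needs roughly 64 participants per group to reach a power of about 0.8, while with 20 per group the power is well below 0.5.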

Violated assumptions of the test statistics

Most statistical tests (particularly inferential statistics) involve assumptions about the data that make the analysis suitable for testing a hypothesis. Violating the assumptions of statistical tests can lead to incorrect inferences about the cause–effect relationship. The robustness of a test indicates how sensitive it is to such violations: a robust test still gives approximately valid results when its assumptions are not fully met. Violations of assumptions may make tests more or less likely to make type I or type II errors.
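
The sketch below is a minimal, standard-library simulation of one such violation. It estimates how often Student's pooled-variance t-test rejects a true null hypothesis when its equal-variance assumption holds and when it is violated, with the smaller group having the larger spread. The group sizes (10 and 40), standard deviations, replication count, and seed are arbitrary assumptions. The critical value is hardcoded for the resulting 48 degrees of freedom.

```python
import random

# Two-sided 5% critical value of Student's t with 10 + 40 - 2 = 48 degrees of freedom.
T_CRIT_DF48 = 2.0106
N_SMALL, N_LARGE = 10, 40  # fixed so the hardcoded critical value stays valid

def sample_var(values):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def pooled_t(x, y):
    """Student's two-sample t statistic, which assumes equal population variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * sample_var(x) + (ny - 1) * sample_var(y)) / (nx + ny - 2)
    se = (sp2 * (1 / nx + 1 / ny)) ** 0.5
    return (sum(x) / nx - sum(y) / ny) / se

def type1_rate(sd_small, sd_large, sims=4000, seed=1):
    """Share of simulated studies that reject a TRUE null (both means are 0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        x = [rng.gauss(0, sd_small) for _ in range(N_SMALL)]
        y = [rng.gauss(0, sd_large) for _ in range(N_LARGE)]
        if abs(pooled_t(x, y)) > T_CRIT_DF48:
            rejections += 1
    return rejections / sims

rate_equal = type1_rate(sd_small=1.0, sd_large=1.0)    # assumption holds
rate_unequal = type1_rate(sd_small=4.0, sd_large=1.0)  # assumption violated
print(f"equal variances: {rate_equal:.3f}, unequal variances: {rate_unequal:.3f}")
```

With these settings the first rate should sit near the nominal 0.05, while the second is several times larger. The pooled variance is dominated by the larger, less variable group, so the test understates the standard error of the difference. Welch's unequal-variance t-test is the usual alternative and is not shown here.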

Dredging and the error rate problem

Each hypothesis test carries a set risk of a type I error (the alpha rate). If a researcher searches or "dredges" through their data, testing many different hypotheses to find a significant effect, they inflate the overall type I error rate. The more often the researcher tests the data, the higher the chance of committing at least one type I error and making an incorrect inference about the existence of a relationship.

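Assuming the tests are independent and every null hypothesis is true, the probability of at least one false positive among m tests, each at level α, is 1 − (1 − α)^m. The sketch below computes that rate and checks it by simulation, using the fact that a valid p-value is uniformly distributed under a true null. The Bonferroni adjustment (testing each hypothesis at α/m) is included as one common safeguard, not as something the text above prescribes. The function names are illustrative.

```python
import random

def familywise_error_rate(alpha, m):
    """P(at least one type I error) over m INDEPENDENT tests of true nulls, each at level alpha."""
    return 1 - (1 - alpha) ** m

def simulated_fwer(alpha, m, sims=20000, seed=2):
    """Monte Carlo check: each test's p-value is drawn uniform on [0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(m))  # did any of the m tests "find" an effect?
        for _ in range(sims)
    )
    return hits / sims

def bonferroni_alpha(alpha, m):
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / m

for m in (1, 5, 20):
    print(m, round(familywise_error_rate(0.05, m), 3), round(simulated_fwer(0.05, m), 3))
```

Twenty independent tests at α = 0.05 give roughly a 64% chance of at least one spurious "finding".
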
Unreliability of measures

If the dependent and/or independent variables are not measured reliably (i.e., with large amounts of measurement error), incorrect conclusions can be drawn.
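
Classical test theory makes this concrete. Random measurement error attenuates an observed correlation to roughly r_true × √(rel_x × rel_y), where rel_x and rel_y are the reliabilities of the two measures. The simulation below checks that formula. The true correlation of 0.6, reliabilities of 0.5, sample size, and seed are all illustrative assumptions.

```python
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_attenuation(true_r=0.6, rel_x=0.5, rel_y=0.5, n=20000, seed=7):
    rng = random.Random(seed)
    # true scores with unit variance and correlation true_r
    tx = [rng.gauss(0, 1) for _ in range(n)]
    ty = [true_r * a + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1) for a in tx]
    # error SD chosen so that reliability = true variance / (true variance + error variance)
    ex_sd = ((1 - rel_x) / rel_x) ** 0.5
    ey_sd = ((1 - rel_y) / rel_y) ** 0.5
    ox = [a + rng.gauss(0, ex_sd) for a in tx]
    oy = [b + rng.gauss(0, ey_sd) for b in ty]
    return pearson(tx, ty), pearson(ox, oy)

true_corr, observed_corr = simulate_attenuation()
expected = 0.6 * (0.5 * 0.5) ** 0.5  # classical attenuation formula
print(round(true_corr, 3), round(observed_corr, 3), expected)
```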

Restriction of range

Restriction of range, such as floor and ceiling effects or selection effects, reduces the power of the experiment and increases the chance of a type II error. This is because correlations are attenuated (weakened) by reduced variability (see, for example, the equation for the Pearson product-moment correlation coefficient, which uses score variance in its estimation).
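
The following short simulation illustrates selection-based range restriction. It assumes bivariate-normal scores with a population correlation of 0.6 and keeps only cases with x above 1, a hypothetical selection cutoff. This shrinks the variance of x and, with it, the observed correlation. The seed and sample size are arbitrary.

```python
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(11)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in x]  # population correlation 0.6

full_r = pearson(x, y)
kept = [(a, b) for a, b in zip(x, y) if a > 1.0]  # selection: only high scorers on x
restricted_r = pearson([a for a, _ in kept], [b for _, b in kept])
print(round(full_r, 3), round(restricted_r, 3), len(kept))
```

The restricted correlation comes out well below the full-sample value, even though the same relationship between x and y generated every case.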

Heterogeneity of the units under study

Greater heterogeneity of the individuals participating in the study can also affect the interpretation of results by increasing the variance of outcomes or obscuring true relationships (see also sampling error). Heterogeneity can likewise obscure possible interactions between the characteristics of the units and the cause–effect relationship.
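
Here is a hypothetical simulation of how heterogeneous units can hide an interaction. The treatment is assumed to help one subgroup of units and to harm another by the same amount. The pooled comparison therefore suggests no effect, even though the effect within each subgroup is large. The subgroup labels, effect sizes, cell size, and seed are made up for illustration.

```python
import random
from statistics import fmean

rng = random.Random(3)
N_PER_CELL = 2000
# Hypothetical population: treatment shifts subgroup "A" by +1 and subgroup "B" by -1
EFFECT = {"A": 1.0, "B": -1.0}

def outcomes(group, treated):
    """Simulated outcomes for one subgroup/arm cell, with unit-variance noise."""
    shift = EFFECT[group] if treated else 0.0
    return [shift + rng.gauss(0, 1) for _ in range(N_PER_CELL)]

cells = {(g, t): outcomes(g, t) for g in ("A", "B") for t in (True, False)}

def diff(groups):
    """Treated-minus-control mean difference over the given subgroups."""
    treated = [v for g in groups for v in cells[(g, True)]]
    control = [v for g in groups for v in cells[(g, False)]]
    return fmean(treated) - fmean(control)

pooled = diff(("A", "B"))
by_group = {g: diff((g,)) for g in ("A", "B")}
print(round(pooled, 3), {g: round(v, 3) for g, v in by_group.items()})
```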

Threats to internal validity

Any effect that can impact the internal validity of a research study may bias the results and affect the validity of the statistical conclusions reached. These threats to internal validity include unreliability of treatment implementation (lack of standardization) and failure to control for extraneous variables.
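
The sketch below is a hypothetical illustration of failing to control for an extraneous variable. A binary variable `z` raises both the chance of receiving treatment and the outcome, while the treatment itself has no effect. The variable name, probabilities, and effect size are all assumptions. The naive comparison shows a spurious effect. Comparing treated and untreated units within each level of `z` (stratification) removes it.

```python
import random
from statistics import fmean

rng = random.Random(5)
N = 20000
rows = []
for _ in range(N):
    z = rng.random() < 0.5                         # extraneous variable (hypothetical)
    treated = rng.random() < (0.8 if z else 0.2)   # z influences who gets treated
    y = 2.0 * z + rng.gauss(0, 1)                  # z drives the outcome; treatment has no effect
    rows.append((z, treated, y))

def mean_diff(subset):
    """Mean outcome of treated minus untreated units in the subset."""
    t = [y for _, tr, y in subset if tr]
    c = [y for _, tr, y in subset if not tr]
    return fmean(t) - fmean(c)

naive = mean_diff(rows)
strata = [[r for r in rows if r[0] == level] for level in (False, True)]
# equal weights are appropriate here only because P(z) = 0.5 by construction
adjusted = fmean([mean_diff(s) for s in strata])
print(round(naive, 3), round(adjusted, 3))
```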
