Statistical conclusion validity is the degree to which conclusions about the relationship among variables, drawn from the data, are correct or "reasonable". The concept originally concerned only whether the statistical conclusion about the relationship between the variables was correct, but there is now a movement towards "reasonable" conclusions that draw on quantitative, statistical, and qualitative data.
Fundamentally, two types of errors can occur: type I (finding a difference or correlation when none exists) and type II (finding no difference or correlation when one exists). Statistical conclusion validity concerns the qualities of the study that make these types of errors more likely. Statistical conclusion validity involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.
Common threats
The most common threats to statistical conclusion validity are:
Low statistical power
Power is the probability of correctly rejecting the null hypothesis when it is false (that is, one minus the type II error rate). Experiments with low power have a higher probability of incorrectly failing to reject the null hypothesis, that is, of committing a type II error and concluding that there is no detectable effect when there is an effect (e.g., there is real covariation between the cause and effect). Low power occurs when the sample size of the study is too small given other factors (small effect sizes, large group variability, unreliable measures, etc.).
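The dependence of power on sample size and effect size can be illustrated with a small calculation. The following is a minimal sketch, not a general power-analysis tool: it assumes a two-sided, two-sample z-test with equal group sizes and known, equal variances, and the function name `approx_power` and its parameters are illustrative choices rather than part of any standard library.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference between groups.
    Assumes equal group sizes and known, equal variances.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Expected value of the test statistic when the effect is real
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Power grows with sample size for a fixed effect size
    for n in (10, 20, 50, 100):
        print(n, round(approx_power(0.5, n), 3))
```

Under these assumptions, a smaller effect size or a smaller group size lowers the returned power, which corresponds to a higher type II error rate.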
Violated assumptions of the test statistics
Most statistical tests (particularly inferential statistics) involve assumptions about the data that make the analysis suitable for testing a hypothesis. Violating the assumptions of statistical tests can lead to incorrect inferences about the cause–effect relationship. The robustness of a test indicates how sensitive it is to violations. Violations of assumptions may make tests more or less likely to make type I or II errors.
Dredging and the error rate problem
Each hypothesis test involves a set risk of a type I error (the alpha rate). If a researcher searches or "dredges" through their data, testing many different hypotheses to find a significant effect, they are inflating their type I error rate. The more the researcher repeatedly tests the data, the higher the chance of observing a type I error and making an incorrect inference about the existence of a relationship.
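The inflation described above can be made concrete with a short calculation. This sketch assumes the tests are statistically independent, which real analyses of a single data set often are not, so the figures are illustrative only. The function name `familywise_error_rate` is a hypothetical helper introduced for this example.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one type I error across num_tests tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    # Probability that all tests avoid a false positive is (1 - alpha)^m
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, round(familywise_error_rate(m), 3))
    # Dividing alpha by the number of tests (a Bonferroni-style correction)
    # keeps the overall rate at or below the nominal level.
    print(round(familywise_error_rate(20, alpha=0.05 / 20), 3))
```

With a per-test alpha of 0.05, the chance of at least one spurious finding rises quickly as more hypotheses are tested, while a per-test threshold of alpha divided by the number of tests holds it near the nominal level.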
Unreliability of measures
If the dependent and/or independent variable(s) are measured unreliably (i.e., with large amounts of measurement error), incorrect conclusions can be drawn.
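One way to see the effect of unreliable measures is the attenuation relationship from classical test theory, in which the correlation expected between two observed measures equals the correlation between the underlying true scores multiplied by the square root of the product of the two reliabilities. The sketch below assumes that model holds; the function name and arguments are illustrative.

```python
def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation under classical test theory.

    true_r is the correlation between error-free scores; each
    reliability is a value between 0 and 1.
    """
    for rel in (reliability_x, reliability_y):
        if not 0 <= rel <= 1:
            raise ValueError("reliabilities must lie between 0 and 1")
    # Measurement error shrinks the observed correlation toward zero
    return true_r * (reliability_x * reliability_y) ** 0.5


if __name__ == "__main__":
    # A true correlation of 0.5 measured with two imperfect instruments
    print(attenuated_correlation(0.5, 0.8, 0.8))
    print(attenuated_correlation(0.5, 0.5, 0.5))
```

Because the observed correlation is smaller than the true one, a study using unreliable measures needs a larger sample to detect the same underlying relationship.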
Restriction of range
Restriction of range, such as floor and ceiling effects or selection effects, reduces the power of the experiment and increases the chance of a type II error. This is because correlations are attenuated (weakened) by reduced variability (see, for example, the equation for the Pearson product-moment correlation coefficient, which uses score variance in its estimation).
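The attenuation can be demonstrated with a small simulation. This is a sketch under stated assumptions: the data are drawn from a bivariate normal distribution with a chosen population correlation, and range restriction is modelled as keeping only cases whose first variable exceeds a cutoff (a simple selection effect). All names and default values are illustrative.

```python
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_restriction(n=5000, rho=0.6, cutoff=1.0, seed=0):
    """Return (full-sample r, r among cases with x above the cutoff)."""
    rng = random.Random(seed)  # seeded for reproducibility
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # Construct y so that its population correlation with x is rho
        y = rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    full_r = pearson_r(xs, ys)
    # Keep only the selected cases, which narrows the range of x
    kept = [(x, y) for x, y in zip(xs, ys) if x > cutoff]
    restricted_r = pearson_r([x for x, _ in kept], [y for _, y in kept])
    return full_r, restricted_r


if __name__ == "__main__":
    full_r, restricted_r = simulate_restriction()
    print("full sample r:", round(full_r, 3))
    print("restricted sample r:", round(restricted_r, 3))
```

In this setup the correlation in the selected subsample is noticeably weaker than in the full sample, even though the underlying relationship between the two variables is unchanged.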
Heterogeneity of the units under study
Greater heterogeneity of individuals participating in the study can also impact interpretations of results by increasing the variance of results or obscuring true relationships (see also sampling error). This obscures possible interactions between the characteristics of the units and the cause–effect relationship.
Threats to internal validity
Any effect that can impact the internal validity of a research study may bias the results and impact the validity of statistical conclusions reached. These threats to internal validity include unreliability of treatment implementation (lack of standardization) and failure to control for extraneous variables.
See also
* Internal validity
* Statistical model validation
* Test validity
* Validity (statistics)