Statistical conclusion validity is the degree to which conclusions about the relationship among variables, based on the data, are correct or "reasonable". The concept originally concerned only whether the statistical conclusion about the relationship between the variables was correct, but there is now a movement towards "reasonable" conclusions that draw on quantitative, statistical, and qualitative data. Fundamentally, two types of errors can occur: type I (finding a difference or correlation when none exists) and type II (finding no difference or correlation when one exists). Statistical conclusion validity concerns the qualities of the study that make these types of errors more likely. It involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.

Common threats

The most common threats to statistical conclusion validity are:

Low statistical power

Power is the probability of correctly rejecting the null hypothesis when it is false (the complement of the type II error rate). Experiments with low power have a higher probability of incorrectly retaining the null hypothesis, that is, committing a type II error and concluding that there is no effect when there actually is one (i.e. there is real covariation between the cause and effect). Low power occurs when the sample size of the study is too small given other factors (small effect sizes, large group variability, unreliable measures, etc.).
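The dependence of power on sample size and effect size can be illustrated with a short calculation. The sketch below is not taken from any particular study: it assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size, and uses the normal approximation. The function name `approx_power` and the example values are illustrative assumptions.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes equal group sizes and a standardized mean difference
    (effect_size); uses the normal approximation, not the exact t-test.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Power rises with sample size for a fixed (hypothetical) effect size of 0.5.
for n in (10, 20, 64, 100):
    print(f"n per group = {n:3d}  power = {approx_power(0.5, n):.3f}")
```

Under these assumptions, a small sample leaves a moderate effect likely to be missed, while larger samples bring the power towards the conventional 0.8 level and beyond.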

Violated assumptions of the test statistics

Most statistical tests (particularly inferential statistics) involve assumptions about the data that make the analysis suitable for testing a hypothesis. Violating the assumptions of statistical tests can lead to incorrect inferences about the cause–effect relationship. The robustness of a test indicates how sensitive it is to such violations. Violations of assumptions may make tests more or less likely to make type I or II errors.
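One way to see how a violated assumption changes the error rate is to simulate a test under a true null hypothesis. The sketch below is an illustration under stated assumptions, not a general result: it uses a pooled-variance two-sample test (which assumes equal variances), approximates the critical value by the normal value 1.96, and picks hypothetical group sizes of 20 and 80. All names and parameter values are assumptions made for the example.

```python
import random
from statistics import mean, variance

def pooled_test_rejects(sample_a, sample_b, critical_value=1.96):
    """Return True if a pooled-variance two-sample test rejects the null."""
    n_a, n_b = len(sample_a), len(sample_b)
    # The pooled estimate assumes both groups share one population variance.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    std_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    statistic = (mean(sample_a) - mean(sample_b)) / std_error
    return abs(statistic) > critical_value

def type_i_error_rate(sd_a, sd_b, n_a=20, n_b=80, n_sims=2000, seed=0):
    """Estimate how often the test rejects when the null is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Both groups have mean 0, so every rejection is a type I error.
        sample_a = [rng.gauss(0, sd_a) for _ in range(n_a)]
        sample_b = [rng.gauss(0, sd_b) for _ in range(n_b)]
        rejections += pooled_test_rejects(sample_a, sample_b)
    return rejections / n_sims

rate_assumption_met = type_i_error_rate(sd_a=1, sd_b=1)
rate_assumption_violated = type_i_error_rate(sd_a=3, sd_b=1)
print(f"equal variances:   {rate_assumption_met:.3f}")
print(f"unequal variances: {rate_assumption_violated:.3f}")
```

In this setup the smaller group has the larger variance, which tends to push the type I error rate well above the nominal 5%. Reversing the arrangement (larger group with larger variance) tends to make the same test conservative instead.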

Dredging and the error rate problem

Each hypothesis test involves a set risk of a type I error (the alpha rate). If a researcher searches or "dredges" through their data, testing many different hypotheses to find a significant effect, they are inflating their type I error rate. The more the researcher repeatedly tests the data, the higher the chance of observing a type I error and making an incorrect inference about the existence of a relationship.

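The inflation caused by repeated testing can be quantified with a simple formula. The sketch below assumes the tests are independent, which real analyses of a single data set often are not, so it should be read as an approximation. The function name `familywise_error_rate` is an illustrative choice.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one type I error across n_tests tests.

    Assumes all null hypotheses are true and the tests are independent.
    """
    return 1 - (1 - alpha) ** n_tests

# The chance of at least one false positive grows quickly with the
# number of hypotheses tested at alpha = 0.05.
for n_tests in (1, 5, 10, 20, 50):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:2d} tests -> {rate:.3f}")
```

Under the independence assumption, twenty tests at the 0.05 level already give a better-than-even chance of at least one spurious "significant" result.
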
Unreliability of measures

If the dependent and/or independent variable(s) are not measured reliably (i.e. with large amounts of measurement error), incorrect conclusions can be drawn.
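One well-known consequence of unreliable measurement is that observed correlations are weaker than the underlying ones. The sketch below uses the attenuation relationship from classical test theory, in which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. It assumes random, uncorrelated measurement error; the function name and example values are illustrative.

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation given the reliabilities of x and y.

    Classical test theory result; assumes measurement errors are random
    and uncorrelated with each other and with the true scores.
    """
    return true_r * math.sqrt(reliability_x * reliability_y)

# A hypothetical true correlation of 0.5 measured with varying reliability.
for reliability in (1.0, 0.9, 0.7, 0.5):
    observed = attenuated_correlation(0.5, reliability, reliability)
    print(f"reliability = {reliability:.1f}  expected observed r = {observed:.3f}")
```

Because the observed effect shrinks as reliability falls, a study with unreliable measures needs a larger sample to detect the same underlying relationship.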

Restriction of range

Restriction of range, such as floor and ceiling effects or selection effects, reduces the power of the experiment and increases the chance of a type II error. This is because correlations are attenuated (weakened) by reduced variability (see, for example, the equation for the Pearson product-moment correlation coefficient, which uses score variance in its estimation).
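The attenuation can be demonstrated with simulated data. The sketch below is an illustration under assumed values: it generates two standard-normal variables with a population correlation of 0.6, then keeps only cases where the first variable exceeds 1, mimicking a selection effect. The helper `pearson_r`, the cutoff, and the sample size are all hypothetical choices.

```python
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
true_r = 0.6            # assumed population correlation

# y is built so that corr(x, y) equals true_r in the full population.
xs = [rng.gauss(0, 1) for _ in range(20000)]
ys = [true_r * x + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1) for x in xs]

# Restrict the range: keep only cases with x above the cutoff.
selected = [(x, y) for x, y in zip(xs, ys) if x > 1.0]
sel_x = [x for x, _ in selected]
sel_y = [y for _, y in selected]

r_full = pearson_r(xs, ys)
r_restricted = pearson_r(sel_x, sel_y)
print(f"full range:       r = {r_full:.3f} (n = {len(xs)})")
print(f"restricted range: r = {r_restricted:.3f} (n = {len(sel_x)})")
```

The relationship between the variables is unchanged in the selected subgroup, yet the correlation computed there is noticeably smaller because the variance of x has been cut down.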

Heterogeneity of the units under study

Greater heterogeneity of individuals participating in the study can also impact interpretations of results by increasing the variance of results or obscuring true relationships (see also sampling error). This obscures possible interactions between the characteristics of the units and the cause–effect relationship.

Threats to internal validity

Any effect that can impact the internal validity of a research study may bias the results and impact the validity of statistical conclusions reached. These threats to internal validity include unreliability of treatment implementation (lack of standardization) and failure to control for extraneous variables.
