Statistics, like all mathematical disciplines, does not infer valid conclusions from nothing. Inferring interesting conclusions about real statistical populations almost always requires some background assumptions. Those assumptions must be made carefully, because incorrect assumptions can generate wildly inaccurate conclusions.
Here are some examples of statistical assumptions:
*Independence of observations from each other (wrongly making this assumption is an especially common error).
*Independence of observational error from potential confounding effects.
*Exact or approximate normality of observations (or errors).
*Linearity of graded responses to quantitative stimuli, e.g., in linear regression.
Classes of assumptions
There are two approaches to statistical inference: ''model-based inference'' and ''design-based inference''. Both approaches rely on some statistical model to represent the data-generating process. In the model-based approach, the model is taken to be initially unknown, and one of the goals is to select an appropriate model for inference. In the design-based approach, the model is taken to be known, and one of the goals is to ensure that the sample data are selected randomly enough for inference.
Statistical assumptions can be put into two classes, depending upon which approach to inference is used.
*Model-based assumptions. These include the following three types:
**Distributional assumptions. Where a statistical model involves terms relating to random errors, assumptions may be made about the probability distribution of these errors. In some cases, the distributional assumption relates to the observations themselves.
**Structural assumptions. Statistical relationships between variables are often modelled by equating one variable to a function of another (or several others), plus a random error. Models often involve making a structural assumption about the form of the functional relationship, e.g. as in linear regression. This can be generalised to models involving relationships between underlying unobserved latent variables.
**Cross-variation assumptions. These assumptions involve the joint probability distributions of either the observations themselves or the random errors in a model. Simple models may include the assumption that observations or errors are statistically independent.
*Design-based assumptions. These relate to the way observations have been gathered, and often involve an assumption of randomization during sampling. [de Gruijter et al., 2006, §2.2.1]
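As a rough illustration of the randomization assumption, the sketch below draws a simple random sample without replacement from a finite population and estimates the population mean. The population values, the sample size, and the function name `srs_mean_estimate` are hypothetical and chosen only for illustration; the standard error uses the usual formula for simple random sampling with a finite population correction.

```python
import math
import random
import statistics


def srs_mean_estimate(population, n, rng):
    """Estimate the population mean from a simple random sample of size n.

    Design-based reasoning: the population values are treated as fixed, and
    the only randomness comes from which units the sampling design selects.
    """
    sample = rng.sample(population, n)  # every subset of size n is equally likely
    N = len(population)
    mean = statistics.fmean(sample)
    s2 = statistics.variance(sample)  # sample variance (n - 1 denominator)
    # Standard error under simple random sampling without replacement,
    # including the finite population correction (1 - n/N).
    se = math.sqrt((1 - n / N) * s2 / n)
    return mean, se


# Hypothetical finite population: 1,000 fixed measurements between 0 and 100.
rng = random.Random(42)
population = [rng.uniform(0, 100) for _ in range(1000)]

estimate, std_error = srs_mean_estimate(population, n=50, rng=rng)
print(f"estimate = {estimate:.2f}, standard error = {std_error:.2f}")
```

Nothing here assumes a distribution for the measurements themselves; the justification for the estimate rests on the random selection step.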
The model-based approach is the most commonly used in statistical inference; the design-based approach is used mainly with survey sampling. With the model-based approach, all the assumptions are effectively encoded in the model.
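To make the last point concrete, here is a minimal sketch of how a simple linear regression model encodes all three kinds of model-based assumption at once. The parameter values and the function name `simulate_linear_model` are hypothetical, and the normal error distribution is one common choice, not the only one.

```python
import random


def simulate_linear_model(xs, intercept, slope, sigma, rng):
    """Generate responses from y = intercept + slope * x + error.

    Structural assumption:      the mean of y is a linear function of x.
    Distributional assumption:  each error is normal with mean 0 and
                                standard deviation sigma.
    Cross-variation assumption: the errors are drawn independently of
                                one another.
    """
    return [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]


rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = simulate_linear_model(xs, intercept=1.0, slope=2.0, sigma=0.5, rng=rng)
print(ys[:3])
```

Changing any one line of the generator (a curved mean, a skewed error distribution, errors that depend on their neighbours) corresponds to relaxing one class of assumption while keeping the others.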
Checking assumptions
Given that the validity of any conclusion drawn from a statistical inference depends on the validity of the assumptions made, it is clearly important that those assumptions should be reviewed at some stage. In some instances, for example where data are lacking, researchers may have to judge whether an assumption is reasonable. They can extend this judgement by considering what effect a departure from the assumptions might have on the conclusions. Where more extensive data are available, various procedures for statistical model validation can be applied, e.g. regression model validation.
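One informal way to check a structural assumption such as linearity is to fit the model and look for leftover pattern in the residuals. The sketch below is an illustration under stated assumptions rather than a standard validation procedure: the helper names, the simulated data, and the choice of diagnostic (the correlation between the residuals and the squared, centred predictor) are all picked for this example.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson(us, vs):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = statistics.fmean(us), statistics.fmean(vs)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return suv / math.sqrt(suu * svv)


def curvature_check(xs, ys):
    """Correlate residuals from a straight-line fit with (x - mean)^2.

    A value near 0 is consistent with linearity; a value far from 0
    suggests the straight-line (structural) assumption is doubtful.
    """
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    mx = statistics.fmean(xs)
    return pearson(residuals, [(x - mx) ** 2 for x in xs])


rng = random.Random(1)
xs = [i / 20 for i in range(200)]  # 0.00, 0.05, ..., 9.95
linear_ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
curved_ys = [1 + 2 * x + 0.5 * x ** 2 + rng.gauss(0, 1) for x in xs]

print(f"linear data: {curvature_check(xs, linear_ys):+.2f}")
print(f"curved data: {curvature_check(xs, curved_ys):+.2f}")
```

A diagnostic like this only detects one kind of departure (smooth curvature); other violations, such as non-normal or dependent errors, need their own checks.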
Example: Independence of Observations
Scenario:
Imagine a study assessing the effectiveness of a new teaching method in multiple classrooms. If the analysis treats every student as an independent observation and ignores which classroom each student belongs to, the assumption of independence is violated. Students within the same classroom may share common characteristics or experiences, leading to correlated observations.
Consequence:
Failure to account for this lack of independence may inflate the perceived impact of the teaching method, because outcomes within a classroom are more similar than the analysis assumes and the data therefore carry less independent information than the number of students suggests. This can result in overstated confidence in the method's effect and in its generalizability to diverse educational settings.
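The consequence can be illustrated with a small simulation. This is a sketch under made-up assumptions: 20 classrooms of 30 students, a shared classroom effect as large as the student-level variation, and hypothetical function names. It compares a standard error that treats all students as independent with one that treats classrooms as the independent units.

```python
import math
import random
import statistics


def simulate_classrooms(n_classrooms, n_students, rng):
    """Return one list of scores per classroom.

    Each score = 70 + shared classroom effect + individual student effect,
    so students in the same classroom are positively correlated.
    """
    classrooms = []
    for _ in range(n_classrooms):
        classroom_effect = rng.gauss(0, 10)  # shared by everyone in the room
        classrooms.append(
            [70 + classroom_effect + rng.gauss(0, 10) for _ in range(n_students)]
        )
    return classrooms


def naive_se(classrooms):
    """Standard error of the mean if every student were independent."""
    scores = [s for room in classrooms for s in room]
    return statistics.stdev(scores) / math.sqrt(len(scores))


def cluster_se(classrooms):
    """Standard error of the mean using classroom means as the units."""
    means = [statistics.fmean(room) for room in classrooms]
    return statistics.stdev(means) / math.sqrt(len(means))


rng = random.Random(7)
data = simulate_classrooms(n_classrooms=20, n_students=30, rng=rng)
print(f"naive SE:   {naive_se(data):.2f}")
print(f"cluster SE: {cluster_se(data):.2f}")
```

Under these settings the naive standard error tends to come out well below the classroom-based one, which is how ignoring the dependence overstates the precision of the estimate.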
See also
*Misuse of statistics
*Robust statistics
*Statistical hypothesis testing
*Statistical theory
References
*Cox D. R. (2006), ''Principles of Statistical Inference'', Cambridge University Press.
*de Gruijter J., Brus D., Bierkens M., Knotters M. (2006), ''Sampling for Natural Resource Monitoring'', Springer-Verlag.
*McPherson, G. (1990), ''Statistics in Scientific Investigation: Its Basis, Application and Interpretation'', Springer-Verlag.