HOME

TheInfoList



OR:

Statistics Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...
, like all mathematical disciplines, does not
infer Inferences are steps in reasoning, moving from premises to logical consequences; etymologically, the word '' infer'' means to "carry forward". Inference is theoretically traditionally divided into deduction and induction, a distinction that i ...
valid conclusions from nothing. Inferring interesting conclusions about real
statistical population In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hypoth ...
s almost always requires some background assumptions. Those assumptions must be made carefully, because incorrect assumptions can generate wildly inaccurate conclusions. Here are some examples of statistical assumptions: *
Independence Independence is a condition of a person, nation, country, or state in which residents and population, or some portion thereof, exercise self-government, and usually sovereignty, over its territory. The opposite of independence is the stat ...
of observations from each other (this assumption is an especially common error). *Independence of observational error from potential
confounding In statistics, a confounder (also confounding variable, confounding factor, extraneous determinant or lurking variable) is a variable that influences both the dependent variable and independent variable, causing a spurious association. Con ...
effects. *Exact or approximate normality of observations (or errors). *Linearity of graded responses to quantitative stimuli, e.g., in
linear regression In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is cal ...
.


Classes of assumptions

There are two approaches to
statistical inference Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properti ...
: ''model-based inference'' and ''design-based inference''. Both approaches rely on some
statistical model A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form ...
to represent the data-generating process. In the model-based approach, the model is taken to be initially unknown, and one of the goals is to select an appropriate model for inference. In the design-based approach, the model is taken to be known, and one of the goals is to ensure that the sample data are selected randomly enough for inference. Statistical assumptions can be put into two classes, depending upon which approach to inference is used. *Model-based assumptions. These include the following three types: **Distributional assumptions. Where a
statistical model A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form ...
involves terms relating to
random errors Observational error (or measurement error) is the difference between a measured value of a quantity and its true value.Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. In statistics, an error is not necessarily a "mistake ...
, assumptions may be made about the
probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon ...
of these errors. In some cases, the distributional assumption relates to the observations themselves. **Structural assumptions. Statistical relationships between variables are often modelled by equating one variable to a function of another (or several others), plus a
random error Observational error (or measurement error) is the difference between a measured value of a quantity and its true value.Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. In statistics, an error is not necessarily a "mistake" ...
. Models often involve making a structural assumption about the form of the functional relationship, e.g. as in
linear regression In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is cal ...
. This can be generalised to models involving relationships between underlying unobserved
latent variable In statistics, latent variables (from Latin: present participle of ''lateo'', “lie hidden”) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or me ...
s. **Cross-variation assumptions. These assumptions involve the
joint probability distribution Given two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considered ...
s of either the observations themselves or the random errors in a model. Simple models may include the assumption that observations or errors are
statistically independent Independence is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two events are independent, statistically independent, or stochastically independent if, informally speaking, the occurrence of o ...
. *Design-based assumptions. These relate to the way observations have been gathered, and often involve an assumption of
randomization Randomization is the process of making something random. Randomization is not haphazard; instead, a random process is a sequence of random variables describing a process whose outcomes do not follow a deterministic pattern, but follow an evolution d ...
during sampling.de Gruijter et al., 2006, §2.2.1 The model-based approach is the most commonly used in statistical inference; the design-based approach is used mainly with
survey sampling In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term " survey" may refer to many different types or techniques of observation. In survey sampling it most ofte ...
. With the model-based approach, all the assumptions are effectively encoded in the model.


Checking assumptions

Given that the validity of any conclusion drawn from a statistical inference depends on the validity of the assumptions made, it is clearly important that those assumptions should be reviewed at some stage. Some instances—for example where data are lacking—may require that researchers judge whether an assumption is reasonable. Researchers can expand this somewhat to consider what effect a departure from the assumptions might produce. Where more extensive data are available, various types of procedures for
statistical model validation In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate or not. Oftentimes in statistical inference, inferences from models that appear to fit their data may be flukes, resulting in a misunderstan ...
are available—e.g. for
regression model validation In statistics, regression validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are acceptable as descriptions of the data. The validation ...
.


See also

* Misuse of statistics *
Robust statistics Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, su ...
*
Statistical hypothesis testing A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. ...
*
Statistical theory The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics. The theory covers approaches to statistical-decision problems and to statistica ...


Notes


References

* Cox D. R. (2006), ''Principles of Statistical Inference'',
Cambridge University Press Cambridge University Press is the university press of the University of Cambridge. Granted letters patent by King Henry VIII in 1534, it is the oldest university press in the world. It is also the King's Printer. Cambridge University Pr ...
. * de Gruijter J., Brus D., Bierkens M., Knotters M. (2006), ''Sampling for Natural Resource Monitoring'',
Springer-Verlag Springer Science+Business Media, commonly known as Springer, is a German multinational publishing company of books, e-books and peer-reviewed journals in science, humanities, technical and medical (STM) publishing. Originally founded in 1842 ...
. * *McPherson, G. (1990), ''Statistics in Scientific Investigation: Its Basis, Application and Interpretation'',
Springer-Verlag Springer Science+Business Media, commonly known as Springer, is a German multinational publishing company of books, e-books and peer-reviewed journals in science, humanities, technical and medical (STM) publishing. Originally founded in 1842 ...
. {{DEFAULTSORT:Statistical Assumption Statistical theory