Uncomfortable science, as identified by statistician John Tukey, comprises situations in which there is a need to draw an inference from a limited sample of data, where further samples influenced by the same cause system will not be available. More specifically, it involves the analysis of a finite natural phenomenon for which it is difficult to avoid using the same sample of data for both exploratory data analysis and confirmatory data analysis. This leads to the danger of systematic bias through testing hypotheses suggested by the data.
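
The bias from testing a hypothesis on the data that suggested it can be illustrated with a small simulation. The following is a minimal sketch, not drawn from Tukey's work: the function names, the sample size of 20, and the 50 candidate predictors are all illustrative assumptions. Every variable is pure noise, so any apparent relationship is spurious. The exploratory step picks the candidate that correlates most strongly with the outcome; the confirmatory step then measures the correlation either on the same data or on a fresh sample.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def one_trial(rng, n_obs=20, n_candidates=50):
    """Return (|r| on the data that suggested the hypothesis, |r| on fresh data)."""

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n_obs)]

    outcome = noise()
    candidates = [noise() for _ in range(n_candidates)]

    # Exploratory step: keep the candidate that best matches the outcome.
    r_same = max(abs(pearson(c, outcome)) for c in candidates)

    # Confirmatory step on a fresh sample. Because every variable here is
    # independent noise (an assumption of this sketch), a new draw of the
    # chosen predictor and the outcome is just two new noise vectors.
    r_fresh = abs(pearson(noise(), noise()))
    return r_same, r_fresh


def mean_correlations(n_trials=200, seed=0):
    """Average both correlations over many simulated studies."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(n_trials)]
    mean_same = sum(same for same, _ in results) / n_trials
    mean_fresh = sum(fresh for _, fresh in results) / n_trials
    return mean_same, mean_fresh


if __name__ == "__main__":
    same, fresh = mean_correlations()
    print(f"mean |r| on the data that suggested the hypothesis: {same:.2f}")
    print(f"mean |r| on a fresh sample: {fresh:.2f}")
```

On average the correlation measured on the reused sample is markedly larger than the one measured on fresh data, even though no real relationship exists. In uncomfortable science the fresh sample is exactly what cannot be obtained.
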
A typical example is Bode's law, which provides a simple numerical rule for the distances of the planets in the Solar System from the Sun. Once the rule has been derived through the trial-and-error matching of various rules with the observed data (exploratory data analysis), there are not enough planets remaining for a rigorous and independent test of the hypothesis (confirmatory data analysis). We have exhausted the natural phenomena. The agreement between data and the numerical rule should be no surprise, as we have deliberately chosen the rule to match the data. If we are concerned about what Bode's law tells us about the cause system of planetary distribution, then we demand confirmation that will not be available until better information about other planetary systems becomes available.
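
For concreteness, the rule can be compared with the observed distances. The sketch below assumes the commonly quoted form of Bode's law, a = 0.4 + 0.3 × 2^k astronomical units, with the doubling term dropped for the innermost planet; this formula is not stated in the text above. The observed semi-major axes are rounded approximations, the slot between Mars and Jupiter is filled with Ceres, and the names `bode_distance` and `OBSERVED_AU` are hypothetical.

```python
def bode_distance(slot):
    """Predicted distance in AU for the given slot, counting outward from 0.

    Slot 0 (Mercury) uses the constant term alone; later slots add
    0.3 * 2**(slot - 1), following the commonly quoted form of the rule.
    """
    if slot == 0:
        return 0.4
    return 0.4 + 0.3 * 2 ** (slot - 1)


# Approximate observed semi-major axes in AU (rounded, for illustration only).
OBSERVED_AU = [
    ("Mercury", 0.39),
    ("Venus", 0.72),
    ("Earth", 1.00),
    ("Mars", 1.52),
    ("Ceres", 2.77),
    ("Jupiter", 5.20),
    ("Saturn", 9.5),
    ("Uranus", 19.2),
    ("Neptune", 30.1),
]


def relative_errors():
    """Return (name, predicted, observed, relative error) for each body."""
    rows = []
    for slot, (name, observed) in enumerate(OBSERVED_AU):
        predicted = bode_distance(slot)
        rows.append((name, predicted, observed, abs(predicted - observed) / observed))
    return rows


if __name__ == "__main__":
    for name, predicted, observed, error in relative_errors():
        print(f"{name:8s} predicted {predicted:5.1f} AU, observed {observed:5.2f} AU, error {error:.0%}")
```

A close fit for most of these bodies says little by itself, because the constants 0.4 and 0.3 and the doubling pattern were chosen to match this very sample. There is no second set of planets governed by the same cause system against which the fitted rule could be checked.
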
See also
* Cosmic variance, for an extreme example of this phenomenon
* Data mining