List of unsolved problems in statistics

There are many longstanding unsolved problems in mathematics for which a solution has not yet been found. The notable unsolved problems in statistics are generally of a different flavor; according to John Tukey, "difficulties in identifying problems have delayed statistics far more than difficulties in solving problems." A list of "one or two open problems" (in fact 22 of them) was given by David Cox.


Inference and testing

* How to detect and correct for systematic errors, especially in sciences where random errors are large (a situation Tukey termed uncomfortable science).
* The Graybill–Deal estimator is often used to estimate the common mean of two normal populations with unknown and possibly unequal variances. Though this estimator is generally unbiased, its admissibility remains to be shown.
* Meta-analysis: Though independent p-values can be combined using Fisher's method, techniques are still being developed to handle the case of dependent p-values.
* Behrens–Fisher problem: Yuri Linnik showed in 1966 that there is no uniformly most powerful test for the difference of two means when the variances are unknown and possibly unequal. That is, there is no exact test (meaning that, if the means are in fact equal, one that rejects the null hypothesis with probability exactly α) that is also the most powerful for all values of the variances (which are thus nuisance parameters). Though there are many approximate solutions (such as Welch's t-test), the problem continues to attract attention as one of the classic problems in statistics.
* Multiple comparisons: There are various ways to adjust p-values to compensate for the simultaneous or sequential testing of hypotheses. Of particular interest is how to simultaneously control the overall error rate, preserve statistical power, and incorporate the dependence between tests into the adjustment. These issues are especially relevant when the number of simultaneous tests can be very large, as is increasingly the case in the analysis of data from DNA microarrays.
* Bayesian statistics: A list of open problems in Bayesian statistics has been proposed.
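
For readers who want to see the standard procedures named in the list above in concrete form, the following is a minimal sketch using only the Python standard library. It shows Fisher's method for independent p-values, the Graybill–Deal weighted mean, and Welch's t statistic with the Welch–Satterthwaite degrees of freedom. The function names are illustrative assumptions, not taken from any source, and the sketch covers only the textbook cases: it does not address the open questions themselves (dependent p-values, admissibility, or an exact solution to the Behrens–Fisher problem).

```python
import math
from statistics import mean, variance


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom. For even degrees of freedom
    the survival function has a closed form, so no external library is used.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # P(chi2 with 2k df > X) = exp(-X/2) * sum over i < k of (X/2)^i / i!
    tail_sum = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail_sum


def graybill_deal(sample_a, sample_b):
    """Graybill-Deal estimate of a common mean; weights are n_i / s_i^2."""
    w_a = len(sample_a) / variance(sample_a)
    w_b = len(sample_b) / variance(sample_b)
    return (w_a * mean(sample_a) + w_b * mean(sample_b)) / (w_a + w_b)


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite df."""
    va = variance(sample_a) / len(sample_a)  # estimated variance of mean A
    vb = variance(sample_b) / len(sample_b)  # estimated variance of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df
```

Note that `fisher_combined_pvalue` is only valid when the p-values are independent; applying it to dependent p-values is exactly the situation the meta-analysis item describes as unresolved.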


Experimental design

* As the theory of Latin squares is a cornerstone in the design of experiments, solving the problems in Latin squares could have immediate applicability to experimental design.
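
As a small illustration of the object involved: a Latin square of order n is an n × n array of n symbols in which each symbol occurs exactly once in every row and every column. The sketch below builds the simple cyclic square and checks the defining property. The function names are hypothetical, and the cyclic construction is only one easy example, unrelated to the open problems themselves.

```python
def cyclic_latin_square(n):
    """Build the n x n cyclic Latin square with entry (row + col) mod n."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every row and column contains each of 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    # Each row must have length n and use every symbol once.
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    # Each column must also use every symbol once.
    return all({square[r][c] for r in range(n)} == symbols for c in range(n))
```

In an experimental design, the rows and columns would typically index two blocking factors and the symbols the treatments, so that each treatment appears once per level of each blocking factor.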


Problems of a more philosophical nature

* Sampling of species problem: How is a probability updated when there is unanticipated new data?
* Doomsday argument: How valid is the probabilistic argument that claims to predict the future lifetime of the human race given only an estimate of the total number of humans born so far?
* Exchange paradox: Issues arise within the subjectivistic interpretation of probability theory; more specifically within Bayesian decision theory. This is still an open problem among the subjectivists as no consensus has been reached yet. Examples include:
** The two envelopes problem
** The necktie paradox
* Sunrise problem: What is the probability that the sun will rise tomorrow? Very different answers arise depending on the methods used and assumptions made.
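
To make the sunrise problem's sensitivity to assumptions concrete, the sketch below computes the probability of one more success after observing a run of successes, under two different Beta priors on the unknown success probability. The Beta-Bernoulli model, the choice of priors, and the function name are all assumptions made for illustration; they are one common way to frame the question, not a resolution of it.

```python
from fractions import Fraction


def predictive_probability(successes, trials, prior_a=Fraction(1), prior_b=Fraction(1)):
    """Posterior predictive P(next trial succeeds) under a Beta(a, b) prior.

    With a Beta(prior_a, prior_b) prior and `successes` out of `trials`
    Bernoulli observations, the predictive probability is
    (successes + prior_a) / (trials + prior_a + prior_b).
    """
    return (successes + prior_a) / (trials + prior_a + prior_b)


# Ten sunrises observed in ten days (a toy number chosen for illustration).
uniform = predictive_probability(10, 10)  # Beta(1, 1): Laplace's rule of succession
jeffreys = predictive_probability(10, 10, Fraction(1, 2), Fraction(1, 2))  # Beta(1/2, 1/2)
```

With a uniform prior this reduces to Laplace's rule of succession, (s + 1) / (n + 2); the Jeffreys prior gives (s + 1/2) / (n + 1). The two answers differ for the same data, which is the point the sunrise problem raises.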

