Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a ''degree of belief'' in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials.
Bayesian statistical methods use Bayes' theorem to compute and update probabilities after obtaining new data. Bayes' theorem describes the conditional probability of an event based on data as well as prior information or beliefs about the event or conditions related to the event. For example, in Bayesian inference, Bayes' theorem can be used to estimate the parameters of a probability distribution or statistical model. Since Bayesian statistics treats probability as a degree of belief, Bayes' theorem can directly assign a probability distribution that quantifies the belief to the parameter or set of parameters.
Bayesian statistics is named after Thomas Bayes, who formulated a specific case of Bayes' theorem in a paper published in 1763. In several papers spanning from the late 18th to the early 19th centuries, Pierre-Simon Laplace developed the Bayesian interpretation of probability. Laplace used methods that would now be considered Bayesian to solve a number of statistical problems. Many Bayesian methods were developed by later authors, but the term was not commonly used to describe such methods until the 1950s. During much of the 20th century, Bayesian methods were viewed unfavorably by many statisticians due to philosophical and practical considerations. Many Bayesian methods required much computation to complete, and most methods that were widely used during the century were based on the frequentist interpretation. However, with the advent of powerful computers and new algorithms like Markov chain Monte Carlo, Bayesian methods have seen increasing use within statistics in the 21st century.
Bayes' theorem
Bayes' theorem is used in Bayesian methods to update probabilities, which are degrees of belief, after obtaining new data. Given two events A and B, the conditional probability of A given that B is true is expressed as follows:

    P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

where P(B) \neq 0. Although Bayes' theorem is a fundamental result of probability theory, it has a specific interpretation in Bayesian statistics. In the above equation, A usually represents a proposition (such as the statement that a coin lands on heads fifty percent of the time) and B represents the evidence, or new data that is to be taken into account (such as the result of a series of coin flips). P(A) is the prior probability of A, which expresses one's beliefs about A before evidence is taken into account. The prior probability may also quantify prior knowledge or information about A.
P(B \mid A) is the likelihood function, which can be interpreted as the probability of the evidence B given that A is true. The likelihood quantifies the extent to which the evidence B supports the proposition A. P(A \mid B) is the posterior probability, the probability of the proposition A after taking the evidence B into account. Essentially, Bayes' theorem updates one's prior beliefs P(A) after considering the new evidence B.
The probability of the evidence P(B) can be calculated using the law of total probability. If \{A_1, A_2, \dots, A_n\} is a partition of the sample space, which is the set of all outcomes of an experiment, then,

    P(B) = P(B \mid A_1) P(A_1) + P(B \mid A_2) P(A_2) + \dots + P(B \mid A_n) P(A_n) = \sum_{i} P(B \mid A_i) P(A_i)

When there are an infinite number of outcomes, it is necessary to integrate over all outcomes to calculate P(B) using the law of total probability. Often, P(B) is difficult to calculate, as the calculation would involve sums or integrals that would be time-consuming to evaluate, so often only the product of the prior and likelihood is considered; since the evidence does not change within the same analysis, P(B) is a constant. The posterior is proportional to this product:

    P(A \mid B) \propto P(B \mid A)\, P(A)
The maximum a posteriori estimate, which is the mode of the posterior and is often computed in Bayesian statistics using mathematical optimization methods, remains the same under this proportionality, since dividing by the constant P(B) does not change where the posterior attains its maximum. The posterior can also be approximated without computing the exact value of P(B), using methods such as Markov chain Monte Carlo or variational Bayesian methods.
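The update described above can be illustrated with a small numerical sketch. The three candidate coin biases, the flat prior, and the observed flip counts below are invented for illustration; the mechanics (likelihood times prior, normalized by the total-probability evidence) follow the equations above.

```python
def bayes_update(priors, likelihoods):
    """Apply Bayes' theorem to a discrete set of hypotheses.

    The evidence P(B) is the sum over the partition of hypotheses
    (law of total probability); each posterior is P(B|A_i) P(A_i) / P(B).
    """
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Three hypotheses about a coin's probability of landing on heads.
thetas = [0.25, 0.50, 0.75]
priors = [1 / 3, 1 / 3, 1 / 3]  # flat prior: no initial preference

# Evidence: 8 heads in 10 flips. Each hypothesis's likelihood is
# proportional to theta^heads * (1 - theta)^tails.
heads, tails = 8, 2
likelihoods = [t ** heads * (1 - t) ** tails for t in thetas]

posteriors = bayes_update(priors, likelihoods)
# The posterior concentrates on theta = 0.75, the hypothesis the data favor.
```

Note that multiplying all likelihoods by a constant leaves the posteriors unchanged, which is the proportionality argument made above.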
Outline of Bayesian methods
The general set of statistical techniques can be divided into a number of activities, many of which have special Bayesian versions.
Bayesian inference
Bayesian inference refers to statistical inference where uncertainty in inferences is quantified using probability. In classical frequentist inference, model parameters and hypotheses are considered to be fixed. Probabilities are not assigned to parameters or hypotheses in frequentist inference. For example, it would not make sense in frequentist inference to directly assign a probability to an event that can only happen once, such as the result of the next flip of a fair coin. However, it would make sense to state that the proportion of heads approaches one-half as the number of coin flips increases.
Statistical models specify a set of statistical assumptions and processes that represent how the sample data are generated. Statistical models have a number of parameters that can be modified. For example, a coin can be represented as samples from a Bernoulli distribution, which models two possible outcomes. The Bernoulli distribution has a single parameter equal to the probability of one outcome, which in most cases is the probability of landing on heads. Devising a good model for the data is central in Bayesian inference. In most cases, models only approximate the true process and may not take into account certain factors influencing the data.
In Bayesian inference, probabilities can be assigned to model parameters. Parameters can be represented as random variables. Bayesian inference uses Bayes' theorem to update probabilities after more evidence is obtained or known.
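Treating the Bernoulli parameter as a random variable can be sketched as follows. This example assumes a Beta prior, which is conjugate to the Bernoulli likelihood, so the posterior is again a Beta distribution (a standard result, though the flip data below are invented).

```python
def update_beta(alpha, beta, flips):
    """Update a Beta(alpha, beta) prior on the heads probability.

    With a Bernoulli likelihood, observing heads adds to alpha and
    tails adds to beta, yielding the Beta posterior's parameters.
    """
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails

# Uniform prior Beta(1, 1): all heads probabilities equally believable.
alpha, beta = 1, 1
flips = [1, 0, 1, 1, 0, 1, 1, 1]  # invented data: 6 heads, 2 tails

alpha, beta = update_beta(alpha, beta, flips)
posterior_mean = alpha / (alpha + beta)  # point summary of the belief
```

The posterior here is a full distribution over the parameter, not a single fixed value, which is the feature that distinguishes the Bayesian treatment from the frequentist one described above.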
Statistical modeling
The formulation of statistical models using Bayesian statistics has the identifying feature of requiring the specification of prior distributions for any unknown parameters. Indeed, parameters of prior distributions may themselves have prior distributions, leading to Bayesian hierarchical modeling,[Hajiramezanali, E. & Dadaneh, S. Z. & Karbalayghareh, A. & Zhou, Z. & Qian, X. Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data. 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.] also known as multi-level modeling. A special case is Bayesian networks.
For conducting a Bayesian statistical analysis, best practices are discussed by van de Schoot et al.
For reporting the results of a Bayesian statistical analysis, Bayesian analysis reporting guidelines (BARG) are provided in an open-access article by
John K. Kruschke.
Design of experiments
The Bayesian design of experiments includes a concept called the 'influence of prior beliefs'. This approach uses sequential analysis techniques to include the outcome of earlier experiments in the design of the next experiment. This is achieved by updating 'beliefs' through the use of the prior and posterior distributions. This allows the design of experiments to make good use of resources of all types. An example of this is the multi-armed bandit problem.
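One sequential-design strategy for the multi-armed bandit problem is Thompson sampling, sketched below under simple assumptions: each arm pays out as a Bernoulli variable, and the agent keeps a Beta posterior per arm. The true payout rates are invented for the simulation.

```python
import random

random.seed(0)
true_rates = [0.3, 0.5, 0.7]  # unknown to the agent; used only to simulate
successes = [0] * 3
failures = [0] * 3

for _ in range(2000):
    # Draw a plausible payout rate for each arm from its Beta posterior
    # (Beta(1, 1) prior, updated with that arm's observed outcomes)...
    samples = [random.betavariate(successes[i] + 1, failures[i] + 1)
               for i in range(3)]
    # ...and play the arm whose sampled rate is highest.
    arm = samples.index(max(samples))
    if random.random() < true_rates[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

pulls = [successes[i] + failures[i] for i in range(3)]
# Updating beliefs after every play steers most pulls toward the best arm,
# which is the resource-efficient behavior described above.
```

The design choice here is that exploration is driven by posterior uncertainty itself: arms with few observations produce widely varying samples and so still get tried occasionally, while well-understood poor arms are played rarely.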
Exploratory analysis of Bayesian models
Exploratory analysis of Bayesian models is an adaptation or extension of the exploratory data analysis approach, in the sense popularized by Persi Diaconis, to the needs and peculiarities of Bayesian modeling. The inference process generates a posterior distribution, which has a central role in Bayesian statistics, together with other distributions like the posterior predictive distribution and the prior predictive distribution. The correct visualization, analysis, and interpretation of these distributions is key to properly answering the questions that motivate the inference process.
When working with Bayesian models there are a series of related tasks that need to be addressed besides inference itself:
* Diagnoses of the quality of the inference; this is needed when using numerical methods such as Markov chain Monte Carlo techniques
* Model criticism, including evaluations of both model assumptions and model predictions
* Comparison of models, including model selection or model averaging
* Preparation of the results for a particular audience
All these tasks are part of the exploratory analysis of Bayesian models approach, and successfully performing them is central to the iterative and interactive modeling process. These tasks require both numerical and visual summaries.
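The first task, diagnosing inference quality, can be illustrated with a simplified numerical summary. The sketch below follows the common Gelman-Rubin "R-hat" construction for comparing chains, but the two short "chains" are synthetic stand-ins rather than real MCMC output, and practical diagnostics use longer, split chains.

```python
import statistics

def r_hat(chains):
    """Potential scale reduction factor for equal-length chains.

    Compares between-chain variance (B) with within-chain variance (W);
    values near 1 suggest the chains are sampling the same distribution.
    """
    n = len(chains[0])
    m = len(chains)
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)  # between
    w = statistics.fmean(statistics.variance(c) for c in chains)  # within
    var_hat = (n - 1) / n * w + b / n  # pooled variance estimate
    return (var_hat / w) ** 0.5

# Two chains exploring the same region give R-hat close to 1;
# shifting one chain apart would inflate the statistic.
mixed = [[0.1, 0.2, -0.1, 0.0, 0.15, -0.05],
         [0.05, -0.2, 0.1, 0.0, -0.1, 0.2]]
value = r_hat(mixed)
```

A large R-hat is one signal that the numerical approximation to the posterior should not yet be trusted, which is why such diagnostics precede model criticism and comparison in the workflow above.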
See also
* Bayesian epistemology
* For a list of mathematical logic notation used in this article:
** Notation in probability and statistics
** List of logic symbols
References
Further reading
* Johnson, Alicia A.; Ott, Miles Q.; Dogucu, Mine (2022). ''Bayes Rules! An Introduction to Applied Bayesian Modeling''. Chapman and Hall. ISBN 9780367255398.
External links
* Bayesian statistics, David Spiegelhalter and Kenneth Rice, ''Scholarpedia'' 4(8):5230. doi:10.4249/scholarpedia.5230
* Bayesian modeling book and examples available for downloading.
* Bayesian A/B Testing Calculator, Dynamic Yield
{{Authority control}}