Bayesian hierarchical modelling is a statistical model written in multiple levels (hierarchical form) that estimates the parameters of the posterior distribution using the Bayesian method.
[Allenby, Rossi, McCulloch (January 2005)] "Hierarchical Bayes Model: A Practitioner’s Guide", Journal of Bayesian Applications in Marketing, pp. 1–4. Retrieved 26 April 2014, p. 3.
The sub-models combine to form the hierarchical model, and Bayes' theorem is used to integrate them with the observed data and account for all the uncertainty that is present. The result of this integration is the posterior distribution, also known as the updated probability estimate, which is revised as additional evidence on the prior distribution is acquired.
Frequentist statistics may yield conclusions seemingly incompatible with those offered by Bayesian statistics, due to the Bayesian treatment of the parameters as random variables and its use of subjective information in establishing assumptions on these parameters. As the approaches answer different questions, the formal results are not technically contradictory, but the two approaches disagree over which answer is relevant to particular applications. Bayesians argue that relevant information regarding decision-making and updating beliefs cannot be ignored and that hierarchical modeling has the potential to overrule classical methods in applications where respondents give multiple observational data. Moreover, the model has proven to be robust, with the posterior distribution less sensitive to the more flexible hierarchical priors.
Hierarchical modeling is used when information is available on several different levels of observational units. For example, in epidemiological modeling to describe infection trajectories for multiple countries, the observational units are countries, and each country has its own temporal profile of daily infected cases. In decline curve analysis to describe oil or gas production decline curves for multiple wells, the observational units are oil or gas wells in a reservoir region, and each well has its own temporal profile of oil or gas production rates (usually, barrels per month). The data structure used in hierarchical modeling retains this nested structure. The hierarchical form of analysis and organization helps in the understanding of multiparameter problems and also plays an important role in developing computational strategies.
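The nested structure described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential decline form, the parameter ranges, and the names `simulate_wells`, `decline`, and `initial` are hypothetical choices made for illustration, not part of the source. Each well draws its own parameters from a shared population (the upper level), and its monthly production is then generated from those parameters (the lower level).

```python
import random


def simulate_wells(n_wells=3, n_months=6, seed=0):
    """Simulate nested data: one production series per well.

    Assumed (hypothetical) model: exponential decline with
    well-specific parameters drawn from a shared population.
    """
    rng = random.Random(seed)
    data = {}
    for w in range(n_wells):
        # Upper level: well-specific parameters from a common population.
        decline = rng.uniform(0.05, 0.15)    # monthly decline fraction
        initial = rng.uniform(800.0, 1200.0)  # initial rate, barrels per month
        # Lower level: monthly observations for this well, with noise.
        data[f"well_{w}"] = [
            initial * (1.0 - decline) ** t * rng.uniform(0.95, 1.05)
            for t in range(n_months)
        ]
    return data


wells = simulate_wells()
for name, series in wells.items():
    print(name, [round(v) for v in series])
```

The dictionary keyed by well mirrors the nesting: observational units at the top, and a temporal profile for each unit underneath.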
Philosophy
Statistical methods and models commonly involve multiple parameters that can be regarded as related or connected in such a way that the problem implies a dependence of the joint probability model for these parameters.
Individual degrees of belief, expressed in the form of probabilities, come with uncertainty, and they change over time. As Professor José M. Bernardo and Professor Adrian F. Smith stated, “The actuality of the learning process consists in the evolution of individual and subjective beliefs about the reality.” These subjective probabilities are more directly involved in the mind than physical probabilities are. Hence, it is with this need of updating beliefs that Bayesians have formulated an alternative statistical model which takes into account the prior occurrence of a particular event.
Bayes' theorem
The assumed occurrence of a real-world event will typically modify preferences between certain options. This is done by modifying the degrees of belief attached, by an individual, to the events defining the options.
Suppose that, in a study of the effectiveness of cardiac treatments, the patients in hospital ''j'' have survival probability $\theta_j$. This survival probability will be updated with the occurrence of ''y'', the event in which a controversial serum is created that, as some believe, increases survival in cardiac patients.
In order to make updated probability statements about $\theta_j$, given the occurrence of event ''y'', we must begin with a model providing a joint probability distribution for $\theta_j$ and ''y''. This can be written as a product of the two distributions that are often referred to as the prior distribution $P(\theta_j)$ and the sampling distribution $P(y \mid \theta_j)$ respectively:
: $P(\theta_j, y) = P(\theta_j)\,P(y \mid \theta_j)$
Using the basic property of conditional probability, the posterior distribution will yield:
: $P(\theta_j \mid y) = \frac{P(\theta_j, y)}{P(y)} = \frac{P(y \mid \theta_j)\,P(\theta_j)}{P(y)}$
This equation, showing the relationship between the conditional probability and the individual events, is known as Bayes' theorem. This simple expression encapsulates the technical core of Bayesian inference, which aims to incorporate the updated belief, $P(\theta_j \mid y)$, in appropriate and solvable ways.
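As a minimal numerical sketch of this update, Bayes' theorem can be applied on a small grid of candidate survival probabilities. Everything specific here is an assumption made for illustration: the three candidate values, the uniform prior, the hypothetical outcome of 8 survivors among 10 treated patients, and the function name `posterior_on_grid`.

```python
def posterior_on_grid(prior, likelihood):
    """Bayes' theorem on a discrete grid: P(theta | y) = P(y | theta) P(theta) / P(y)."""
    joint = [p * l for p, l in zip(prior, likelihood)]  # P(theta, y)
    evidence = sum(joint)                               # P(y)
    return [j / evidence for j in joint]


# Hypothetical candidate survival probabilities with a uniform prior.
thetas = [0.5, 0.7, 0.9]
prior = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical data: 8 of 10 treated patients survive.
survived, treated = 8, 10
# Binomial likelihood up to a constant factor, which cancels in the ratio.
likelihood = [t ** survived * (1 - t) ** (treated - survived) for t in thetas]

posterior = posterior_on_grid(prior, likelihood)
for t, p in zip(thetas, posterior):
    print(f"theta = {t}: posterior = {p:.3f}")
```

The numerator is the joint distribution (prior times sampling distribution), and the denominator is the marginal probability of the data, so the returned values sum to one.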
Exchangeability
The usual starting point of a statistical analysis is the assumption that the ''n'' values $y_1, \ldots, y_n$ are exchangeable. If no information, other than the data ''y'', is available to distinguish any of the $\theta_j$'s from any others, and no ordering or grouping of the parameters can be made, one must assume symmetry among the parameters in their prior distribution. This symmetry is represented probabilistically by exchangeability. Generally, it is useful and appropriate to model data from an exchangeable distribution as independently and identically distributed, given some unknown parameter vector $\theta$, with distribution $P(\theta)$.
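The link between exchangeability and conditionally independent data can be checked numerically. The sketch below assumes a hypothetical two-point prior on a single success probability and binary observations; the names `joint_prob` and `is_exchangeable` are illustrative. It averages the conditionally independent likelihood over the prior, confirms that the resulting joint probability does not depend on the order of the observations, and shows that the observations are nevertheless not independent once the parameter is averaged out.

```python
from itertools import permutations, product

# Hypothetical discrete prior over a single success probability.
thetas = [0.2, 0.8]
prior = [0.5, 0.5]


def joint_prob(ys):
    """P(y_1, ..., y_n): iid Bernoulli given theta, averaged over the prior."""
    total = 0.0
    for theta, p in zip(thetas, prior):
        lik = 1.0
        for y in ys:
            lik *= theta if y == 1 else 1.0 - theta
        total += p * lik
    return total


def is_exchangeable(n):
    """Check that the joint probability is invariant under every permutation."""
    for ys in product([0, 1], repeat=n):
        base = joint_prob(ys)
        if any(abs(joint_prob(perm) - base) > 1e-12 for perm in permutations(ys)):
            return False
    return True


p_first = joint_prob((1,))   # P(y_1 = 1)
p_both = joint_prob((1, 1))  # P(y_1 = 1, y_2 = 1)

print("exchangeable for n = 3:", is_exchangeable(3))
print("P(y1=1, y2=1) =", p_both, "vs P(y1=1)^2 =", p_first ** 2)
```

Because the two printed probabilities differ, the observations are marginally dependent even though their joint distribution is symmetric in the indices.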
Finite exchangeability
For a fixed number ''n'', the set $y_1, y_2, \ldots, y_n$ is exchangeable if the joint probability $P(y_1, y_2, \ldots, y_n)$ is invariant under permutations of the indices. That is, for every permutation $\pi$ or $(\pi_1, \pi_2, \ldots, \pi_n)$ of (1, 2, …, ''n''),
: $P(y_1, y_2, \ldots, y_n) = P(y_{\pi_1}, y_{\pi_2}, \ldots, y_{\pi_n})$
The following example is exchangeable but not independent and identically distributed (iid):
Consider an urn with a red ball and a blue ball inside, with probability $\tfrac{1}{2}$ of drawing either. Balls are drawn without replacement, i.e. after one ball is drawn from the ''n'' balls, ''n'' − 1 balls remain for the next draw.
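: $y_i = \begin{cases} 1, & \text{if the } i\text{th ball is red}, \\ 0, & \text{otherwise}. \end{cases}$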
Since the probability of selecting a red ball in the first draw and a blue ball in the second draw is equal to the probability of selecting a blue ball on the first draw and a red on the second draw, both of which are equal to 1/2 (i.e.