Bayesian inference Bayesian inference ( or ) is a method of statistical inference in which Bayes' theorem is used to calculate a probability of a hypothesis, given prior evidence, and update it as more information becomes available. Fundamentally, Bayesian infer ...

, plate notation is a method of representing variables that repeat in a

graphical model A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. Graphical models are commonly used in ...

. Instead of drawing each repeated variable individually, a plate or rectangle is used to group variables into a subgraph that repeat together, and a number is drawn on the plate to represent the number of repetitions of the subgraph in the plate. The assumptions are that the subgraph is duplicated that many times, the variables in the subgraph are indexed by the repetition number, and any links that cross a plate boundary are replicated once for each subgraph repetition.

Example

In this example, we consider

Latent Dirichlet allocation In natural language processing, latent Dirichlet allocation (LDA) is a Bayesian network (and, therefore, a generative statistical model) for modeling automatically extracted topics in textual corpora. The LDA is an example of a Bayesian topic ...

, a

Bayesian network A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Whi ...

that models how documents in a corpus are topically related. There are two variables not in any plate; ''α'' is the parameter of the uniform

Dirichlet Johann Peter Gustav Lejeune Dirichlet (; ; 13 February 1805 – 5 May 1859) was a German mathematician. In number theory, he proved special cases of Fermat's last theorem and created analytic number theory. In Mathematical analysis, analysis, h ...

prior on the per-document topic distributions, and ''β'' is the parameter of the uniform Dirichlet prior on the per-topic word distribution. The outermost plate represents all the variables related to a specific document, including

\theta_i

, the topic distribution for document ''i''. The ''M'' in the corner of the plate indicates that the variables inside are repeated ''M'' times, once for each document. The inner plate represents the variables associated with each of the

N_i

words in document ''i'':

z_

is the topic distribution for the ''j''th word in document ''i'', and

w_

is the actual word used. The ''N'' in the corner represents the repetition of the variables in the inner plate

N_j

times, once for each word in document ''i''. The circle representing the individual words is shaded, indicating that each

w_

observable In physics, an observable is a physical property or physical quantity that can be measured. In classical mechanics, an observable is a real-valued "function" on the set of all possible system states, e.g., position and momentum. In quantum ...

, and the other circles are empty, indicating that the other variables are

latent variable In statistics, latent variables (from Latin: present participle of ) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured. Such '' latent va ...

s. The directed edges between variables indicate dependencies between the variables: for example, each

w_

depends on

z_

and ''β''.

Extensions

A number of extensions have been created by various authors to express more information than simply the conditional relationships. However, few of these have become standard. Perhaps the most commonly used extension is to use rectangles in place of circles to indicate non-random variables—either parameters to be computed, hyperparameters given a fixed value (or computed through

empirical Bayes Empirical Bayes methods are procedures for statistical inference in which the prior probability distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed be ...

), or variables whose values are computed deterministically from a random variable. The diagram on the right shows a few more non-standard conventions used in some articles in Wikipedia (e.g.

variational Bayes Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables (usually ...

): *Variables that are actually

random vector In probability, and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge ...

s are indicated by putting the vector size in brackets in the middle of the node. *Variables that are actually

random matrices In probability theory and mathematical physics, a random matrix is a matrix-valued random variable—that is, a matrix in which some or all of its entries are sampled randomly from a probability distribution. Random matrix theory (RMT) is the ...

are similarly indicated by putting the matrix size in brackets in the middle of the node, with commas separating row size from column size. *

Categorical variable In statistics, a categorical variable (also called qualitative variable) is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or ...

s are indicated by placing their size (without a bracket) in the middle of the node. *Categorical variables that act as "switches", and which pick one or more other random variables to condition on from a large set of such variables (e.g. mixture components), are indicated with a special type of arrow containing a squiggly line and ending in a T junction. *Boldface is consistently used for vector or matrix nodes (but not categorical nodes).

Software implementation

Plate notation has been implemented in various

TeX Tex, TeX, TEX, may refer to: People and fictional characters * Tex (nickname), a list of people and fictional characters with the nickname * Tex Earnhardt (1930–2020), U.S. businessman * Joe Tex (1933–1982), stage name of American soul singer ...

LaTeX Latex is an emulsion (stable dispersion) of polymer microparticles in water. Latices are found in nature, but synthetic latices are common as well. In nature, latex is found as a wikt:milky, milky fluid, which is present in 10% of all floweri ...

drawing packages, but also as part of graphical user interfaces to Bayesian statistics programs such as BUGS and BayesiaLab and

PyMC PyMC (formerly known as PyMC3) is a probabilistic programming language written in Python. It can be used for Bayesian statistical modeling and probabilistic machine learning. PyMC performs inference based on advanced Markov chain Monte Carlo an ...

References

{{DEFAULTSORT:Plate Notation Graphical models Bayesian networks Mathematical notation