A restricted Boltzmann machine (RBM) is a

generative Generative may refer to: * Generative actor, a person who instigates social change * Generative art, art that has been created using an autonomous system that is frequently, but not necessarily, implemented using a computer * Generative music, ...

stochastic Stochastic (, ) refers to the property of being well described by a random probability distribution. Although stochasticity and randomness are distinct in that the former refers to a modeling approach and the latter refers to phenomena themselve ...

artificial neural network Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units ...

that can learn a

probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...

over its set of inputs. RBMs were initially invented under the name Harmonium by Paul Smolensky in 1986, and rose to prominence after Geoffrey Hinton and collaborators invented fast learning algorithms for them in the mid-2000. RBMs have found applications in

dimensionality reduction Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally ...

classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...

, collaborative filtering, feature learning,

topic model In statistics and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden ...

lingRuslan Salakhutdinov and Geoffrey Hinton (2010)
Replicated softmax: an undirected topic model
''

Neural Information Processing Systems The Conference and Workshop on Neural Information Processing Systems (abbreviated as NeurIPS and formerly NIPS) is a machine learning and computational neuroscience conference held every December. The conference is currently a double-track mee ...

'' 23. and even many body quantum mechanics. They can be trained in either supervised or unsupervised ways, depending on the task. As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their

neurons A neuron, neurone, or nerve cell is an electrically excitable cell that communicates with other cells via specialized connections called synapses. The neuron is the main component of nervous tissue in all animals except sponges and placozoa. ...

must form a

bipartite graph In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V, that is every edge connects a vertex in U to one in V. Vertex sets U and V ar ...

: a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and there are no connections between nodes within a group. By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training

algorithms In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...

than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm.Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005)
On contrastive divergence learning
''Artificial Intelligence and Statistics''. Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation.

Structure

The standard type of RBM has binary-valued (

Boolean Any kind of logic, function, expression, or theory based on the work of George Boole is considered Boolean. Related to this, "Boolean" may refer to: * Boolean data type, a form of data with only two possible values (usually "true" and "false" ...

) hidden and visible units, and consists of a

matrix Matrix most commonly refers to: * ''The Matrix'' (franchise), an American media franchise ** '' The Matrix'', a 1999 science-fiction action film ** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchi ...

of weights

W

of size

m\times n

. Each weight element

(w_)

of the matrix is associated with the connection between the visible (input) unit

v_i

and the hidden unit

h_j

. In addition, there are bias weights (offsets)

a_i

for

v_i

and

b_j

for

h_j

. Given the weights and biases, the ''energy'' of a configuration (pair of boolean vectors) is defined as :

E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j -\sum_i \sum_j v_i w_ h_j

or, in matrix notation, :

E(v,h) = -a^ v - b^ h -v^ W h.

This energy function is analogous to that of a Hopfield network. As with general Boltzmann machines, the

joint probability distribution Given two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considere ...

for the visible and hidden vectors is defined in terms of the energy function as follows,Geoffrey Hinton (2010).
A Practical Guide to Training Restricted Boltzmann Machines
'. UTML TR 2010–003, University of Toronto. :

P(v,h) = \frac e^

where

Z

is a partition function defined as the sum of

e^

over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of

P(v,h)

over all possible hidden layer configurations, :

P(v) = \frac \sum_ e^

, and vice versa. Since the underlying graph structure of the RBM is

bipartite Bipartite may refer to: * 2 (number) * Bipartite (theology), a philosophical term describing the human duality of body and soul * Bipartite graph, in mathematics, a graph in which the vertices are partitioned into two sets and every edge has an en ...

(meaning there is no intra-layer connections), the hidden unit activations are

mutually independent Independence is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two events are independent, statistically independent, or stochastically independent if, informally speaking, the occurrence of o ...

given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations. That is, for ''m'' visible units and ''n'' hidden units, the

conditional probability In probability theory, conditional probability is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) has already occurred. This particular method relies on event B occu ...

of a configuration of the visible units , given a configuration of the hidden units , is :

P(v, h) = \prod_^m P(v_i, h)

. Conversely, the conditional probability of given is :

P(h, v) = \prod_^n P(h_j, v)

. The individual activation probabilities are given by :

P(h_j=1, v) = \sigma \left(b_j + \sum_^m w_ v_i \right)

and

\,P(v_i=1, h) = \sigma \left(a_i + \sum_^n w_ h_j \right)

where

\sigma

denotes the logistic sigmoid. The visible units of Restricted Boltzmann Machine can be multinomial, although the hidden units are Bernoulli. In this case, the logistic function for visible units is replaced by the softmax function :

P(v_i^k = 1, h) = \frac

where ''K'' is the number of discrete values that the visible values have. They are applied in topic modeling, and

recommender system A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as platform or engine), is a subclass of information filtering system that provide suggestions for items that are most pertinent to a particular ...

Relation to other models

Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields.Asja Fischer and Christian Igel
Training Restricted Boltzmann Machines: An Introduction
. Pattern Recognition 47, pp. 25-39, 2014 Their

graphical model A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability ...

corresponds to that of

factor analysis Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed ...

Training algorithm

Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set

V

(a matrix, each row of which is treated as a visible vector

v

), :

\arg\max_W \prod_ P(v)

or equivalently, to maximize the expected log probability of a training sample

v

selected randomly from

V

: :

\arg\max_W \mathbb \left \log P(v)\right /math>

The algorithm most often used to train RBMs, that is, to optimize the weight matrix W, is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train PoE (product of experts) models. The algorithm performs

Gibbs sampling In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is diff ...

and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute weight update. The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows: # Take a training sample , compute the probabilities of the hidden units and sample a hidden activation vector from this probability distribution. # Compute the

outer product In linear algebra, the outer product of two coordinate vectors is a matrix. If the two vectors have dimensions ''n'' and ''m'', then their outer product is an ''n'' × ''m'' matrix. More generally, given two tensors (multidimensional arrays of n ...

of and and call this the ''positive gradient''. # From , sample a reconstruction of the visible units, then resample the hidden activations from this. (Gibbs sampling step) # Compute the

of and and call this the ''negative gradient''. # Let the update to the weight matrix

W

be the positive gradient minus the negative gradient, times some learning rate:

\Delta W = \epsilon (vh^\mathsf - v'h'^\mathsf)

. # Update the biases and analogously:

\Delta a = \epsilon (v - v')

\Delta b = \epsilon (h - h')

. A Practical Guide to Training RBMs written by Hinton can be found on his homepage.

Stacked Restricted Boltzmann Machine

*The difference between the Stacked Restricted Boltzmann Machines and RBM is that RBM has lateral connections within a layer that are prohibited to make analysis tractable. On the other hand, the Stacked Boltzmann consists of a combination of an unsupervised three-layer network with symmetric weights and a supervised fine-tuned top layer for recognizing three classes. *The usage of Stacked Boltzmann is to understand Natural languages, retrieve documents, image generation, and classification. These functions are trained with unsupervised pre-training and/or supervised fine-tuning. Unlike the undirected symmetric top layer, with a two-way unsymmetric layer for connection for RBM. The restricted Boltzmann's connection is three-layers with asymmetric weights, and two networks are combined into one. *Stacked Boltzmann does share similarities with RBM, the neuron for Stacked Boltzmann is a stochastic binary Hopfield neuron, which is the same as the Restricted Boltzmann Machine. The energy from both Restricted Boltzmann and RBM is given by Gibb's probability measure:

E = -\frac12\sum_+\sum_i

. The training process of Restricted Boltzmann is similar to RBM. Restricted Boltzmann train one layer at a time and approximate equilibrium state with a 3-segment pass, not performing back propagation. Restricted Boltzmann uses both supervised and unsupervised on different RBM for pre-training for classification and recognition. The training uses contrastive divergence with Gibbs sampling: Δw_ij = e*(p_ij - p'_ij) *The restricted Boltzmann's strength is it performs a non-linear transformation so it's easy to expand, and can give a hierarchical layer of features. The Weakness is that it has complicated calculations of integer and real-valued neurons. It does not follow the gradient of any function, so the approximation of Contrastive divergence to maximum likelihood is improvised.

Literature

References

External links

Introduction to Restricted Boltzmann Machines
Edwin Chen's blog, July 18, 2011. * . Deeplearning4j Documentation * {{Cite web , url=http://deeplearning4j.org/understandingRBMs.html , title=Understanding RBMs , access-date=December 29, 2014 , archive-url=https://web.archive.org/web/20160920122139/http://deeplearning4j.org/understandingRBMs.html , archive-date=September 20, 2016 , url-status=dead , df=mdy-all . Deeplearning4j Documentation * Pytho
implementation
of Bernoulli RBM an
tutorial

SimpleRBM
is a very small RBM code (24kB) useful for you to learn about how RBMs learn and work. *Julia implementation of Restricted Boltzmann machines: https://github.com/cossio/RestrictedBoltzmannMachines.jl Neural network architectures Stochastic models Supervised learning Unsupervised learning