In machine learning, a deep belief network (DBN) is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer.

When trained on a set of examples without supervision, a DBN can learn to probabilistically reconstruct its inputs. The layers then act as feature detectors. After this learning step, a DBN can be further trained with supervision to perform classification.

DBNs can be viewed as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, where each sub-network's hidden layer serves as the visible layer for the next. An RBM is an undirected, generative energy-based model with a "visible" input layer and a hidden layer, and with connections between but not within layers. This composition leads to a fast, layer-by-layer unsupervised training procedure, where contrastive divergence is applied to each sub-network in turn, starting from the "lowest" pair of layers (the one whose visible layer is set to vectors from the training set).

The observation that DBNs can be trained greedily, one layer at a time, led to one of the first effective deep learning algorithms. DBNs have been applied in real-world settings such as electroencephalography and drug discovery.
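The layered structure described above, in which each sub-network's hidden layer serves as the visible layer for the next, can be illustrated with a small sketch. The function names, the 4-3-2 layer sizes, and the all-zero parameters are hypothetical choices made only for this example, and the sketch passes activation probabilities upward rather than sampled binary states, which is a simplifying assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probabilities(visible, weights, hidden_bias):
    """Return p(h_j = 1 | v) for one layer; weights[i][j] links visible i to hidden j."""
    return [
        sigmoid(hidden_bias[j]
                + sum(visible[i] * weights[i][j] for i in range(len(visible))))
        for j in range(len(hidden_bias))
    ]

def propagate_up(visible, layers):
    """Feed an input through a stack of layers and return every layer's activation."""
    activations = [visible]
    for weights, hidden_bias in layers:
        # The previous layer's output acts as the visible layer of this one.
        activations.append(hidden_probabilities(activations[-1], weights, hidden_bias))
    return activations

# Hypothetical 4-3-2 stack with all-zero parameters, purely for illustration.
layers = [
    ([[0.0] * 3 for _ in range(4)], [0.0] * 3),
    ([[0.0] * 2 for _ in range(3)], [0.0] * 2),
]
activations = propagate_up([1, 0, 1, 1], layers)
print([len(a) for a in activations])  # layer sizes: [4, 3, 2]
```

With trained parameters in place of the zeros, the upper activations would play the role of the feature detectors mentioned above.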


Training

The training method for RBMs proposed by Geoffrey Hinton for use with training "Product of Experts" models is called contrastive divergence (CD). CD provides an approximation to the maximum likelihood method that would ideally be applied for learning the weights. In training a single RBM, weight updates are performed with gradient descent (strictly, gradient ascent on the log-likelihood, since the update adds the gradient) via the following equation:

$w_{ij}(t+1) = w_{ij}(t) + \eta \frac{\partial \log p(v)}{\partial w_{ij}}$

where $p(v)$ is the probability of a visible vector, which is given by $p(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}$. $Z$ is the partition function (used for normalizing) and $E(v,h)$ is the energy function assigned to the state of the network. A lower energy indicates the network is in a more "desirable" configuration. The gradient $\frac{\partial \log p(v)}{\partial w_{ij}}$ has the simple form $\langle v_i h_j \rangle_\text{data} - \langle v_i h_j \rangle_\text{model}$, where $\langle \cdots \rangle_p$ represents averages with respect to distribution $p$. The issue arises in sampling $\langle v_i h_j \rangle_\text{model}$, because this requires extended alternating Gibbs sampling. CD replaces this step by running alternating Gibbs sampling for $n$ steps (values of $n = 1$ perform well). After $n$ steps, the data are sampled and that sample is used in place of $\langle v_i h_j \rangle_\text{model}$.

The CD procedure works as follows:

# Initialize the visible units to a training vector.
# Update the hidden units in parallel given the visible units: $p(h_j = 1 \mid \textbf{v}) = \sigma(b_j + \sum_i v_i w_{ij})$. $\sigma$ is the sigmoid function and $b_j$ is the bias of $h_j$.
# Update the visible units in parallel given the hidden units: $p(v_i = 1 \mid \textbf{h}) = \sigma(a_i + \sum_j h_j w_{ij})$. $a_i$ is the bias of $v_i$. This is called the "reconstruction" step.
# Re-update the hidden units in parallel given the reconstructed visible units using the same equation as in step 2.
# Perform the weight update: $\Delta w_{ij} \propto \langle v_i h_j \rangle_\text{data} - \langle v_i h_j \rangle_\text{reconstruction}$.

Once an RBM is trained, another RBM is "stacked" atop it, taking its input from the final trained layer. The new visible layer is initialized to a training vector, and values for the units in the already-trained layers are assigned using the current weights and biases. The new RBM is then trained with the procedure above. This whole process is repeated until the desired stopping criterion is met.

Although the approximation of CD to maximum likelihood is crude (it does not follow the gradient of any function), it is empirically effective.
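The five CD steps and the stacking step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class and function names, the learning rate, the epoch count, the toy data, and the small Gaussian weight initialization are all hypothetical. The bias updates are not given in the text and are added here by analogy with the weight update, and hidden probabilities (rather than sampled states) are used in the pairwise statistics, which is a common practical choice but an assumption here.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probabilities, rng):
    """Draw one binary state per unit from its Bernoulli probability."""
    return [1 if rng.random() < p else 0 for p in probabilities]

class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # w[i][j] connects visible unit i to hidden unit j (assumed small random init).
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases a_i
        self.b = [0.0] * n_hidden   # hidden biases b_j

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 update; v0 is the training vector (step 1)."""
        h0_probs = self.hidden_probs(v0)                # step 2
        h0 = sample(h0_probs, self.rng)
        v1 = sample(self.visible_probs(h0), self.rng)   # step 3: reconstruction
        h1_probs = self.hidden_probs(v1)                # step 4
        for i in range(len(self.a)):                    # step 5: data minus reconstruction
            for j in range(len(self.b)):
                self.w[i][j] += lr * (v0[i] * h0_probs[j] - v1[i] * h1_probs[j])
        # Bias updates: an assumption, not stated in the text above.
        for i in range(len(self.a)):
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.b)):
            self.b[j] += lr * (h0_probs[j] - h1_probs[j])

def train_dbn(data, hidden_sizes, epochs=20, seed=0):
    """Greedily train one RBM per entry of hidden_sizes, bottom layer first."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v)
        rbms.append(rbm)
        # The trained hidden layer supplies the visible data for the next RBM.
        layer_input = [sample(rbm.hidden_probs(v), rng) for v in layer_input]
    return rbms

data = [[1, 1, 0, 0], [0, 0, 1, 1]] * 5  # hypothetical toy training set
dbn = train_dbn(data, hidden_sizes=[3, 2])
print([(len(r.a), len(r.b)) for r in dbn])  # [(4, 3), (3, 2)]
```

A fixed number of epochs stands in for the "desired stopping criterion"; in practice one might instead monitor reconstruction error on held-out data.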


See also

* Bayesian network
* Convolutional deep belief network
* Deep learning
* Energy based model
* Stacked Restricted Boltzmann Machine


External links

* "Deep Belief Network Example". Deeplearning4j Tutorials. Original URL (dead): http://deeplearning4j.org/deepbeliefnetwork.html. Archived 2016-10-03 at https://web.archive.org/web/20161003210144/http://deeplearning4j.org/deepbeliefnetwork.html. Retrieved 2015-02-22.