Gated Linear Unit

In neural networks, the gating mechanism is an architectural motif for controlling the flow of activation and gradient signals. Gating mechanisms are most prominently used in recurrent neural networks (RNNs), but have also found applications in other architectures.


RNNs

Gating mechanisms are the centerpiece of long short-term memory (LSTM). They were proposed to mitigate the vanishing gradient problem often encountered by regular RNNs.

An LSTM unit contains three gates:

* An input gate, which controls the flow of new information into the memory cell
* A forget gate, which controls how much information is retained from the previous time step
* An output gate, which controls how much information is passed to the next layer

The equations for LSTM are:

\begin{aligned}
\mathbf{i}_t &= \sigma(\mathbf{x}_t \mathbf{W}_i + \mathbf{h}_{t-1} \mathbf{U}_i + \mathbf{b}_i) \\
\mathbf{f}_t &= \sigma(\mathbf{x}_t \mathbf{W}_f + \mathbf{h}_{t-1} \mathbf{U}_f + \mathbf{b}_f) \\
\mathbf{o}_t &= \sigma(\mathbf{x}_t \mathbf{W}_o + \mathbf{h}_{t-1} \mathbf{U}_o + \mathbf{b}_o) \\
\tilde{\mathbf{c}}_t &= \tanh(\mathbf{x}_t \mathbf{W}_c + \mathbf{h}_{t-1} \mathbf{U}_c + \mathbf{b}_c) \\
\mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t \\
\mathbf{h}_t &= \mathbf{o}_t \odot \tanh(\mathbf{c}_t)
\end{aligned}

Here, \odot represents elementwise multiplication.
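The following is a minimal NumPy sketch of a single LSTM step implementing the equations above; the parameter names (W_i, U_i, b_i, and so on) are illustrative and not taken from any particular library.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    # Each gate is a sigmoid of a learned affine function of the
    # current input x_t and the previous hidden state h_prev.
    i = sigmoid(x_t @ p["W_i"] + h_prev @ p["U_i"] + p["b_i"])        # input gate
    f = sigmoid(x_t @ p["W_f"] + h_prev @ p["U_f"] + p["b_f"])        # forget gate
    o = sigmoid(x_t @ p["W_o"] + h_prev @ p["U_o"] + p["b_o"])        # output gate
    c_tilde = np.tanh(x_t @ p["W_c"] + h_prev @ p["U_c"] + p["b_c"])  # candidate cell state
    c = f * c_prev + i * c_tilde  # forget part of the old memory, admit new memory
    h = o * np.tanh(c)            # expose a gated view of the cell state
    return h, c

Because the cell state is updated by elementwise multiplication and addition rather than by repeated squashing nonlinearities, gradients can flow through it across many time steps, which is how the gates mitigate the vanishing gradient problem.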
The gated recurrent unit (GRU) simplifies the LSTM. Compared to the LSTM, the GRU has just two gates, a reset gate and an update gate, and it merges the cell state and hidden state into a single hidden state. The reset gate roughly corresponds to the forget gate, and the update gate roughly corresponds to the input gate; the output gate is removed. There are several variants of GRU. One particular variant has these equations:

\begin{aligned}
\mathbf{r}_t &= \sigma(\mathbf{x}_t \mathbf{W}_r + \mathbf{h}_{t-1} \mathbf{U}_r + \mathbf{b}_r) \\
\mathbf{z}_t &= \sigma(\mathbf{x}_t \mathbf{W}_z + \mathbf{h}_{t-1} \mathbf{U}_z + \mathbf{b}_z) \\
\tilde{\mathbf{h}}_t &= \tanh(\mathbf{x}_t \mathbf{W}_h + (\mathbf{r}_t \odot \mathbf{h}_{t-1}) \mathbf{U}_h + \mathbf{b}_h) \\
\mathbf{h}_t &= \mathbf{z}_t \odot \mathbf{h}_{t-1} + (1 - \mathbf{z}_t) \odot \tilde{\mathbf{h}}_t
\end{aligned}
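A matching NumPy sketch of one step of this GRU variant, again with illustrative parameter names:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    r = sigmoid(x_t @ p["W_r"] + h_prev @ p["U_r"] + p["b_r"])  # reset gate
    z = sigmoid(x_t @ p["W_z"] + h_prev @ p["U_z"] + p["b_z"])  # update gate
    # The reset gate decides how much of the previous state enters the candidate.
    h_tilde = np.tanh(x_t @ p["W_h"] + (r * h_prev) @ p["U_h"] + p["b_h"])
    # The update gate interpolates between the old state and the candidate.
    return z * h_prev + (1.0 - z) * h_tilde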


Gated Linear Unit

Gated Linear Units (GLUs) adapt the gating mechanism for use in feedforward neural networks, often within transformer-based architectures. They are defined as:

\mathrm{GLU}(a, b) = a \odot \sigma(b)

where a and b are the first and second inputs, respectively, and \sigma is the sigmoid activation function.
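In code the definition is a one-liner; this NumPy sketch assumes a and b are arrays of the same shape:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(a, b):
    # b produces gates in (0, 1) that scale each component of a.
    return a * sigmoid(b)

In practice, a and b are typically two learned projections of the same input, as in the feedforward forms below.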
Replacing \sigma with other activation functions leads to variants of GLU:

\begin{aligned}
\mathrm{ReGLU}(a, b) &= a \odot \operatorname{ReLU}(b) \\
\mathrm{GEGLU}(a, b) &= a \odot \operatorname{GELU}(b) \\
\mathrm{SwiGLU}(a, b, \beta) &= a \odot \operatorname{Swish}_\beta(b)
\end{aligned}

where ReLU, GELU, and Swish are different activation functions. In transformer models, such gating units are often used in the feedforward modules. For a single vector input x, this results in:

\begin{aligned}
\operatorname{GLU}(x, W, V, b, c) &= \sigma(x W + b) \odot (x V + c) \\
\operatorname{Bilinear}(x, W, V, b, c) &= (x W + b) \odot (x V + c) \\
\operatorname{ReGLU}(x, W, V, b, c) &= \max(0, x W + b) \odot (x V + c) \\
\operatorname{GEGLU}(x, W, V, b, c) &= \operatorname{GELU}(x W + b) \odot (x V + c) \\
\operatorname{SwiGLU}(x, W, V, b, c, \beta) &= \operatorname{Swish}_\beta(x W + b) \odot (x V + c)
\end{aligned}
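As an illustration of how the SwiGLU form might sit inside a transformer feedforward module, here is a NumPy sketch. The output projection W_out, b_out is an assumption added for completeness (the formulas above define only the gating step), and all parameter names are illustrative.

import numpy as np

def swish(x, beta=1.0):
    # Swish_beta(x) = x * sigmoid(beta * x)
    return x / (1.0 + np.exp(-beta * x))

def swiglu_ffn(x, W, b, V, c, W_out, b_out, beta=1.0):
    gated = swish(x @ W + b, beta) * (x @ V + c)  # SwiGLU(x, W, V, b, c, beta)
    return gated @ W_out + b_out                  # project back to the model width

Since the gated block uses three weight matrices where an ordinary two-layer feedforward block uses two, implementations often shrink the hidden width to keep the parameter count comparable.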


Other architectures

Gating mechanisms are used in highway networks, which were designed by unrolling an LSTM. Channel gating uses a gate to control the flow of information through different channels inside a convolutional neural network (CNN).
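As a sketch, one highway layer under the common coupling where the carry gate equals 1 - T can be written as follows in NumPy (parameter names illustrative):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    t = sigmoid(x @ W_t + b_t)   # transform gate T(x)
    h = np.tanh(x @ W_h + b_h)   # candidate transform H(x)
    # Where t is near 0, the layer passes x through unchanged (the "highway");
    # where t is near 1, it applies the full transform.
    return t * h + (1.0 - t) * x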


See also

* Recurrent neural network
* Long short-term memory
* Gated recurrent unit
* Transformer (deep learning architecture)
* Activation function

