
An artificial neuron is a mathematical function conceived as a model of a biological neuron in a neural network. The artificial neuron is the elementary unit of an ''artificial neural network''.
The design of the artificial neuron was inspired by biological neural circuitry. Its inputs are analogous to excitatory postsynaptic potentials and inhibitory postsynaptic potentials at neural dendrites. Its weights are analogous to synaptic weights, and its output is analogous to a neuron's action potential, which is transmitted along its axon.
Usually, each input is separately weighted, and the sum is often added to a term known as a ''bias'' (loosely corresponding to the threshold potential), before being passed through a nonlinear function known as an activation function. Depending on the task, these functions could have a sigmoid shape (e.g. for binary classification), but they may also take the form of other nonlinear functions, piecewise linear functions, or step functions. They are also often monotonically increasing, continuous, differentiable, and bounded. Non-monotonic, unbounded, and oscillating activation functions with multiple zeros have also been explored recently and are reported to outperform sigmoidal and ReLU-like activation functions on many tasks. The threshold function has inspired the building of logic gates, referred to as threshold logic, which are applicable to building logic circuits resembling brain processing. For example, new devices such as memristors have been extensively used to develop such logic.
The artificial neuron activation function should not be confused with a linear system's transfer function.
An artificial neuron may be referred to as a semi-linear unit, Nv neuron, binary neuron, linear threshold function, or McCulloch–Pitts (MCP) neuron, depending on the structure used.
Simple artificial neurons, such as the McCulloch–Pitts model, are sometimes described as "caricature models", since they are intended to reflect one or more neurophysiological observations, but without regard to realism. Artificial neurons can also refer to
artificial cells in
neuromorphic engineering that are similar to natural physical neurons.
Basic structure
For a given artificial neuron $k$, let there be $m + 1$ inputs with signals $x_0$ through $x_m$ and weights $w_{k0}$ through $w_{km}$. Usually, the input $x_0$ is assigned the value +1, which makes it a bias input with $w_{k0} = b_k$. This leaves only $m$ actual inputs to the neuron: $x_1$ to $x_m$.
The output of the $k$-th neuron is:
: $y_k = \varphi\left(\sum_{j=0}^{m} w_{kj} x_j\right)$,
where $\varphi$ (phi) is the activation function.
The output is analogous to the axon of a biological neuron, and its value propagates to the input of the next layer, through a synapse. It may also exit the system, possibly as part of an output vector.
Such a neuron has no learning process as such: its weights are calculated, and its threshold value is predetermined.
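The computation described in this section (an activation function applied to the weighted sum of the inputs, with input 0 fixed at +1 so that its weight acts as the bias) can be sketched in Python as follows. This is only an illustration: the function name `neuron_output`, the default choice of `math.tanh` as the activation, and the sample numbers are assumptions, not part of the description above.

```python
import math
from typing import Callable, Sequence


def neuron_output(
    weights: Sequence[float],
    inputs: Sequence[float],
    activation: Callable[[float], float] = math.tanh,
) -> float:
    """Return activation(sum_j w_j * x_j) with x_0 fixed at +1 (bias input).

    `weights` has m + 1 entries: weights[0] is the bias, and weights[1:]
    pair with the m actual inputs.
    """
    if len(weights) != len(inputs) + 1:
        raise ValueError("expected one more weight (the bias) than inputs")
    extended = [1.0, *inputs]  # prepend the constant bias input x_0 = +1
    weighted_sum = sum(w * x for w, x in zip(weights, extended))
    return activation(weighted_sum)


# Hypothetical numbers: bias 0.5, two inputs with weights 1.0 and -2.0.
y = neuron_output([0.5, 1.0, -2.0], [3.0, 1.0])  # tanh(0.5 + 3.0 - 2.0)
```

Passing a different callable as `activation` (for example a step or rectifier function) changes the neuron type without touching the weighted-sum logic.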
McCulloch–Pitts (MCP) neuron
An MCP neuron is a kind of restricted artificial neuron which operates in discrete time-steps. Each has zero or more inputs, written as $x_1, \ldots, x_n$. It has one output, written as $y$. Each input can be either ''excitatory'' or ''inhibitory''. The output can be either ''quiet'' or ''firing''. An MCP neuron also has a threshold $b \in \{0, 1, 2, \ldots\}$.
In an MCP neural network, all the neurons operate in synchronous discrete time-steps of $t = 0, 1, 2, 3, \ldots$. At time $t + 1$, the output of the neuron is $y(t+1) = 1$ if the number of firing excitatory inputs is at least equal to the threshold, and no inhibitory inputs are firing; $y(t+1) = 0$ otherwise.
Each output can be the input to an arbitrary number of neurons, including itself (i.e., self-loops are possible). However, an output cannot connect more than once with a single neuron. Self-loops do not cause contradictions, since the network operates in synchronous discrete time-steps.
As a simple example, consider a single neuron with threshold 0, and a single inhibitory self-loop. Its output would oscillate between 0 and 1 at every step, acting as a "clock".
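The firing rule and the "clock" example can be simulated with a short Python sketch. The function names `mcp_step` and `clock`, and the choice of a quiet initial state, are illustrative assumptions.

```python
from typing import List, Sequence


def mcp_step(excitatory: Sequence[int], inhibitory: Sequence[int], threshold: int) -> int:
    """One synchronous update of a McCulloch-Pitts neuron.

    Fires (returns 1) when the number of firing excitatory inputs is at least
    the threshold and no inhibitory input is firing; otherwise stays quiet (0).
    """
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0


def clock(steps: int, initial_output: int = 0) -> List[int]:
    """Simulate a neuron with threshold 0 and a single inhibitory self-loop."""
    outputs = [initial_output]
    for _ in range(steps):
        # The only input is the neuron's own previous output, wired as inhibitory.
        outputs.append(mcp_step(excitatory=[], inhibitory=[outputs[-1]], threshold=0))
    return outputs


trace = clock(6)  # starts quiet, then alternates at every step
```

With no excitatory inputs, the count of firing excitatory inputs is 0, which always meets a threshold of 0, so the output is decided entirely by whether the self-loop fired on the previous step.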
Any finite state machine can be simulated by an MCP neural network. Furnished with an infinite tape, MCP neural networks can simulate any Turing machine.
Biological models
Artificial neurons are designed to mimic aspects of their biological counterparts. However, a significant performance gap exists between biological and artificial neural networks. In particular, single biological neurons in the human brain that have an oscillating activation function and are capable of learning the XOR function have been discovered.
* Dendrites – in biological neurons, dendrites act as the input vector. These dendrites allow the cell to receive signals from a large (>1000) number of neighboring neurons. As in the above mathematical treatment, each dendrite is able to perform "multiplication" by that dendrite's "weight value." The multiplication is accomplished by increasing or decreasing the ratio of synaptic neurotransmitters to signal chemicals introduced into the dendrite in response to the synaptic neurotransmitter. A negative multiplication effect can be achieved by transmitting signal inhibitors (i.e. oppositely charged ions) along the dendrite in response to the reception of synaptic neurotransmitters.
* Soma – in biological neurons, the soma acts as the summation function, seen in the above mathematical description. As positive and negative signals (exciting and inhibiting, respectively) arrive in the soma from the dendrites, the positive and negative ions are effectively added in summation, by simple virtue of being mixed together in the solution inside the cell's body.
* Axon – the axon gets its signal from the summation behavior which occurs inside the soma. The opening to the axon essentially samples the electrical potential of the solution inside the soma. Once the soma reaches a certain potential, the axon will transmit an all-or-nothing signal pulse down its length. In this regard, the axon plays the role of the connection that lets us link our artificial neuron to other artificial neurons.
Unlike most artificial neurons, however, biological neurons fire in discrete pulses. Each time the electrical potential inside the soma reaches a certain threshold, a pulse is transmitted down the axon. This pulsing can be translated into continuous values. The rate (activations per second, etc.) at which an axon fires converts directly into the rate at which neighboring cells get signal ions introduced into them. The faster a biological neuron fires, the faster nearby neurons accumulate electrical potential (or lose electrical potential, depending on the "weighting" of the dendrite that connects to the neuron that fired). It is this conversion that allows computer scientists and mathematicians to simulate biological neural networks using artificial neurons which can output distinct values (often from −1 to 1).
Encoding
Research has shown that unary coding is used in the neural circuits responsible for birdsong production. The use of unary in biological networks is presumably due to the inherent simplicity of the coding. Another contributing factor could be that unary coding provides a certain degree of error correction.
Physical artificial cells
There is research and development into physical artificial neurons – organic and inorganic.
For example, some artificial neurons can receive and release dopamine (chemical signals rather than electrical signals) and communicate with natural rat muscle and brain cells, with potential for use in BCIs/prosthetics.
Low-power biocompatible memristors may enable construction of artificial neurons which function at voltages of biological action potentials and could be used to directly process biosensing signals, for neuromorphic computing and/or direct communication with biological neurons.
Organic neuromorphic circuits made out of polymers, coated with an ion-rich gel to enable a material to carry an electric charge like real neurons, have been built into a robot, enabling it to learn sensorimotorically within the real world, rather than via simulations or virtually.
Moreover, artificial spiking neurons made of soft matter (polymers) can operate in biologically relevant environments and enable the synergetic communication between the artificial and biological domains.
History
The first artificial neuron was the Threshold Logic Unit (TLU), or Linear Threshold Unit, first proposed by Warren McCulloch and Walter Pitts in 1943 in ''A logical calculus of the ideas immanent in nervous activity''. The model was specifically targeted as a computational model of the "nerve net" in the brain. As an activation function, it employed a threshold, equivalent to using the Heaviside step function. Initially, only a simple model was considered, with binary inputs and outputs, some restrictions on the possible weights, and a more flexible threshold value. From the beginning it was noticed that any Boolean function could be implemented by networks of such devices, which is easily seen from the fact that one can implement the AND and OR functions, and use them in the disjunctive or the conjunctive normal form.
Researchers also soon realized that cyclic networks, with feedback through neurons, could define dynamical systems with memory, but most of the research concentrated (and still does) on strictly feed-forward networks because they present less difficulty.
One important and pioneering artificial neural network that used the linear threshold function was the perceptron, developed by Frank Rosenblatt. This model already considered more flexible weight values in the neurons, and was used in machines with adaptive capabilities. The representation of the threshold values as a bias term was introduced by Bernard Widrow in 1960 – see ADALINE.
In the late 1980s, when research on neural networks regained strength, neurons with more continuous shapes started to be considered. The possibility of differentiating the activation function allows the direct use of gradient descent and other optimization algorithms for the adjustment of the weights. Neural networks also started to be used as a general function approximation model. The best known training algorithm, called backpropagation, has been rediscovered several times, but its first development goes back to the work of Paul Werbos.
Types of activation function
The activation function of a neuron is chosen to have a number of properties which either enhance or simplify the network containing the neuron. Crucially, for instance, any
multilayer perceptron using a linear activation function has an equivalent single-layer network; a ''non''-linear function is therefore necessary to gain the advantages of a multi-layer network.
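The collapse of a linear multi-layer network into a single layer can be checked numerically. The sketch below uses plain Python lists; the helper names (`matvec`, `matmul`, `add`) and all weights are hypothetical values chosen for illustration. Composing two affine layers gives another affine layer with weight matrix `W2 W1` and bias `W2 b1 + b2`.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(u: Vector, v: Vector) -> Vector:
    """Element-wise vector sum."""
    return [a + b for a, b in zip(u, v)]


# Hypothetical two-layer network with identity (linear) activations.
W1, b1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]], [0.5, 0.0, -1.0]  # 2 inputs -> 3 units
W2, b2 = [[1.0, -1.0, 2.0]], [0.25]                                # 3 units -> 1 output

x = [2.0, -3.0]
two_layer = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# Equivalent single layer: W = W2 W1 and b = W2 b1 + b2.
W, b = matmul(W2, W1), add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)
```

Both `two_layer` and `one_layer` evaluate to the same vector for any input `x`, which is why a nonlinearity between layers is needed for depth to add expressive power.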
Below, $u$ refers in all cases to the weighted sum of all the inputs to the neuron, i.e. for $n$ inputs,
: $u = \sum_{i=1}^{n} w_i x_i$
where $w$ is a vector of synaptic weights and $x$ is a vector of inputs.
Step function
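The output $y$ of this activation function is binary, depending on whether the weighted input sum $u$ meets a specified threshold, $\theta$ (theta). The "signal" is sent, i.e. the output is set to 1, if the activation meets or exceeds the threshold.
: $y = \begin{cases} 1 & \text{if } u \ge \theta \\ 0 & \text{if } u < \theta \end{cases}$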
This function is used in perceptrons, and appears in many other models. It performs a division of the space of inputs by a hyperplane. It is especially useful in the last layer of a network, intended for example to perform binary classification of the inputs.
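A minimal Python sketch of the step activation used as a binary classifier follows. The function names `step` and `classify` and the example separating line are assumptions made for illustration; the comparison follows the "meets or exceeds" rule stated in this section.

```python
from typing import Sequence


def step(u: float, theta: float) -> int:
    """Threshold activation: 1 if the weighted sum u meets or exceeds theta, else 0."""
    return 1 if u >= theta else 0


def classify(weights: Sequence[float], inputs: Sequence[float], theta: float) -> int:
    """Report which side of the hyperplane w . x = theta the input lies on."""
    u = sum(w * x for w, x in zip(weights, inputs))
    return step(u, theta)


# Hypothetical separating line x1 + x2 = 1 in a two-dimensional input space.
points = [(0.0, 0.0), (0.5, 0.5), (2.0, 1.0)]
labels = [classify([1.0, 1.0], p, theta=1.0) for p in points]
```

Points exactly on the hyperplane, such as (0.5, 0.5) here, are assigned to class 1 because the threshold is inclusive.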
Linear combination
In this case, the output unit is simply the weighted sum of its inputs, plus a bias term. A number of such linear neurons perform a linear transformation of the input vector. This is usually more useful in the early layers of a network. A number of analysis tools exist based on linear models, such as
harmonic analysis, and they can all be used in neural networks with this linear neuron. The bias term allows us to make
affine transformations to the data.
Sigmoid
A sigmoid function, such as the logistic function, is a fairly simple nonlinear function that also has an easily calculated derivative, which can be important when calculating the weight updates in the network. It thus makes the network more easily manipulable mathematically, and was attractive to early computer scientists who needed to minimize the computational load of their simulations. It was previously commonly seen in multilayer perceptrons. However, recent work has shown sigmoid neurons to be less effective than rectified linear neurons. The reason is that the gradients computed by the backpropagation algorithm tend to diminish towards zero as activations propagate through layers of sigmoidal neurons, making it difficult to optimize neural networks using multiple layers of sigmoidal neurons.
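As a rough illustration of both points (the cheap derivative and the shrinking gradients), the following Python sketch uses the logistic function as the sigmoid. The function names are assumptions, and the gradient estimate is deliberately simplified: it ignores the weights and only multiplies the best-case activation derivatives across layers.

```python
import math


def logistic(u: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def logistic_derivative(u: float) -> float:
    """Derivative expressed through the function value itself: s * (1 - s)."""
    s = logistic(u)
    return s * (1.0 - s)


# The derivative peaks at u = 0 with value 0.25, so a product of such factors
# across many layers can only shrink (weights are ignored for simplicity).
peak = logistic_derivative(0.0)
gradient_factor_10_layers = peak ** 10
```

Even in this best case, ten stacked sigmoid layers scale the backpropagated signal by a factor below one millionth, which gives an intuition for why deep sigmoidal networks are hard to optimize.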
Rectifier
In the context of artificial neural networks, the rectifier or ReLU (Rectified Linear Unit) is an activation function defined as the positive part of its argument:
: $f(x) = x^{+} = \max(0, x)$,
where $x$ is the input to a neuron. This is also known as a ramp function and is analogous to half-wave rectification in electrical engineering. This activation function was first introduced to a dynamical network by Hahnloser et al. in a 2000 paper in ''Nature'' with strong biological motivations and mathematical justifications. It was demonstrated for the first time in 2011 to enable better training of deeper networks, compared to the widely used activation functions prior to 2011, i.e., the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent.
A commonly used variant of the ReLU activation function is the Leaky ReLU, which allows a small, positive gradient when the unit is not active:
: $f(x) = \begin{cases} x & \text{if } x > 0 \\ a x & \text{otherwise} \end{cases}$
where $x$ is the input to the neuron and $a$ is a small positive constant (set to 0.01 in the original paper). [Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2014). ''Rectifier Nonlinearities Improve Neural Network Acoustic Models'']
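The rectifier (the positive part of its argument) and its leaky variant (a small positive slope for inactive units) can be written in a few lines of Python. The function names and the sample inputs below are illustrative; the default slope of 0.01 follows the value this section attributes to the original paper.

```python
def relu(x: float) -> float:
    """Positive part of x: max(0, x)."""
    return max(0.0, x)


def leaky_relu(x: float, a: float = 0.01) -> float:
    """Like ReLU, but with a small positive slope a for non-positive inputs."""
    return x if x > 0 else a * x


samples = [-2.0, 0.0, 3.0]
relu_values = [relu(x) for x in samples]
leaky_values = [leaky_relu(x) for x in samples]
```

For negative inputs `relu` returns zero (and so passes no gradient), whereas `leaky_relu` returns a small negative value proportional to the input.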
Pseudocode algorithm
The following is a simple
pseudocode implementation of a single Threshold Logic Unit (TLU) which takes
Boolean inputs (true or false), and returns a single Boolean output when activated. An
object-oriented model is used. No method of training is defined, since several exist. If a purely functional model were used, the class TLU below would be replaced with a function TLU with input parameters threshold, weights, and inputs that returned a Boolean value.
class TLU defined as:
    data member threshold : number
    data member weights : list of numbers of size X
    function member fire(inputs : list of booleans of size X) : boolean defined as:
        variable T : number
        T ← 0
        for each i in 1 to X do
            if inputs(i) is true then
                T ← T + weights(i)
            end if
        end for each
        if T > threshold then
            return true
        else
            return false
        end if
    end function
end class
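One possible runnable rendering of the pseudocode above in Python is sketched below. The use of a dataclass, the length check, and the gate weights and thresholds are illustrative assumptions; the strict `>` comparison is kept exactly as in the pseudocode.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TLU:
    """Threshold Logic Unit with Boolean inputs and a Boolean output."""

    threshold: float
    weights: Sequence[float]

    def fire(self, inputs: Sequence[bool]) -> bool:
        if len(inputs) != len(self.weights):
            raise ValueError("inputs and weights must have the same length")
        # Sum the weights of the inputs that are true.
        total = sum(w for w, x in zip(self.weights, inputs) if x)
        # Strict comparison, as in the pseudocode: fire only if total > threshold.
        return total > self.threshold


# Hypothetical weight and threshold choices giving two-input AND and OR gates.
and_gate = TLU(threshold=1.5, weights=[1.0, 1.0])
or_gate = TLU(threshold=0.5, weights=[1.0, 1.0])
```

For example, `and_gate.fire([True, False])` is false while `or_gate.fire([True, False])` is true, since a single active input contributes a total of 1.0, which exceeds 0.5 but not 1.5.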
See also
* Binding neuron
* Connectionism