In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the Royal Signals and Radar Establishment.
Network architecture

Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer. The input can be modeled as a vector of real numbers \mathbf{x} \in \mathbb{R}^n. The output of the network is then a scalar function of the input vector, \varphi : \mathbb{R}^n \to \mathbb{R}, and is given by

: \varphi(\mathbf{x}) = \sum_{i=1}^{N} a_i \, \rho(\|\mathbf{x} - \mathbf{c}_i\|)

where N is the number of neurons in the hidden layer, \mathbf{c}_i is the center vector for neuron i, and a_i is the weight of neuron i in the linear output neuron. Functions that depend only on the distance from a center vector are radially symmetric about that vector, hence the name radial basis function. In the basic form, all inputs are connected to each hidden neuron. The norm is typically taken to be the Euclidean distance (although the Mahalanobis distance appears to perform better with pattern recognition) and the radial basis function is commonly taken to be Gaussian

: \rho(\|\mathbf{x} - \mathbf{c}_i\|) = \exp\left( -\beta_i \, \|\mathbf{x} - \mathbf{c}_i\|^2 \right).
The Gaussian basis functions are local to the center vector in the sense that

: \lim_{\|\mathbf{x}\| \to \infty} \rho(\|\mathbf{x} - \mathbf{c}_i\|) = 0,

i.e. changing the parameters of one neuron has only a small effect for input values that are far away from the center of that neuron.
Given certain mild conditions on the shape of the activation function, RBF networks are universal approximators on a compact subset of \mathbb{R}^n. This means that an RBF network with enough hidden neurons can approximate any continuous function on a closed, bounded set with arbitrary precision.
The parameters a_i, \mathbf{c}_i, and \beta_i are determined in a manner that optimizes the fit between \varphi and the data.
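As a concrete sketch, the unnormalized mapping above can be evaluated in a few lines of NumPy (the function and argument names here are illustrative, not part of any standard library):

```python
import numpy as np

def rbf_forward(x, centers, betas, weights):
    """Unnormalized RBF network: sum_i a_i * exp(-beta_i * ||x - c_i||^2)."""
    # Squared Euclidean distances from the input to every center vector
    d2 = np.sum((centers - x) ** 2, axis=1)
    # Gaussian activations of the hidden layer
    rho = np.exp(-betas * d2)
    # Linear output layer: weighted sum of hidden activations
    return weights @ rho
```

At a center itself the corresponding Gaussian equals one, and far from all centers the output decays toward zero, reflecting the locality property discussed above.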
Normalized
Normalized architecture
In addition to the above ''unnormalized'' architecture, RBF networks can be ''normalized''. In this case the mapping is

: \varphi(\mathbf{x}) = \frac{\sum_{i=1}^{N} a_i \, \rho(\|\mathbf{x} - \mathbf{c}_i\|)}{\sum_{i=1}^{N} \rho(\|\mathbf{x} - \mathbf{c}_i\|)} = \sum_{i=1}^{N} a_i \, u(\|\mathbf{x} - \mathbf{c}_i\|)

where

: u(\|\mathbf{x} - \mathbf{c}_i\|) \equiv \frac{\rho(\|\mathbf{x} - \mathbf{c}_i\|)}{\sum_{j=1}^{N} \rho(\|\mathbf{x} - \mathbf{c}_j\|)}

is known as a ''normalized radial basis function''.
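The normalized variant can be sketched the same way (again with illustrative names; NumPy assumed). Because the normalized basis functions sum to one, the output is a weighted average of the a_i rather than a weighted sum:

```python
import numpy as np

def rbf_normalized(x, centers, betas, weights):
    """Normalized RBF network: weighted average with basis functions
    u_i = rho_i / sum_j rho_j, which sum to one."""
    rho = np.exp(-betas * np.sum((centers - x) ** 2, axis=1))
    u = rho / np.sum(rho)   # normalized radial basis functions
    return weights @ u
```

One consequence worth noting: if all weights are equal, the normalized network outputs exactly that constant for every input, whereas the unnormalized network would not.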
Theoretical motivation for normalization
There is theoretical justification for this architecture in the case of stochastic data flow. Assume a stochastic kernel approximation for the joint probability density

: P(\mathbf{x} \land y) = \frac{1}{N} \sum_{i=1}^{N} \rho(\|\mathbf{x} - \mathbf{c}_i\|) \, \sigma(|y - e_i|)

where the weights \mathbf{c}_i and e_i are exemplars from the data and we require the kernels to be normalized

: \int \rho(\|\mathbf{x} - \mathbf{c}_i\|) \, d^n\mathbf{x} = 1

and

: \int \sigma(|y - e_i|) \, dy = 1.
The probability densities in the input and output spaces are

: P(\mathbf{x}) = \int P(\mathbf{x} \land y) \, dy = \frac{1}{N} \sum_{i=1}^{N} \rho(\|\mathbf{x} - \mathbf{c}_i\|)

and

: P(y) = \int P(\mathbf{x} \land y) \, d^n\mathbf{x} = \frac{1}{N} \sum_{i=1}^{N} \sigma(|y - e_i|).

The expectation of y given an input \mathbf{x} is

: \varphi(\mathbf{x}) \equiv E(y \mid \mathbf{x}) = \int y \, P(y \mid \mathbf{x}) \, dy

where

: P(y \mid \mathbf{x})

is the conditional probability of y given \mathbf{x}.
The conditional probability is related to the joint probability through Bayes' theorem

: P(y \mid \mathbf{x}) = \frac{P(\mathbf{x} \land y)}{P(\mathbf{x})}

which yields

: \varphi(\mathbf{x}) = \int y \, \frac{P(\mathbf{x} \land y)}{P(\mathbf{x})} \, dy.

This becomes

: \varphi(\mathbf{x}) = \frac{\sum_{i=1}^{N} e_i \, \rho(\|\mathbf{x} - \mathbf{c}_i\|)}{\sum_{i=1}^{N} \rho(\|\mathbf{x} - \mathbf{c}_i\|)} = \sum_{i=1}^{N} e_i \, u(\|\mathbf{x} - \mathbf{c}_i\|)

when the integrations are performed.
Local linear models
It is sometimes convenient to expand the architecture to include local linear models. In that case the architectures become, to first order,

: \varphi(\mathbf{x}) = \sum_{i=1}^{N} \left( a_i + \mathbf{b}_i \cdot (\mathbf{x} - \mathbf{c}_i) \right) \rho(\|\mathbf{x} - \mathbf{c}_i\|)

and

: \varphi(\mathbf{x}) = \sum_{i=1}^{N} \left( a_i + \mathbf{b}_i \cdot (\mathbf{x} - \mathbf{c}_i) \right) u(\|\mathbf{x} - \mathbf{c}_i\|)

in the unnormalized and normalized cases, respectively. Here \mathbf{b}_i are weights to be determined. Higher order linear terms are also possible.
This result can be written

: \varphi(\mathbf{x}) = \sum_{i=1}^{2N} \sum_{j=1}^{n} e_{ij} \, v_{ij}(\mathbf{x} - \mathbf{c}_i)

where

: e_{ij} = \begin{cases} a_i, & i \in [1, N] \\ b_{ij}, & i \in [N+1, 2N] \end{cases}

and

: v_{ij}(\mathbf{x} - \mathbf{c}_i) \equiv \begin{cases} \delta_{ij} \, \rho(\|\mathbf{x} - \mathbf{c}_i\|), & i \in [1, N] \\ (x_j - c_{ij}) \, \rho(\|\mathbf{x} - \mathbf{c}_i\|), & i \in [N+1, 2N] \end{cases}

in the unnormalized case and

: v_{ij}(\mathbf{x} - \mathbf{c}_i) \equiv \begin{cases} \delta_{ij} \, u(\|\mathbf{x} - \mathbf{c}_i\|), & i \in [1, N] \\ (x_j - c_{ij}) \, u(\|\mathbf{x} - \mathbf{c}_i\|), & i \in [N+1, 2N] \end{cases}

in the normalized case.
Here \delta_{ij} is a Kronecker delta function defined as

: \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}
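A minimal sketch of the first-order unnormalized architecture, assuming NumPy and illustrative names (a holds the constant terms a_i, the rows of B hold the linear coefficients \mathbf{b}_i):

```python
import numpy as np

def rbf_local_linear(x, centers, betas, a, B):
    """First-order unnormalized RBF network:
    sum_i (a_i + b_i . (x - c_i)) * exp(-beta_i * ||x - c_i||^2)."""
    diff = x - centers                            # (N, n) offsets from each center
    rho = np.exp(-betas * np.sum(diff ** 2, axis=1))
    # Local linear term b_i . (x - c_i) for each neuron, then the weighted sum
    return np.sum((a + np.sum(B * diff, axis=1)) * rho)
```

With B set to zero this reduces to the basic unnormalized network, which is a quick sanity check on an implementation.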
Training
RBF networks are typically trained from pairs of input and target values (\mathbf{x}(t), y(t)), t = 1, \dots, T, by a two-step algorithm.

In the first step, the center vectors \mathbf{c}_i of the RBF functions in the hidden layer are chosen. This step can be performed in several ways; centers can be randomly sampled from some set of examples, or they can be determined using k-means clustering. Note that this step is unsupervised.
The second step simply fits a linear model with coefficients w_i to the hidden layer's outputs with respect to some objective function. A common objective function, at least for regression/function estimation, is the least squares function

: K(\mathbf{w}) \ \stackrel{\mathrm{def}}{=}\ \sum_{t=1}^{T} K_t(\mathbf{w})

where

: K_t(\mathbf{w}) \ \stackrel{\mathrm{def}}{=}\ \big[ y(t) - \varphi(\mathbf{x}(t), \mathbf{w}) \big]^2.

We have explicitly included the dependence on the weights. Minimization of the least squares objective function by optimal choice of weights optimizes accuracy of fit.
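The two-step algorithm can be sketched as follows, assuming NumPy and a shared width \beta for all neurons; random sampling of centers stands in for k-means here, and the helper name is hypothetical:

```python
import numpy as np

def train_rbf(X, y, n_centers, beta, rng=None):
    """Two-step RBF training sketch:
    (1) unsupervised: pick centers by sampling training inputs,
    (2) supervised: least-squares fit of the output weights."""
    rng = np.random.default_rng(rng)
    # Step 1: choose center vectors from the training inputs
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    # Hidden-layer design matrix: G[t, i] = exp(-beta * ||x(t) - c_i||^2)
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    G = np.exp(-beta * d2)
    # Step 2: minimize the least squares objective K(w) = sum_t [y(t) - G w]^2
    weights, *_ = np.linalg.lstsq(G, y, rcond=None)
    return centers, weights
```

Because the hidden-layer outputs are fixed once the centers are chosen, the second step is an ordinary linear least squares problem, which is what makes this two-step scheme so much cheaper than training all parameters jointly.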
There are occasions in which multiple objectives, such as smoothness as well as accuracy, must be optimized. In that case it is useful to optimize a regularized objective function such as

: H(\mathbf{w}) \ \stackrel{\mathrm{def}}{=}\ K(\mathbf{w}) + \lambda S(\mathbf{w}) \ \stackrel{\mathrm{def}}{=}\ \sum_{t=1}^{T} H_t(\mathbf{w})

where

: S(\mathbf{w}) \ \stackrel{\mathrm{def}}{=}\ \sum_{t=1}^{T} S_t(\mathbf{w})

and

: H_t(\mathbf{w}) \ \stackrel{\mathrm{def}}{=}\ K_t(\mathbf{w}) + \lambda S_t(\mathbf{w})

where optimization of S maximizes smoothness and \lambda is known as a regularization parameter.
A third optional backpropagation step can be performed to fine-tune all of the RBF net's parameters.
Interpolation
RBF networks can be used to interpolate a function y : \mathbb{R}^n \to \mathbb{R} when the values of that function are known on a finite number of points: y(\mathbf{x}_i) = b_i, i = 1, \dots, N. Taking the known points \mathbf{x}_i to be the centers of the radial basis functions and evaluating the values of the basis functions at the same points, g_{ij} = \rho(\|\mathbf{x}_j - \mathbf{x}_i\|), the weights can be solved from the equation

: \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1N} \\ g_{21} & g_{22} & \cdots & g_{2N} \\ \vdots & & \ddots & \vdots \\ g_{N1} & g_{N2} & \cdots & g_{NN} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{bmatrix}.
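Solving this linear system for the weights gives an interpolant that passes exactly through the known points. A minimal sketch, assuming NumPy, a Gaussian kernel, and illustrative names:

```python
import numpy as np

def rbf_interpolate(points, values, beta):
    """Exact RBF interpolation: use the data points as centers and solve
    G w = b, where G[i, j] = exp(-beta * ||x_j - x_i||^2)."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    G = np.exp(-beta * d2)
    w = np.linalg.solve(G, values)
    def phi(x):
        # Evaluate the interpolant at a new input x
        return w @ np.exp(-beta * np.sum((points - x) ** 2, axis=1))
    return phi
```

For the Gaussian kernel and distinct data points the matrix G is symmetric positive definite, so the system always has a unique solution, although it can become ill-conditioned when points are close together relative to 1/\sqrt{\beta}.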