The term kernel is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics.
Bayesian statistics
In statistics, especially in Bayesian statistics, the kernel of a probability density function (pdf) or probability mass function (pmf) is the form of the pdf or pmf in which any factors that are not functions of any of the variables in the domain are omitted. Note that such factors may well be functions of the parameters of the pdf or pmf. These factors form part of the normalization factor of the probability distribution, and are unnecessary in many situations. For example, in pseudo-random number sampling, most sampling algorithms ignore the normalization factor. In addition, in Bayesian analysis of conjugate prior distributions, the normalization factors are generally ignored during the calculations, and only the kernel is considered. At the end, the form of the kernel is examined, and if it matches a known distribution, the normalization factor can be reinstated. Otherwise, it may be unnecessary (for example, if the distribution only needs to be sampled from).
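A standard illustration of this procedure is the Beta–binomial conjugate pair. A Beta(\alpha, \beta) prior has kernel
: p(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} ,
and a binomial likelihood with k successes in n trials has kernel
: p(x \mid \theta) \propto \theta^{k} (1 - \theta)^{n - k} .
Their product, \theta^{\alpha + k - 1} (1 - \theta)^{\beta + n - k - 1}, is recognizable as the kernel of a Beta(\alpha + k, \beta + n - k) distribution, so the normalization constant 1/B(\alpha + k, \beta + n - k) can be reinstated at the end if it is needed.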
For many distributions, the kernel can be written in closed form, but not the normalization constant.
An example is the normal distribution. Its probability density function is
: p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
and the associated kernel is
: p(x \mid \mu, \sigma^2) \propto e^{-\frac{(x - \mu)^2}{2\sigma^2}} .
Note that the factor in front of the exponential has been omitted, even though it contains the parameter \sigma^2, because it is not a function of the domain variable x.
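As a small numeric illustration of this point, the following minimal Python/NumPy sketch (the evaluation grid and parameter values are arbitrary choices for the example) checks that the kernel differs from the full density only by a factor that is constant in x:

import numpy as np

def normal_pdf(x, mu, sigma2):
    # Full normal density, including the normalization factor.
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def normal_kernel(x, mu, sigma2):
    # Kernel of the normal density: the normalization factor is dropped.
    return np.exp(-(x - mu) ** 2 / (2 * sigma2))

x = np.linspace(-3.0, 3.0, 7)  # arbitrary evaluation points
ratio = normal_kernel(x, 0.0, 2.0) / normal_pdf(x, 0.0, 2.0)

# The ratio equals sqrt(2*pi*sigma2) at every x: it depends on the
# parameter sigma2 but not on the domain variable x.
print(np.allclose(ratio, np.sqrt(2 * np.pi * 2.0)))  # True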
Pattern analysis
The kernel of a reproducing kernel Hilbert space is used in the suite of techniques known as kernel methods to perform tasks such as statistical classification, regression analysis, and cluster analysis on data in an implicit space. This usage is particularly common in machine learning.
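For instance, the minimal sketch below (Python/NumPy; the Gaussian/RBF kernel, the bandwidth parameter gamma and the toy data are all illustrative assumptions) evaluates a kernel on a small data set to obtain the Gram matrix that kernel methods work with:

import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2),
    # evaluated for every pair of rows of X and Y.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # toy data, one row per observation
K = rbf_kernel(X, X)  # 3x3 Gram matrix: symmetric and positive semi-definite
print(K)

Algorithms such as the support-vector machine operate on this matrix rather than on explicit coordinates in the implicit feature space.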
Nonparametric statistics
In nonparametric statistics, a kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time-series analysis, in the use of the periodogram to estimate the spectral density, where they are known as window functions. An additional use is in the estimation of a time-varying intensity for a point process, where window functions (kernels) are convolved with time-series data.
Commonly, kernel widths must also be specified when running a non-parametric estimation.
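As a concrete example, the following sketch (a minimal Python/NumPy implementation of kernel density estimation with a Gaussian kernel; the toy sample and the bandwidth of 0.75 are arbitrary choices) shows the bandwidth playing the role of the kernel width:

import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x_grid, data, bandwidth):
    # Kernel density estimate: an average of kernels centred at the
    # data points, scaled by the bandwidth (the kernel width).
    u = (x_grid[:, None] - data[None, :]) / bandwidth
    return gaussian_kernel(u).mean(axis=1) / bandwidth

data = np.array([-1.2, -0.5, 0.1, 0.3, 1.8])  # toy sample
x_grid = np.linspace(-4.0, 4.0, 9)
print(kde(x_grid, data, bandwidth=0.75))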
Definition
A kernel is a non-negative real-valued integrable function ''K''. For most applications, it is desirable to define the function to satisfy two additional requirements:
* Normalization:
: \int_{-\infty}^{+\infty} K(u) \, du = 1 ;
* Symmetry:
: K(-u) = K(u) for all values of u.
The first requirement ensures that the method of kernel density estimation results in a probability density function. The second requirement ensures that the average of the corresponding distribution is equal to that of the sample used.
If ''K'' is a kernel, then so is the function ''K''* defined by ''K''*(''u'') = λ''K''(λ''u''), where λ > 0. This can be used to select a scale that is appropriate for the data.
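A minimal numeric check of these properties (a Python sketch assuming NumPy and SciPy are available; the triangular kernel and λ = 2 are chosen only for illustration):

import numpy as np
from scipy.integrate import quad

def triangular(u):
    # Triangular kernel: non-negative, symmetric, integrates to 1.
    return np.where(np.abs(u) <= 1, 1.0 - np.abs(u), 0.0)

# Normalization: the kernel integrates to 1 (its support is [-1, 1]).
area, _ = quad(triangular, -1.0, 1.0)
print(np.isclose(area, 1.0))  # True

# Symmetry: K(-u) = K(u).
u = np.linspace(0.0, 2.0, 11)
print(np.allclose(triangular(-u), triangular(u)))  # True

# Rescaling: K*(u) = lam * K(lam * u) is again a kernel for any lam > 0.
lam = 2.0
rescaled_area, _ = quad(lambda u: lam * triangular(lam * u), -1.0, 1.0)
print(np.isclose(rescaled_area, 1.0))  # True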
Kernel functions in common use

Several types of kernel functions are commonly used: uniform, triangular, Epanechnikov, quartic (biweight), tricube, triweight, Gaussian, quadratic and cosine.
If a kernel ''K'' has bounded support, then ''K''(''u'') = 0 for values of ''u'' lying outside the support.
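A sketch of a few of these kernels (Python/NumPy, using their standard textbook closed forms; this is an illustrative selection, not an exhaustive list):

import numpy as np

# Bounded kernels are set to zero outside the support |u| <= 1;
# the Gaussian kernel has unbounded support.
def uniform(u):
    return np.where(np.abs(u) <= 1, 0.5, 0.0)

def triangular(u):
    return np.where(np.abs(u) <= 1, 1.0 - np.abs(u), 0.0)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u ** 2), 0.0)

def quartic(u):  # also called the biweight kernel
    return np.where(np.abs(u) <= 1, (15.0 / 16.0) * (1.0 - u ** 2) ** 2, 0.0)

def gaussian(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def cosine(u):
    return np.where(np.abs(u) <= 1, (np.pi / 4.0) * np.cos(np.pi * u / 2.0), 0.0)

u = np.linspace(-1.5, 1.5, 7)
print(epanechnikov(u))  # zero outside the support [-1, 1]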
See also
* Kernel density estimation
* Kernel smoother
* Stochastic kernel
* Positive-definite kernel
* Density estimation
* Multivariate kernel density estimation
* Kernel method