The term kernel is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics.
Bayesian statistics
In statistics, especially in Bayesian statistics, the kernel of a probability density function (pdf) or probability mass function (pmf) is the form of the pdf or pmf in which any factors that are not functions of any of the variables in the domain are omitted. Note that such factors may well be functions of the parameters of the pdf or pmf. These factors form part of the normalization factor of the probability distribution, and are unnecessary in many situations. For example, in pseudo-random number sampling, most sampling algorithms ignore the normalization factor. In addition, in Bayesian analysis of conjugate prior distributions, the normalization factors are generally ignored during the calculations, and only the kernel is considered. At the end, the form of the kernel is examined, and if it matches a known distribution, the normalization factor can be reinstated. Otherwise, it may be unnecessary (for example, if the distribution only needs to be sampled from).
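A standard illustration of this procedure is the Beta–binomial conjugate pair. A Beta(\alpha, \beta) prior has kernel
: p(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} ,
and a binomial likelihood with k successes in n trials has kernel
: p(x \mid \theta) \propto \theta^{k} (1 - \theta)^{n - k} .
Their product, \theta^{\alpha + k - 1} (1 - \theta)^{\beta + n - k - 1}, is recognizable as the kernel of a Beta(\alpha + k, \beta + n - k) distribution, so the normalization constant 1/B(\alpha + k, \beta + n - k) can be reinstated at the end if it is needed.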
For many distributions, the kernel can be written in closed form, but not the normalization constant.
An example is the normal distribution. Its probability density function is
: p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
and the associated kernel is
: p(x \mid \mu, \sigma^2) \propto e^{-\frac{(x - \mu)^2}{2\sigma^2}} .
Note that the factor in front of the exponential has been omitted, even though it contains the parameter \sigma^2, because it is not a function of the domain variable x.
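As a small numeric illustration of this point, the following minimal Python/NumPy sketch (the evaluation grid and parameter values are arbitrary choices for the example) checks that the kernel differs from the full density only by a factor that is constant in x:

import numpy as np

def normal_pdf(x, mu, sigma2):
    # Full normal density, including the normalization factor.
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def normal_kernel(x, mu, sigma2):
    # Kernel of the normal density: the normalization factor is dropped.
    return np.exp(-(x - mu) ** 2 / (2 * sigma2))

x = np.linspace(-3.0, 3.0, 7)  # arbitrary evaluation points
ratio = normal_kernel(x, 0.0, 2.0) / normal_pdf(x, 0.0, 2.0)

# The ratio equals sqrt(2*pi*sigma2) at every x: it depends on the
# parameter sigma2 but not on the domain variable x.
print(np.allclose(ratio, np.sqrt(2 * np.pi * 2.0)))  # True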
Pattern analysis
The kernel of a reproducing kernel Hilbert space is used in the suite of techniques known as kernel methods to perform tasks such as statistical classification, regression analysis, and cluster analysis on data in an implicit space. This usage is particularly common in machine learning.
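For instance, the minimal sketch below (Python/NumPy; the Gaussian/RBF kernel, the bandwidth parameter gamma and the toy data are all illustrative assumptions) evaluates a kernel on a small data set to obtain the Gram matrix that kernel methods work with:

import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2),
    # evaluated for every pair of rows of X and Y.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # toy data, one row per observation
K = rbf_kernel(X, X)  # 3x3 Gram matrix: symmetric and positive semi-definite
print(K)

Algorithms such as the support-vector machine operate on this matrix rather than on explicit coordinates in the implicit feature space.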
Nonparametric statistics
In nonparametric statistics, a kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time-series analysis, in the use of the periodogram to estimate the spectral density, where they are known as window functions. An additional use is in the estimation of a time-varying intensity for a point process, where window functions (kernels) are convolved with time-series data.
Commonly, kernel widths must also be specified when running a non-parametric estimation.
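As a concrete example, the following sketch (a minimal Python/NumPy implementation of kernel density estimation with a Gaussian kernel; the toy sample and the bandwidth of 0.75 are arbitrary choices) shows the bandwidth playing the role of the kernel width:

import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x_grid, data, bandwidth):
    # Kernel density estimate: an average of kernels centred at the
    # data points, scaled by the bandwidth (the kernel width).
    u = (x_grid[:, None] - data[None, :]) / bandwidth
    return gaussian_kernel(u).mean(axis=1) / bandwidth

data = np.array([-1.2, -0.5, 0.1, 0.3, 1.8])  # toy sample
x_grid = np.linspace(-4.0, 4.0, 9)
print(kde(x_grid, data, bandwidth=0.75))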
Definition
A kernel is a non-negative real-valued integrable function ''K''. For most applications, it is desirable to define the function to satisfy two additional requirements:
* Normalization:
: \int_{-\infty}^{+\infty} K(u) \, du = 1 ;
* Symmetry:
: K(-u) = K(u) for all values of u.
The first requirement ensures that the method of kernel density estimation results in a probability density function. The second requirement ensures that the average of the corresponding distribution is equal to that of the sample used.
If ''K'' is a kernel, then so is the function ''K''* defined by ''K''*(''u'') = λ''K''(λ''u''), where λ > 0. This can be used to select a scale that is appropriate for the data.
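A minimal numeric check of these properties (a Python sketch assuming NumPy and SciPy are available; the triangular kernel and λ = 2 are chosen only for illustration):

import numpy as np
from scipy.integrate import quad

def triangular(u):
    # Triangular kernel: non-negative, symmetric, integrates to 1.
    return np.where(np.abs(u) <= 1, 1.0 - np.abs(u), 0.0)

# Normalization: the kernel integrates to 1 (its support is [-1, 1]).
area, _ = quad(triangular, -1.0, 1.0)
print(np.isclose(area, 1.0))  # True

# Symmetry: K(-u) = K(u).
u = np.linspace(0.0, 2.0, 11)
print(np.allclose(triangular(-u), triangular(u)))  # True

# Rescaling: K*(u) = lam * K(lam * u) is again a kernel for any lam > 0.
lam = 2.0
rescaled_area, _ = quad(lambda u: lam * triangular(lam * u), -1.0, 1.0)
print(np.isclose(rescaled_area, 1.0))  # True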
Kernel functions in common use

Several types of kernel functions are commonly used: uniform, triangular, Epanechnikov, quartic (biweight), tricube, triweight, Gaussian, quadratic and cosine.
If a kernel ''K'' has bounded support, then ''K''(''u'') = 0 for values of ''u'' lying outside the support.
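A sketch of a few of these kernels (Python/NumPy, using their standard textbook closed forms; this is an illustrative selection, not an exhaustive list):

import numpy as np

# Bounded kernels are set to zero outside the support |u| <= 1;
# the Gaussian kernel has unbounded support.
def uniform(u):
    return np.where(np.abs(u) <= 1, 0.5, 0.0)

def triangular(u):
    return np.where(np.abs(u) <= 1, 1.0 - np.abs(u), 0.0)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u ** 2), 0.0)

def quartic(u):  # also called the biweight kernel
    return np.where(np.abs(u) <= 1, (15.0 / 16.0) * (1.0 - u ** 2) ** 2, 0.0)

def gaussian(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def cosine(u):
    return np.where(np.abs(u) <= 1, (np.pi / 4.0) * np.cos(np.pi * u / 2.0), 0.0)

u = np.linspace(-1.5, 1.5, 7)
print(epanechnikov(u))  # zero outside the support [-1, 1]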
See also
* Kernel density estimation
* Kernel smoother
* Stochastic kernel
* Positive-definite kernel
* Density estimation
* Multivariate kernel density estimation
* Kernel method