A kernel smoother is a statistical technique to estimate a real valued function f : \mathbb{R}^p \to \mathbb{R} as the weighted average of neighboring observed data. The weight is defined by the ''kernel'', such that closer points are given higher weights. The estimated function is smooth, and the level of smoothness is set by a single parameter.
Kernel smoothing is a type of weighted moving average.
Definitions
Let K_{h_\lambda}(X_0, X) be a kernel defined by
: K_{h_\lambda}(X_0, X) = D\left( \frac{\left\| X - X_0 \right\|}{h_\lambda(X_0)} \right)
where:
* X, X_0 \in \mathbb{R}^p
* \left\| \cdot \right\| is the Euclidean norm
* h_\lambda(X_0) is a parameter (kernel radius)
* ''D''(''t'') is typically a positive real valued function, whose value is decreasing (or not increasing) for increasing distance between ''X'' and X_0.
Popular kernels used for smoothing include parabolic (Epanechnikov), tricube, and Gaussian kernels.
Let Y(X) : \mathbb{R}^p \to \mathbb{R} be a continuous function of ''X''. For each X_0 \in \mathbb{R}^p, the Nadaraya-Watson kernel-weighted average (smooth ''Y''(''X'') estimation) is defined by
: \hat{Y}(X_0) = \frac{\sum_{i=1}^N K_{h_\lambda}(X_0, X_i) \, Y(X_i)}{\sum_{i=1}^N K_{h_\lambda}(X_0, X_i)}
where:
* ''N'' is the number of observed points
* ''Y''(X_i) are the observations at the X_i points.
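As an illustration, the following is a minimal sketch of this weighted average in Python, assuming a constant kernel radius and an Epanechnikov kernel; the function names, the use of NumPy, and the example data are assumptions for illustration, not part of the original formulation:

```python
import numpy as np

def epanechnikov(t):
    """Parabolic (Epanechnikov) kernel D(t), zero outside |t| <= 1."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, X, Y, h, D=epanechnikov):
    """Kernel-weighted average: sum_i K(x0, X_i) Y_i / sum_i K(x0, X_i),
    with K(x0, X_i) = D(|X_i - x0| / h) and constant radius h."""
    w = D(np.abs(X - x0) / h)
    return np.sum(w * Y) / np.sum(w)

# Hypothetical example: a noisy sine smoothed at its own sample points.
rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 100)
Y = np.sin(X) + 0.2 * rng.standard_normal(100)
Y_hat = np.array([nadaraya_watson(x0, X, Y, h=0.5) for x0 in X])
```

Swapping ''D'' for another kernel changes only the weights; the averaging step stays the same.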
In the following sections, we describe some particular cases of kernel smoothers.
Gaussian kernel smoother
The Gaussian kernel is one of the most widely used kernels, and is expressed with the equation below:
: K(X_0, X) = \exp\left( -\frac{\left\| X - X_0 \right\|^2}{2b^2} \right)
Here, ''b'' is the length scale for the input space.
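A minimal sketch of this kernel in Python (the function name and example values are hypothetical):

```python
import numpy as np

def gaussian_kernel(x0, x, b):
    """Gaussian weights K(x0, x) = exp(-(x - x0)^2 / (2 b^2)); b is the length scale."""
    return np.exp(-((x - x0) ** 2) / (2 * b ** 2))

# Unlike compactly supported kernels, the Gaussian weight is positive everywhere,
# so every observation contributes (however slightly) to every estimate.
w = gaussian_kernel(0.0, np.array([-1.0, 0.0, 1.0, 3.0]), b=1.0)
```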
Nearest neighbor smoother
The idea of the nearest neighbor smoother is the following. For each point X_0, take ''m'' nearest neighbors and estimate the value of Y(X_0) by averaging the values of these neighbors.
Formally, h_m(X_0) = \left\| X_0 - X_{[m]} \right\|, where X_{[m]} is the ''m''th closest neighbor to X_0, and
: D(t) = \begin{cases} 1/m & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}
Example:
In this example, ''X'' is one-dimensional. For each X_0, \hat{Y}(X_0) is the average value of the 16 points closest to X_0 (denoted by red). The result is not smooth enough.
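A minimal sketch of this smoother in Python (the function name and the use of NumPy are assumptions for illustration):

```python
import numpy as np

def knn_smoother(x0, X, Y, m=16):
    """Average of the m observations nearest to x0; equivalent to the kernel
    D(t) = 1/m for |t| <= 1 with adaptive radius h_m(x0) = |x0 - X_[m]|."""
    idx = np.argsort(np.abs(X - x0))[:m]  # indices of the m nearest neighbors
    return Y[idx].mean()
```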
Kernel average smoother
The idea of the kernel average smoother is the following. For each data point ''X''
0, choose a constant distance size ''λ'' (kernel radius, or window width for ''p'' = 1 dimension), and compute a weighted average for all data points that are closer than
to ''X''
0 (the closer to ''X''
0 points get higher weights).
Formally,
and ''D''(''t'') is one of the popular kernels.
Example:
For each X_0 the window width is constant, and the weight of each point in the window is schematically denoted by the yellow figure in the graph. It can be seen that the estimation is smooth, but the boundary points are biased. The reason is the unequal number of points (to the right and to the left of X_0) in the window when X_0 is close enough to the boundary.
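The boundary bias can be seen in a short numerical check in Python (a sketch; the data, the kernel radius, and the function names are invented for illustration). On a noiseless linear trend, the weighted average at an interior point recovers the trend exactly, while at the left boundary only points to the right fall in the window, pulling the estimate upward:

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def kernel_average(x0, X, Y, lam):
    # Constant radius: h_lambda(x0) = lam for every x0.
    w = epanechnikov(np.abs(X - x0) / lam)
    return np.sum(w * Y) / np.sum(w)

X = np.linspace(0, 1, 101)
Y = 2 * X                                   # noiseless linear trend Y(X) = 2X
print(kernel_average(0.5, X, Y, lam=0.2))   # interior: ~1.0, matches Y(0.5)
print(kernel_average(0.0, X, Y, lam=0.2))   # left boundary: > 0, though Y(0) = 0
```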
Local linear regression
In the two previous sections we assumed that the underlying Y(X) function is locally constant, and therefore we were able to use the weighted average for the estimation. The idea of local linear regression is to fit locally a straight line (or a hyperplane for higher dimensions), not a constant (horizontal line). After fitting the line, the estimation \hat{Y}(X_0) is provided by the value of this line at the point X_0. By repeating this procedure for each X_0, one obtains the estimation function \hat{Y}(X).
As in the previous section, the window width is constant: h_\lambda(X_0) = \lambda = \text{constant}.
Formally, the local linear regression is computed by solving a weighted least squares problem.
For one dimension (''p'' = 1):
: \min_{\alpha(X_0), \beta(X_0)} \sum_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - \alpha(X_0) - \beta(X_0) X_i \right)^2
The closed form solution is given by:
: \hat{Y}(X_0) = \left( 1, X_0 \right) \left( B^T W B \right)^{-1} B^T W y
where:
* y = \left( Y(X_1), \dots, Y(X_N) \right)^T
* ''W'' is the N \times N diagonal matrix with W_{ii} = K_{h_\lambda}(X_0, X_i)
* B^T = \begin{pmatrix} 1 & 1 & \dots & 1 \\ X_1 & X_2 & \dots & X_N \end{pmatrix}
Example:
The resulting function is smooth, and the problem with the biased boundary points is reduced.
Local linear regression can be applied to spaces of any dimension, though the question of what constitutes a local neighborhood becomes more complicated. It is common to use the k nearest training points to a test point to fit the local linear regression. This can lead to high variance of the fitted function. To bound the variance, the set of training points should contain the test point in their convex hull (see Gupta et al. reference).
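A minimal sketch of the closed form solution above in one dimension, assuming NumPy and an Epanechnikov kernel (the helper names are hypothetical):

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def local_linear(x0, X, Y, lam):
    """Solve the weighted least squares problem and evaluate the fitted line
    at x0: Y_hat(x0) = (1, x0) (B^T W B)^{-1} B^T W y.
    Assumes at least two distinct points fall inside the window."""
    w = epanechnikov(np.abs(X - x0) / lam)      # diagonal of W
    B = np.column_stack([np.ones_like(X), X])   # rows (1, X_i)
    alpha, beta = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * Y))
    return alpha + beta * x0
```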
Local polynomial regression
Instead of fitting locally linear functions, one can fit polynomial functions.
For ''p'' = 1, one should minimize:
: \min_{\alpha(X_0), \beta_j(X_0),\, j=1,\dots,d} \sum_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - \alpha(X_0) - \sum_{j=1}^d \beta_j(X_0) X_i^j \right)^2
with
: \hat{Y}(X_0) = \alpha(X_0) + \sum_{j=1}^d \beta_j(X_0) X_0^j
In the general case (''p'' > 1), one should minimize:
: \hat{\beta}(X_0) = \arg\min_{\beta(X_0)} \sum_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - b(X_i)^T \beta(X_0) \right)^2, \qquad \hat{Y}(X_0) = b(X_0)^T \hat{\beta}(X_0)
where b(X) is a polynomial basis in the coordinates of ''X''.
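A sketch of the one-dimensional case in Python, generalizing the local linear example above (the helper names, the kernel choice, and the default degree are assumptions):

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def local_polynomial(x0, X, Y, lam, d=2):
    """Weighted least squares fit of a degree-d polynomial around x0,
    evaluated at x0 (d = 1 recovers local linear regression)."""
    w = epanechnikov(np.abs(X - x0) / lam)
    B = np.vander(X, d + 1, increasing=True)    # basis b(X_i) = (1, X_i, ..., X_i^d)
    beta = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * Y))
    return sum(beta[j] * x0 ** j for j in range(d + 1))
```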
See also
* Savitzky–Golay filter
* Kernel methods
* Kernel density estimation
* Local regression
* Kernel regression
References
* Li, Q. and J. S. Racine. ''Nonparametric Econometrics: Theory and Practice''. Princeton University Press, 2007.
* T. Hastie, R. Tibshirani and J. Friedman, ''The Elements of Statistical Learning'', Chapter 6, Springer, 2001. ISBN 0-387-95284-5.
* M. Gupta, E. Garcia and E. Chin, "Adaptive Local Linear Regression with Application to Printer Color Management," ''IEEE Trans. Image Processing'', 2008.