Kernel smoother

A kernel smoother is a statistical technique to estimate a real valued function f: \mathbb{R}^p \to \mathbb{R} as the weighted average of neighboring observed data. The weight is defined by the ''kernel'', such that closer points are given higher weights. The estimated function is smooth, and the level of smoothness is set by a single parameter. Kernel smoothing is a type of weighted moving average.


Definitions

Let K_{h_\lambda}(X_0, X) be a kernel defined by

: K_{h_\lambda}(X_0, X) = D\left( \frac{\left\| X - X_0 \right\|}{h_\lambda(X_0)} \right)

where:
* X, X_0 \in \mathbb{R}^p
* \left\| \cdot \right\| is the Euclidean norm
* h_\lambda(X_0) is a parameter (kernel radius)
* D(t) is typically a positive real valued function, whose value is decreasing (or not increasing) as the distance between X and X_0 increases.

Popular kernels used for smoothing include parabolic (Epanechnikov), tricube, and Gaussian kernels.

Let Y(X): \mathbb{R}^p \to \mathbb{R} be a continuous function of X. For each X_0 \in \mathbb{R}^p, the Nadaraya–Watson kernel-weighted average (smooth Y(X) estimation) is defined by

: \hat{Y}(X_0) = \frac{\sum_{i=1}^N K_{h_\lambda}(X_0, X_i) Y(X_i)}{\sum_{i=1}^N K_{h_\lambda}(X_0, X_i)}

where:
* N is the number of observed points
* Y(X_i) are the observations at X_i points.

In the following sections, we describe some particular cases of kernel smoothers.
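As a concrete illustration, here is a minimal sketch in Python with NumPy (the function names, the Epanechnikov kernel choice, the bandwidth h = 0.8, and the synthetic data are illustrative assumptions, not part of the definition) that evaluates the Nadaraya–Watson average on a grid of query points:

<syntaxhighlight lang="python">
import numpy as np

def epanechnikov(t):
    """Parabolic (Epanechnikov) kernel profile D(t)."""
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t ** 2), 0.0)

def nadaraya_watson(x0, X, Y, D, h):
    """Nadaraya-Watson kernel-weighted average at the query point x0.

    D is the kernel profile D(t); h is the kernel radius, taken
    constant here although in general it may depend on x0.
    """
    w = D(np.abs(X - x0) / h)          # weights K_h(x0, X_i)
    return np.sum(w * Y) / np.sum(w)   # weighted average of the Y_i

# Noisy observations of a smooth one-dimensional function (illustrative data).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 10.0, 200))
Y = np.sin(X) + rng.normal(0.0, 0.3, size=X.size)

# Smoothed estimate on a grid of query points.
grid = np.linspace(0.0, 10.0, 101)
Y_hat = np.array([nadaraya_watson(x0, X, Y, epanechnikov, h=0.8) for x0 in grid])
</syntaxhighlight>

A smaller h averages over fewer points and yields a rougher estimate; a larger h yields a smoother one.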


Gaussian kernel smoother

The Gaussian kernel is one of the most widely used kernels, and is expressed with the equation below:

: K(x^*, x_i) = \exp\left( -\frac{(x^* - x_i)^2}{2b^2} \right)

Here, b is the length scale for the input space.
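As a minimal sketch (Python with NumPy; the function names and the reuse of the kernel-weighted average form above are illustrative assumptions), the Gaussian smoother weighs every observation, with weights decaying smoothly with distance:

<syntaxhighlight lang="python">
import numpy as np

def gaussian_kernel(x_star, x_i, b):
    """Gaussian weight K(x*, x_i) with length scale b."""
    return np.exp(-(x_star - x_i) ** 2 / (2.0 * b ** 2))

def gaussian_smoother(x_star, X, Y, b):
    """Kernel-weighted average of Y using Gaussian weights."""
    w = gaussian_kernel(x_star, X, b)
    return np.sum(w * Y) / np.sum(w)
</syntaxhighlight>

A smaller length scale b concentrates the weights near x^* and gives a wigglier estimate; a larger b averages over a wider neighborhood.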


Nearest neighbor smoother

The idea of the nearest neighbor smoother is the following. For each point X_0, take m nearest neighbors and estimate the value of Y(X_0) by averaging the values of these neighbors. Formally, h_m(X_0) = \left\| X_0 - X_{[m]} \right\|, where X_{[m]} is the m-th closest neighbor to X_0, and

: D(t) = \begin{cases} 1/m & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}

Example: in one dimension, for each X_0 the estimate \hat{Y}(X_0) is the average of the 16 points closest to X_0. The result is not smooth enough.
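A minimal sketch of this smoother in Python with NumPy (the function name and the full sort are illustrative choices; a real implementation might use a spatial index for large N):

<syntaxhighlight lang="python">
import numpy as np

def knn_smoother(x0, X, Y, m):
    """Average of the m observations closest to x0 (each with weight 1/m)."""
    idx = np.argsort(np.abs(X - x0))[:m]   # indices of the m nearest neighbors
    return Y[idx].mean()
</syntaxhighlight>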


Kernel average smoother

The idea of the kernel average smoother is the following. For each data point X_0, choose a constant distance size ''λ'' (kernel radius, or window width for p = 1 dimension), and compute a weighted average of all data points that are closer than \lambda to X_0 (points closer to X_0 get higher weights). Formally, h_\lambda(X_0) = \lambda = \text{constant}, and D(t) is one of the popular kernels, as sketched below.

Example: for each X_0 the window width is constant, and the weight of each point in the window follows the shape of the kernel. The estimate is smooth, but the boundary points are biased. The reason is the unequal number of points (to the right and to the left of X_0) in the window when X_0 is close enough to the boundary.
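With a constant radius this is the Nadaraya–Watson average from the definitions section; as a sketch (assuming the nadaraya_watson helper defined there), a compactly supported profile such as the tricube kernel gives zero weight to every point farther than λ from X_0:

<syntaxhighlight lang="python">
import numpy as np

def tricube(t):
    """Tricube kernel profile: non-zero only for |t| <= 1, so points
    farther than the radius lam from x0 receive zero weight."""
    return np.where(np.abs(t) <= 1, (1.0 - np.abs(t) ** 3) ** 3, 0.0)

# Reusing nadaraya_watson from the definitions section with h = lam:
# y_hat = nadaraya_watson(x0, X, Y, tricube, h=lam)
</syntaxhighlight>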


Local linear regression

In the two previous sections we assumed that the underlying Y(X) function is locally constant, and therefore we were able to use the weighted average for the estimation. The idea of local linear regression is to fit locally a straight line (or a hyperplane for higher dimensions), not a constant (horizontal line). After fitting the line, the estimation \hat{Y}(X_0) is provided by the value of this line at the point X_0. By repeating this procedure for each X_0, one gets the estimation function \hat{Y}(X). As in the previous section, the window width is constant: h_\lambda(X_0) = \lambda = \text{constant}. Formally, local linear regression is computed by solving a weighted least squares problem. For one dimension (p = 1):

: \min_{\alpha(X_0), \beta(X_0)} \sum_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - \alpha(X_0) - \beta(X_0) X_i \right)^2

and the estimate is the fitted line evaluated at X_0:

: \hat{Y}(X_0) = \alpha(X_0) + \beta(X_0) X_0

The closed form solution is given by:

: \hat{Y}(X_0) = \left( 1, X_0 \right) \left( B^T W(X_0) B \right)^{-1} B^T W(X_0) y

where:
* y = \left( Y(X_1), \dots, Y(X_N) \right)^T
* W(X_0) = \operatorname{diag}\left( K_{h_\lambda}(X_0, X_i) \right)_{N \times N}
* B^T = \begin{pmatrix} 1 & 1 & \dots & 1 \\ X_1 & X_2 & \dots & X_N \end{pmatrix}

Example: the resulting function is smooth, and the problem with the biased boundary points is reduced.

Local linear regression can be applied to any-dimensional space, though the question of what constitutes a local neighborhood becomes more complicated. It is common to use the k training points nearest to a test point to fit the local linear regression. This can lead to high variance of the fitted function. To bound the variance, the set of training points should contain the test point in their convex hull (see the Gupta et al. reference).
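The closed form translates directly into code. Below is a minimal sketch in Python with NumPy (naming is ours; forming the diagonal matrix W explicitly is for clarity, and a robust implementation would guard against a singular B^T W(X_0) B):

<syntaxhighlight lang="python">
import numpy as np

def local_linear(x0, X, Y, D, h):
    """Local linear regression estimate at x0 for p = 1.

    Solves the weighted least squares problem
        min_{alpha, beta} sum_i K_h(x0, X_i) (Y_i - alpha - beta X_i)^2
    and returns the fitted line evaluated at x0.
    """
    w = D(np.abs(X - x0) / h)                   # kernel weights K_h(x0, X_i)
    B = np.column_stack([np.ones_like(X), X])   # design matrix B, rows (1, X_i)
    W = np.diag(w)                              # W(x0)
    # Closed form: (1, x0) (B^T W B)^{-1} B^T W y
    coef = np.linalg.solve(B.T @ W @ B, B.T @ W @ Y)
    return coef[0] + coef[1] * x0
</syntaxhighlight>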


Local polynomial regression

Instead of fitting locally linear functions, one can fit polynomial functions. For p = 1, one should minimize:

: \min_{\alpha(X_0), \beta_j(X_0),\, j = 1, \dots, d} \sum_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - \alpha(X_0) - \sum_{j=1}^d \beta_j(X_0) X_i^j \right)^2

with

: \hat{Y}(X_0) = \alpha(X_0) + \sum_{j=1}^d \beta_j(X_0) X_0^j

In the general case (p > 1), one should minimize:

: \hat{\beta}(X_0) = \underset{\beta(X_0)}{\arg\min} \sum_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - b(X_i)^T \beta(X_0) \right)^2

: b(X) = \begin{pmatrix} 1, & X_1, & X_2, \dots & X_1^2, & X_2^2, \dots & X_1 X_2 \, \dots \end{pmatrix}

: \hat{Y}(X_0) = b(X_0)^T \hat{\beta}(X_0)
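For p = 1 this extends the local linear sketch above by adding polynomial columns to the design matrix (the degree default d = 2 and the names are illustrative assumptions):

<syntaxhighlight lang="python">
import numpy as np

def local_polynomial(x0, X, Y, D, h, d=2):
    """Local polynomial regression estimate of degree d at x0 (p = 1)."""
    w = D(np.abs(X - x0) / h)                    # kernel weights K_h(x0, X_i)
    B = np.vander(X, N=d + 1, increasing=True)   # rows (1, X_i, X_i^2, ..., X_i^d)
    BW = B.T * w                                 # B^T W without forming diag(w)
    beta = np.linalg.solve(BW @ B, BW @ Y)       # weighted least squares solution
    powers = x0 ** np.arange(d + 1)              # (1, x0, ..., x0^d)
    return powers @ beta
</syntaxhighlight>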


See also

* Savitzky–Golay filter
* Kernel methods
* Kernel density estimation
* Local regression
* Kernel regression


References

* Li, Q. and J. S. Racine. ''Nonparametric Econometrics: Theory and Practice''. Princeton University Press, 2007.
* T. Hastie, R. Tibshirani and J. Friedman, ''The Elements of Statistical Learning'', Chapter 6, Springer, 2001. ISBN 0-387-95284-5.
* M. Gupta, E. Garcia and E. Chin, "Adaptive Local Linear Regression with Application to Printer Color Management," ''IEEE Trans. Image Processing'', 2008.