Local regression or local polynomial regression, also known as moving regression, is a generalization of the moving average and polynomial regression.
Its most common methods, initially developed for scatterplot smoothing, are LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing), which are pronounced identically. They are two strongly related non-parametric regression methods that combine multiple regression models in a ''k''-nearest-neighbor-based meta-model.
In some fields, LOESS is known and commonly referred to as the Savitzky–Golay filter (proposed 15 years before LOESS).
LOESS and LOWESS thus build on "classical" methods, such as linear and nonlinear least squares regression. They address situations in which the classical procedures do not perform well or cannot be effectively applied without undue labor. LOESS combines much of the simplicity of linear least squares regression with the flexibility of nonlinear regression. It does this by fitting simple models to localized subsets of the data to build up a function that describes the deterministic part of the variation in the data, point by point. In fact, one of the chief attractions of this method is that the data analyst is not required to specify a global function of any form to fit a model to the data, only to fit segments of the data.
The trade-off for these features is increased computation. Because it is so computationally intensive, LOESS would have been practically impossible to use in the era when least squares regression was being developed. Most other modern methods for process modeling are similar to LOESS in this respect. These methods have been consciously designed to use our current computational ability to the fullest possible advantage to achieve goals not easily achieved by traditional approaches.
A smooth curve through a set of data points obtained with this statistical technique is called a loess curve, particularly when each smoothed value is given by a weighted quadratic least squares regression over the span of values of the ''y''-axis scattergram criterion variable. When each smoothed value is given by a weighted linear least squares regression over the span, this is known as a lowess curve; however, some authorities treat lowess and loess as synonyms. [Kristen Pavlik, US Environmental Protection Agency, ''Loess (or Lowess)'', Nutrient Steps, July 2016.]
Model definition
In 1964, Savitzky and Golay proposed a method equivalent to LOESS, which is commonly referred to as the Savitzky–Golay filter. William S. Cleveland rediscovered the method in 1979 and gave it a distinct name. The method was further developed by Cleveland and Susan J. Devlin (1988). LOWESS is also known as locally weighted polynomial regression.
At each point in the range of the data set a low-degree polynomial is fitted to a subset of the data, with explanatory variable values near the point whose response is being estimated. The polynomial is fitted using weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away. The value of the regression function for the point is then obtained by evaluating the local polynomial using the explanatory variable values for that data point. The LOESS fit is complete after regression function values have been computed for each of the data points. Many of the details of this method, such as the degree of the polynomial model and the weights, are flexible. The range of choices for each part of the method and typical defaults are briefly discussed next.
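The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a single explanatory variable, a first-degree local polynomial, tri-cube weights, and a span given as a fraction of the data. The names `loess_point`, `loess_curve`, and `alpha` are hypothetical.

```python
import math


def loess_point(xs, ys, x0, alpha=0.5):
    """Estimate the response at x0 with a locally weighted straight-line fit."""
    n = len(xs)
    q = max(2, math.ceil(alpha * n))  # size of the local subset (assumed rule)
    # Indices of the q points whose x values are closest to x0.
    idx = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:q]
    dmax = max(abs(xs[i] - x0) for i in idx)
    if dmax == 0:
        w = [1.0] * len(idx)  # all neighbours coincide with x0
    else:
        # Tri-cube weights on distances scaled so the farthest neighbour is at 1.
        w = [(1 - (abs(xs[i] - x0) / dmax) ** 3) ** 3 for i in idx]
    sw = sum(w)
    if sw == 0:
        # Degenerate case (all weights zero): fall back to a plain mean.
        return sum(ys[i] for i in idx) / len(idx)
    # Weighted least squares for a straight line, in closed form.
    mx = sum(wi * xs[i] for wi, i in zip(w, idx)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, idx)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, idx))
    if sxx == 0:
        return my  # slope is not identifiable; use the weighted mean
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, idx))
    return my + (sxy / sxx) * (x0 - mx)


def loess_curve(xs, ys, alpha=0.5):
    """Evaluate the local fit at every data point."""
    return [loess_point(xs, ys, x, alpha) for x in xs]


xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]  # exactly linear data
print(loess_point(xs, ys, 4.2))
print(loess_curve(xs, ys))
```

Because the example data lie exactly on a straight line, every local linear fit reproduces that line up to floating-point error, which makes the sketch easy to check by hand.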
Localized subsets of data
The subsets of data used for each weighted least squares fit in LOESS are determined by a nearest neighbors algorithm. A user-specified input to the procedure called the "bandwidth" or "smoothing parameter" determines how much of the data is used to fit each local polynomial. The smoothing parameter, <math>\alpha</math>, is the fraction of the total number ''n'' of data points that are used in each local fit. The subset of data used in each weighted least squares fit thus comprises the <math>n\alpha</math> points (rounded to the next largest integer) whose explanatory variables' values are closest to the point at which the response is being estimated. [NIST, "LOESS (aka LOWESS)", section 4.1.4.4, ''NIST/SEMATECH e-Handbook of Statistical Methods'' (accessed 14 April 2017)]
Since a polynomial of degree ''k'' requires at least ''k'' + 1 points for a fit, the smoothing parameter <math>\alpha</math> must be between <math>(\lambda+1)/n</math> and 1, with <math>\lambda</math> denoting the degree of the local polynomial.
<math>\alpha</math> is called the smoothing parameter because it controls the flexibility of the LOESS regression function. Large values of <math>\alpha</math> produce the smoothest functions that wiggle the least in response to fluctuations in the data. The smaller <math>\alpha</math> is, the closer the regression function will conform to the data. Using too small a value of the smoothing parameter is not desirable, however, since the regression function will eventually start to capture the random error in the data.
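As a small illustration of how the smoothing parameter selects a local subset, the following sketch picks the nearest neighbours of an estimation point. The function name `local_subset` and the argument name `alpha` (the fraction of the data used in each fit) are assumptions made for this example.

```python
import math


def local_subset(xs, x0, alpha):
    """Return the indices of the ceil(alpha * n) points closest to x0."""
    n = len(xs)
    q = math.ceil(alpha * n)  # subset size, rounded up to an integer
    by_distance = sorted(range(n), key=lambda i: abs(xs[i] - x0))
    return sorted(by_distance[:q])


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
subset = local_subset(xs, x0=4.2, alpha=0.5)
print(subset)  # [2, 3, 4, 5, 6]
```

With ten points and a fraction of 0.5, each local fit uses five points; a larger fraction widens the window and gives a smoother curve.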
Degree of local polynomials
The local polynomials fit to each subset of the data are almost always of first or second degree; that is, either locally linear (in the straight line sense) or locally quadratic. Using a zero degree polynomial turns LOESS into a weighted moving average. Higher-degree polynomials would work in theory, but yield models that are not really in the spirit of LOESS. LOESS is based on the ideas that any function can be well approximated in a small neighborhood by a low-order polynomial and that simple models can be fit to data easily. High-degree polynomials would tend to overfit the data in each subset and are numerically unstable, making accurate computations difficult.
Weight function
As mentioned above, the weight function gives the most weight to the data points nearest the point of estimation and the least weight to the data points that are furthest away. The use of the weights is based on the idea that points near each other in the explanatory variable space are more likely to be related to each other in a simple way than points that are further apart. Following this logic, points that are likely to follow the local model best influence the local model parameter estimates the most. Points that are less likely to actually conform to the local model have less influence on the local model parameter estimates.
The traditional weight function used for LOESS is the tri-cube weight function,
:<math>w(d) = (1 - |d|^3)^3</math>
where ''d'' is the distance of a given data point from the point on the curve being fitted, scaled to lie in the range from 0 to 1.
However, any other weight function that satisfies the properties listed in Cleveland (1979) could also be used. The weight for a specific point in any localized subset of data is obtained by evaluating the weight function at the distance between that point and the point of estimation, after scaling the distance so that the maximum absolute distance over all of the points in the subset of data is exactly one.
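The weighting step can be illustrated with a short sketch, assuming the tri-cube function (1 − |d|³)³ and the scaling rule described above. The helper names `tricube` and `local_weights` are hypothetical.

```python
def tricube(d):
    """Tri-cube weight for a scaled distance d; zero outside [-1, 1]."""
    d = abs(d)
    return (1 - d ** 3) ** 3 if d < 1 else 0.0


def local_weights(subset_xs, x0):
    """Weights for the points of one local subset around the estimation point x0."""
    dists = [abs(x - x0) for x in subset_xs]
    dmax = max(dists)  # the largest distance in the subset is scaled to exactly one
    if dmax == 0:
        return [1.0] * len(subset_xs)  # all points coincide with x0
    return [tricube(d / dmax) for d in dists]


print(local_weights([3.0, 4.0, 5.0], 4.0))  # [0.0, 1.0, 0.0]
print(tricube(0.5))                         # 0.669921875
```

Under this scaling the farthest point of each subset receives weight zero, so it marks the edge of the window rather than contributing to the fit.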
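Consider the following generalisation of the linear regression model with a metric <math>w(x,z)</math> on the target space <math>\mathbb{R}^m</math> that depends on two parameters, <math>x,z\in\mathbb{R}^n</math>. Assume that the linear hypothesis is based on <math>n</math> input parameters and that, as customary in these cases, we embed the input space <math>\mathbb{R}^n</math> into <math>\mathbb{R}^{n+1}</math> as <math>x\mapsto \hat{x} := (1,x)</math>, and consider the following ''loss function''
:<math>\operatorname{RSS}_x(A) = \sum_{i=1}^N (y_i - A\hat{x}_i)^T w_i(x) (y_i - A\hat{x}_i).</math>
Here, <math>A</math> is an <math>m\times(n+1)</math> real matrix of coefficients, <math>w_i(x) := w(x_i,x)</math> and the subscript ''i'' enumerates input and output vectors from a training set. Since <math>w</math> is a metric, it is a symmetric, positive-definite matrix and, as such, there is another symmetric matrix <math>h</math> such that <math>w = h^2</math>. The above loss function can be rearranged into a trace by observing that <math>y^T w y = (hy)^T(hy) = \operatorname{Tr}(h y y^T h) = \operatorname{Tr}(w y y^T)</math>. By arranging the vectors <math>y_i</math> and <math>\hat{x}_i</math> into the columns of a <math>m\times N</math> matrix <math>Y</math> and an <math>(n+1)\times N</math> matrix <math>\hat{X}</math> respectively, the above loss function can then be written as
:<math>\operatorname{Tr}\left(W(x)(Y - A\hat{X})^T(Y - A\hat{X})\right)</math>
where <math>W</math> is the square diagonal <math>N\times N</math> matrix whose entries are the <math>w_i(x)</math>s. Differentiating with respect to <math>A</math> and setting the result equal to 0 one finds the extremal matrix equation
:<math>A\hat{X}W(x)\hat{X}^T = YW(x)\hat{X}^T.</math>
Assuming further that the square matrix <math>\hat{X}W(x)\hat{X}^T</math> is non-singular, the loss function <math>\operatorname{RSS}_x(A)</math> attains its minimum at
:<math>A(x) = YW(x)\hat{X}^T\left(\hat{X}W(x)\hat{X}^T\right)^{-1}.</math>
A typical choice for <math>w(x,z)</math> is the Gaussian weight
:<math>w(x,z) = \exp\left(-\frac{(x-z)^2}{2\sigma^2}\right).</math>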
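The weighted least squares solution discussed above can be sketched with NumPy. The sketch assumes the minimiser has the usual closed form A = Y W X̂ᵀ (X̂ W X̂ᵀ)⁻¹, that the weights are scalar Gaussian weights with bandwidth `sigma`, and that (x − z)² means the squared Euclidean distance for vector inputs. The function names are hypothetical.

```python
import numpy as np


def local_coefficients(X, Y, x, sigma=1.0):
    """Locally weighted linear coefficients at the query point x.

    X: (n, N) array with training inputs as columns.
    Y: (m, N) array with training outputs as columns.
    x: (n,) query point. Returns A with shape (m, n + 1).
    """
    N = X.shape[1]
    X_hat = np.vstack([np.ones((1, N)), X])            # embed x as (1, x)
    sq_dist = np.sum((X - x[:, None]) ** 2, axis=0)    # squared distances to x
    W = np.diag(np.exp(-sq_dist / (2.0 * sigma ** 2)))  # Gaussian weights
    G = X_hat @ W @ X_hat.T                            # assumed non-singular
    # A = Y W X_hat^T G^{-1}; since G and W are symmetric, solve for A^T instead.
    return np.linalg.solve(G, X_hat @ W @ Y.T).T


def local_predict(X, Y, x, sigma=1.0):
    """Evaluate the local linear model at x."""
    A = local_coefficients(X, Y, x, sigma)
    return A @ np.concatenate([[1.0], x])


rng = np.random.default_rng(0)
X = rng.normal(size=(2, 30))
B = np.array([[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]])  # true coefficients
Y = B @ np.vstack([np.ones((1, 30)), X])           # noise-free linear outputs
A = local_coefficients(X, Y, np.zeros(2))
print(np.round(A, 6))
```

Solving the linear system avoids forming the inverse explicitly, which is usually preferable numerically. On noise-free linear data the recovered coefficients should match the true ones regardless of the weights.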
Advantages
As discussed above, the biggest advantage LOESS has over many other methods is that the process of fitting a model to the sample data does not begin with the specification of a function. Instead the analyst only has to provide a smoothing parameter value and the degree of the local polynomial. In addition, LOESS is very flexible, making it ideal for modeling complex processes for which no theoretical models exist. These two advantages, combined with the simplicity of the method, make LOESS one of the most attractive of the modern regression methods for applications that fit the general framework of least squares regression but which have a complex deterministic structure.
Although it is less obvious than for some of the other methods related to linear least squares regression, LOESS also accrues most of the benefits typically shared by those procedures. The most important of those is the theory for computing uncertainties for prediction and calibration. Many other tests and procedures used for validation of least squares models can also be extended to LOESS models.
Disadvantages
LOESS makes less efficient use of data than other least squares methods. It requires fairly large, densely sampled data sets in order to produce good models. This is because LOESS relies on the local data structure when performing the local fitting. Thus, LOESS provides less complex data analysis in exchange for greater experimental costs.
Another disadvantage of LOESS is the fact that it does not produce a regression function that is easily represented by a mathematical formula. This can make it difficult to transfer the results of an analysis to other people. To use the regression function, another person would need the data set and software for LOESS calculations. In nonlinear regression, on the other hand, it is only necessary to write down a functional form in order to provide estimates of the unknown parameters and the estimated uncertainty. Depending on the application, this could be either a major or a minor drawback to using LOESS. In particular, the simple form of LOESS cannot be used for mechanistic modelling where fitted parameters specify particular physical properties of a system.
Finally, as discussed above, LOESS is a computationally intensive method (with the exception of evenly spaced data, where the regression can then be phrased as a non-causal finite impulse response filter). LOESS is also prone to the effects of outliers in the data set, like other least squares methods. There is an iterative, robust version of LOESS [Cleveland (1979)] that can be used to reduce LOESS's sensitivity to outliers, but too many extreme outliers can still overcome even the robust method.
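The idea behind the robust iteration can be sketched as follows. This is a simplified illustration, not Cleveland's published code: it assumes local linear fits with tri-cube weights, and robustness weights computed by applying the bisquare function to residuals divided by six times the median absolute residual. The names `local_linear` and `robust_loess` are hypothetical.

```python
import math
from statistics import median


def tricube(u):
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def bisquare(u):
    u = abs(u)
    return (1 - u ** 2) ** 2 if u < 1 else 0.0


def local_linear(xs, ys, x0, q, rob):
    """Local linear estimate at x0; rob holds one robustness weight per point."""
    idx = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:q]
    dmax = max(abs(xs[i] - x0) for i in idx)
    w = [rob[i] * (tricube(abs(xs[i] - x0) / dmax) if dmax > 0 else 1.0) for i in idx]
    sw = sum(w)
    if sw == 0:
        return sum(ys[i] for i in idx) / len(idx)  # arbitrary fallback: plain mean
    mx = sum(wi * xs[i] for wi, i in zip(w, idx)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, idx)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, idx))
    if sxx == 0:
        return my
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, idx))
    return my + (sxy / sxx) * (x0 - mx)


def robust_loess(xs, ys, alpha=0.5, iterations=2):
    """Fitted values after the given number of robustness iterations."""
    n = len(xs)
    q = math.ceil(alpha * n)
    rob = [1.0] * n  # the first pass is an ordinary, non-robust fit
    fitted = [local_linear(xs, ys, x, q, rob) for x in xs]
    for _ in range(iterations):
        resid = [y - f for y, f in zip(ys, fitted)]
        s = median(abs(r) for r in resid)
        if s == 0:
            break  # residual scale is zero; reweighting is undefined
        rob = [bisquare(r / (6 * s)) for r in resid]  # downweight large residuals
        fitted = [local_linear(xs, ys, x, q, rob) for x in xs]
    return fitted


xs = [float(i) for i in range(40)]
ys = [2 * x + 1 + 0.5 * (-1) ** i for i, x in enumerate(xs)]  # line plus small wiggle
ys[20] += 20.0                                                # one gross outlier
plain = robust_loess(xs, ys, iterations=0)
robust = robust_loess(xs, ys, iterations=2)
print(abs(plain[20] - 41.0), abs(robust[20] - 41.0))
```

In this example the underlying line has the value 41 at the outlier's position, so the two printed numbers show how far each fit is pulled away from it. The robust fit should be noticeably closer because the outlier's large residual gives it a robustness weight of zero.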
See also
* Degrees of freedom (statistics)#In non-standard regression
* Kernel regression
* Moving least squares
* Moving average
* Multivariate adaptive regression splines
* Non-parametric statistics
* Savitzky–Golay filter
* Segmented regression
External links
* Local Regression and Election Modeling
* Smoothing by Local Regression: Principles and Methods (PostScript Document)
* The Loess function in R
* R: Scatter Plot Smoothing (the Lowess function in R)
* The supsmu function (Friedman's SuperSmoother) in R
* Quantile LOESS – A method to perform Local regression on a Quantile moving window (with R code)
* Nate Silver, How Opinion on Same-Sex Marriage Is Changing, and What It Means – sample of LOESS versus linear regression
Implementations
* Fortran implementation
* C implementation (from the R project)
* Lowess implementation in Cython
* Python implementation (in Statsmodels)
* LOESS Smoothing in Excel
* LOESS implementation in pure Julia
* JavaScript implementation