Repeated Median Regression
In robust statistics, repeated median regression, also known as the repeated median estimator, is a robust linear regression algorithm. The estimator has a breakdown point of 50%. Although it is equivariant under scaling, or under linear transformations of either its explanatory variable or its response variable, it is not equivariant under affine transformations that combine both variables. (Peter J. Rousseeuw, Nathan S. Netanyahu, and David M. Mount, "New Statistical and Computational Results on the Repeated Median Regression Estimator", in ''New Directions in Statistical Data Analysis and Robustness'', edited by Stephan Morgenthaler, Elvezio Ronchetti, and Werner A. Stahel, Birkhäuser Verlag, Basel, 1993, pp. 177-194.)
It can be calculated in O(n^2) time by brute force, in O(n \log^2 n) time using more sophisticated techniques, or in O(n\log n) randomized expected time. It may also be calculated using an on-line algorithm with O(n) update time.


Method

The repeated median method estimates the slope of the regression line y = A + Bx for a set of points (X_i, Y_i) as
:\widehat B = \underset{i}{\operatorname{median}} \ \underset{j \ne i}{\operatorname{median}} \ \operatorname{slope}(i, j)
where \operatorname{slope}(i, j) is defined as (Y_j - Y_i) / (X_j - X_i).

The estimated Y-axis intercept is defined as
:\widehat A = \underset{i}{\operatorname{median}} \ \underset{j \ne i}{\operatorname{median}} \ \operatorname{intercept}(i, j)
where \operatorname{intercept}(i, j) is defined as (X_j Y_i - X_i Y_j) / (X_j - X_i).

A simpler and faster alternative for estimating the intercept \widehat A is to use the value \widehat B just estimated:
:\widehat A = \underset{i}{\operatorname{median}} \ (Y_i - \widehat B X_i)

Note: The direct method (the repeated median of the pairwise intercepts) and the hierarchical method (the median of Y_i - \widehat B X_i) give slightly different values, with the hierarchical method normally being the better estimate. The hierarchical approach is identical to the method of estimating \widehat A in Theil–Sen estimator regression.
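The definitions above can be sketched as a brute-force O(n^2) computation. The following Python code is only an illustration, not a reference implementation: the function name `repeated_median` and the returned tuple layout are hypothetical, and the sketch assumes that all X_i are distinct so that every pairwise slope is defined.

```python
from statistics import median


def repeated_median(xs, ys):
    """Brute-force O(n^2) repeated median fit of y = A + B*x.

    Returns (b_hat, a_direct, a_hierarchical). Hypothetical helper for
    illustration; assumes all x values are distinct.
    """
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two points and equal-length inputs")
    if len(set(xs)) != n:
        raise ValueError("this sketch assumes distinct x values")

    slope_medians = []
    intercept_medians = []
    for i in range(n):
        slopes = []
        intercepts = []
        for j in range(n):
            if j == i:
                continue
            dx = xs[j] - xs[i]
            # slope(i, j) = (Y_j - Y_i) / (X_j - X_i)
            slopes.append((ys[j] - ys[i]) / dx)
            # intercept(i, j) = (X_j Y_i - X_i Y_j) / (X_j - X_i)
            intercepts.append((xs[j] * ys[i] - xs[i] * ys[j]) / dx)
        # Inner median: over all j != i for this fixed i.
        slope_medians.append(median(slopes))
        intercept_medians.append(median(intercepts))

    # Outer median: over all i.
    b_hat = median(slope_medians)
    a_direct = median(intercept_medians)
    # Hierarchical intercept: median of Y_i - b_hat * X_i.
    a_hierarchical = median(y - b_hat * x for x, y in zip(xs, ys))
    return b_hat, a_direct, a_hierarchical


# Five points on y = 1 + 2x plus two gross outliers (the last two y values).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 9, 100, -50]
print(repeated_median(xs, ys))  # prints (2.0, 1.0, 1.0)
```

In this small example both intercept estimates coincide because the uncontaminated points lie exactly on a line; with noisy data they generally differ, as the note above describes.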


See also

* Theil–Sen estimator

