HOME



picture info

Bagplot
A bagplot, or starburst plot, is a method in robust statistics for visualizing two- or three-dimensional statistical data, analogous to the one-dimensional box plot. Introduced in 1999 by Rousseuw et al., the bagplot allows one to visualize the location, spread, skewness, and outliers of a data set. Construction The bagplot consists of three nested polygons, called the "bag", the "fence", and the "loop". *The inner polygon, called the ''bag'', is constructed on the basis of Tukey depth, the smallest number of observations that can be contained by a half-plane that also contains a given point. It contains at most 50% of the data points *The outermost of the three polygons, called the ''fence'' is not drawn as part of the bagplot, but is used to construct it. It is formed by inflating the bag by a certain factor (usually 3). Observations outside the fence are flagged as outliers. *The observations that are not marked as outliers are surrounded by a ''loop'', the convex hull of the ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Box Plot
In descriptive statistics, a box plot or boxplot is a method for demonstrating graphically the locality, spread and skewness groups of numerical data through their quartiles. In addition to the box on a box plot, there can be lines (which are called ''whiskers'') extending from the box indicating variability outside the upper and lower quartiles, thus, the plot is also called the box-and-whisker plot and the box-and-whisker diagram. Outliers that differ significantly from the rest of the dataset may be plotted as individual points beyond the whiskers on the box-plot. Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying probability distribution, statistical distribution (though Tukey's boxplot assumes symmetry for the whiskers and normality for their length). The spacings in each subsection of the box-plot indicate the degree of statistical dispersion, dispersion (spread) and skewness of the da ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Convex Hull
In geometry, the convex hull, convex envelope or convex closure of a shape is the smallest convex set that contains it. The convex hull may be defined either as the intersection of all convex sets containing a given subset of a Euclidean space, or equivalently as the set of all convex combinations of points in the subset. For a Bounded set, bounded subset of the plane, the convex hull may be visualized as the shape enclosed by a rubber band stretched around the subset. Convex hulls of open sets are open, and convex hulls of compact sets are compact. Every compact convex set is the convex hull of its extreme points. The convex hull operator is an example of a closure operator, and every antimatroid can be represented by applying this closure operator to finite sets of points. The algorithmic problems of finding the convex hull of a finite set of points in the plane or other low-dimensional Euclidean spaces, and its projective duality, dual problem of intersecting Half-space (geome ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Robust Statistics
Robust statistics are statistics that maintain their properties even if the underlying distributional assumptions are incorrect. Robust Statistics, statistical methods have been developed for many common problems, such as estimating location parameter, location, scale parameter, scale, and regression coefficient, regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a Parametric statistics, parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly. Introduction Robust statistics seek to provide methods that emulate popular statistical methods, but are not unduly affected by outliers or other small departures from Statistical assumption, model assumptions. In statistics, classical e ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Bivariate Data
In statistics, bivariate data is data on each of two variables, where each value of one of the variables is paired with a value of the other variable. It is a specific but very common case of multivariate data. The association can be studied via a tabular or graphical display, or via sample statistics which might be used for inference. Typically it would be of interest to investigate the possible association between the two variables. The method used to investigate the association would depend on the level of measurement of the variable. This association that involves exactly two variables can be termed a bivariate correlation, or bivariate association. For two quantitative variables (interval or ratio in level of measurement), a scatterplot can be used and a correlation coefficient or regression model can be used to quantify the association. For two qualitative variables (nominal or ordinal in level of measurement), a contingency table can be used to view the data, and a meas ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Peter Rousseeuw
Peter J. Rousseeuw (born 13 October 1956) is a Belgian statistician known for his work on robust statistics and cluster analysis. He obtained his PhD in 1981 at the Vrije Universiteit Brussel, following research carried out at the ETH in Zurich, which led to a book on influence functions. Later he was professor at the Delft University of Technology, The Netherlands, at the University of Fribourg, Switzerland, and at the University of Antwerp, Belgium. Next he was a senior researcher at Renaissance Technologies. He then returned to Belgium as professor at KU Leuven, until becoming emeritus in 2022. His former PhD students include Annick Leroy, Hendrik LopuhaƤ, Geert Molenberghs, Christophe Croux, Mia Hubert, Stefan Van Aelst, Tim Verdonck and Jakob Raymaekers. Research Rousseeuw has constructed and published many useful techniques. He proposed the Least Trimmed Squares method and S-estimators for robust regression, which can resist outliers in the data. He also introduced ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal distribution (a distribution with a single peak), negative skew commonly indicates that the ''tail'' is on the left side of the distribution, and positive skew indicates that the tail is on the right. In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule. For example, a zero value in skewness means that the tails on both sides of the mean balance out overall; this is the case for a symmetric distribution but can also be true for an asymmetric distribution where one tail is long and thin, and the other is short but fat. Thus, the judgement on the symmetry of a given distribution by using only its skewness is risky; the distribution shape must be taken into account. Introduction Consider the two d ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Outliers
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are sometimes excluded from the data set. An outlier can be an indication of exciting possibility, but can also cause serious problems in statistical analyses. Outliers can occur by chance in any distribution, but they can indicate novel behaviour or structures in the data-set, measurement error, or that the population has a heavy-tailed distribution. In the case of measurement error, one wishes to discard them or use statistics that are robust to outliers, while in the case of heavy-tailed distributions, they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution. A frequent cause of outliers is a mixture of two distributions, which may be two d ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Polygon
In geometry, a polygon () is a plane figure made up of line segments connected to form a closed polygonal chain. The segments of a closed polygonal chain are called its '' edges'' or ''sides''. The points where two edges meet are the polygon's '' vertices'' or ''corners''. An ''n''-gon is a polygon with ''n'' sides; for example, a triangle is a 3-gon. A simple polygon is one which does not intersect itself. More precisely, the only allowed intersections among the line segments that make up the polygon are the shared endpoints of consecutive segments in the polygonal chain. A simple polygon is the boundary of a region of the plane that is called a ''solid polygon''. The interior of a solid polygon is its ''body'', also known as a ''polygonal region'' or ''polygonal area''. In contexts where one is concerned only with simple and solid polygons, a ''polygon'' may refer only to a simple polygon or to a solid polygon. A polygonal chain may cross over itself, creating star polyg ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Tukey Depth
In statistics and computational geometry, the Tukey depth or half-space depth is a measure of the depth of a point in a fixed set of points. The concept is named after its inventor, John Tukey. Given a set of ''n'' points \mathcal_n = \ in ''d''-dimensional space, Tukey's depth of a point ''x'' is the smallest fraction (or number) of points in any closed halfspace that contains ''x''. Tukey's depth measures how extreme a point is with respect to a point cloud. It is used to define the bagplot, a bivariate generalization of the boxplot. For example, for any extreme point of the convex hull there is always a (closed) halfspace that contains only that point, and hence its Tukey depth as a fraction is 1/n. Definitions ''Sample Tukey's depth'' of point ''x'', or Tukey's depth of ''x'' with respect to the point cloud \mathcal_n, is defined as D(x;\mathcal_n) = \inf_ \frac\sum_^n \mathbf\, where \mathbf\ is the indicator function In mathematics, an indicator fu ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Half-plane
In mathematics, the upper half-plane, is the set of points in the Cartesian plane with The lower half-plane is the set of points with instead. Arbitrary oriented half-planes can be obtained via a planar rotation. Half-planes are an example of two-dimensional half-space. A half-plane can be split in two quadrants. Affine geometry The affine transformations of the upper half-plane include # shifts (x,y)\mapsto (x+c,y), c\in\mathbb, and # dilations (x,y)\mapsto (\lambda x,\lambda y), \lambda > 0. Proposition: Let and be semicircles in the upper half-plane with centers on the boundary. Then there is an affine mapping that takes A to B. :Proof: First shift the center of to Then take \lambda=(\text\ B)/(\text\ A) and dilate. Then shift to the center of Inversive geometry Definition: \mathcal := \left\ . can be recognized as the circle of radius centered at and as the polar plot of \rho(\theta) = \cos \theta. Proposition: in and are collinear points. In ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Outlier
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are sometimes excluded from the data set. An outlier can be an indication of exciting possibility, but can also cause serious problems in statistical analyses. Outliers can occur by chance in any distribution, but they can indicate novel behaviour or structures in the data-set, measurement error, or that the population has a heavy-tailed distribution. In the case of measurement error, one wishes to discard them or use statistics that are robust statistics, robust to outliers, while in the case of heavy-tailed distributions, they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution. A frequent cause of outliers is a mixture of two distributions, wh ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]