Peter Rousseeuw
Peter J. Rousseeuw (born 13 October 1956) is a statistician known for his work on robust statistics and cluster analysis. He obtained his PhD in 1981 at the Vrije Universiteit Brussel, following research carried out at the ETH in Zurich, which led to a book on influence functions. Later he was professor at the Delft University of Technology, The Netherlands, at the University of Fribourg, Switzerland, and at the University of Antwerp, Belgium. Next he was a senior researcher at Renaissance Technologies. He then returned to Belgium as professor at KU Leuven, until becoming emeritus in 2022. His former PhD students include Annick Leroy, Hendrik Lopuhaä, Geert Molenberghs, Christophe Croux, Mia Hubert, Stefan Van Aelst, Tim Verdonck and Jakob Raymaekers. Research Rousseeuw has constructed and published many useful techniques. He proposed the Least Trimmed Squares method and S-estimators for robust regression, which can resist outliers in the data. He also introduced the Min ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Belgium
Belgium, ; french: Belgique ; german: Belgien officially the Kingdom of Belgium, is a country in Northwestern Europe. The country is bordered by the Netherlands to the north, Germany to the east, Luxembourg to the southeast, France to the southwest, and the North Sea to the northwest. It covers an area of and has a population of more than 11.5 million, making it the 22nd most densely populated country in the world and the 6th most densely populated country in Europe, with a density of . Belgium is part of an area known as the Low Countries, historically a somewhat larger region than the Benelux group of states, as it also included parts of northern France. The capital and largest city is Brussels; other major cities are Antwerp, Ghent, Charleroi, Liège, Bruges, Namur, and Leuven. Belgium is a sovereign state and a federal constitutional monarchy with a parliamentary system. Its institutional organization is complex and is structured on both regional ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Robust Regression And Outlier Detection
''Robust Regression and Outlier Detection'' is a book on robust statistics, particularly focusing on the breakdown point of methods for robust regression. It was written by Peter Rousseeuw and Annick M. Leroy, and published in 1987 by Wiley. Background Linear regression is the problem of inferring a linear functional relationship between a dependent variable and one or more independent variables, from data sets where that relation has been obscured by noise. Ordinary least squares assumes that the data all lie near the fit line or plane, but depart from it by the addition of normally distributed residual values. In contrast, robust regression methods work even when some of the data points are outliers that bear no relation to the fit line or plane, possibly because the data draws from a mixture of sources or possibly because an adversarial agent is trying to corrupt the data to cause the regression method to produce an inaccurate result. A typical application, discussed in the book ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Institute Of Mathematical Statistics
The Institute of Mathematical Statistics is an international professional and scholarly society devoted to the development, dissemination, and application of statistics and probability. The Institute currently has about 4,000 members in all parts of the world. Beginning in 2005, the institute started offering joint membership with the Bernoulli Society for Mathematical Statistics and Probability as well as with the International Statistical Institute. The Institute was founded in 1935 with Harry C. Carver and Henry L. Rietz as its two most important supporters. The institute publishes a variety of journals, and holds several international conference every year. Publications The Institute publishes five journals: *''Annals of Statistics'' *''Annals of Applied Statistics'' *'' Annals of Probability'' *'' Annals of Applied Probability'' *'' Statistical Science'' In addition, it co-sponsors: * The '' Current Index to Statistics'' * '' Electronic Communications in Probability'' * ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
International Statistical Institute
The International Statistical Institute (ISI) is a professional association of statisticians. It was founded in 1885, although there had been international statistical congresses since 1853. The institute has about 4,000 elected members from government, academia, and the private sector. The affiliated Associations have membership open to any professional statistician. The institute publishes a variety of books and journals, and holds an international conference every two years. The biennial convention was commonly known as the ISI Session; however, since 2011, it is now referred to as the ISI World Statistics Congress. The permanent office of the institute is located in the Statistics Netherlands building in Leidschenveen (The Hague), in the Netherlands. Specialized Associations ISI serves as an umbrella for seven specialized Associations: *Bernoulli Society for Mathematical Statistics and Probability (BS) *International Association for Statistical Computing (IASC) *Internationa ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Principal Component Analysis
Principal component analysis (PCA) is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data. Formally, PCA is a statistical technique for reducing the dimensionality of a dataset. This is accomplished by linearly transforming the data into a new coordinate system where (most of) the variation in the data can be described with fewer dimensions than the initial data. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points. Principal component analysis has applications in many fields such as population genetics, microbiome studies, and atmospheric science. The principal components of a collection of points in a real coordinate space are a sequence of p unit vectors, where the ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Boxplot
In descriptive statistics, a box plot or boxplot is a method for graphically demonstrating the locality, spread and skewness groups of numerical data through their quartiles. In addition to the box on a box plot, there can be lines (which are called ''whiskers'') extending from the box indicating variability outside the upper and lower quartiles, thus, the plot is also termed as the box-and-whisker plot and the box-and-whisker diagram. Outliers that differ significantly from the rest of the dataset may be plotted as individual points beyond the whiskers on the box-plot. Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution (though Tukey's boxplot assumes symmetry for the whiskers and normality for their length). The spacings in each subsection of the box-plot indicate the degree of dispersion (spread) and skewness of the data, which are usually described using the five-n ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Bagplot
A bagplot, or starburst plot, is a method in robust statistics for visualizing two- or three-dimensional statistical data, analogous to the one-dimensional box plot. Introduced in 1999 by Rousseuw et al., the bagplot allows one to visualize the location, spread, skewness, and outliers of a data set. Construction The bagplot consists of three nested polygons, called the "bag", the "fence", and the "loop". *The inner polygon, called the ''bag'', is constructed on the basis of Tukey depth, the smallest number of observations that can be contained by a half-plane that also contains a given point. It contains at most 50% of the data points *The outermost of the three polygons, called the ''fence'' is not drawn as part of the bagplot, but is used to construct it. It is formed by inflating the bag by a certain factor (usually 3). Observations outside the fence are flagged as outliers. *The observations that are not marked as outliers are surrounded by a ''loop'', the convex hull of the ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
John Tukey
John Wilder Tukey (; June 16, 1915 – July 26, 2000) was an American mathematician and statistician, best known for the development of the fast Fourier Transform (FFT) algorithm and box plot. The Tukey range test, the Tukey lambda distribution, the Tukey test of additivity, and the Teichmüller–Tukey lemma all bear his name. He is also credited with coining the term 'bit' and the first published use of the word 'software'. Biography Tukey was born in New Bedford, Massachusetts in 1915, to a Latin teacher father and a private tutor. He was mainly taught by his mother and attended regular classes only for certain subjects like French. Tukey obtained a BA in 1936 and MSc in 1937 in chemistry, from Brown University, before moving to Princeton University, where in 1939 he received a PhD in mathematics after completing a doctoral dissertation titled "On denumerability in topology". During World War II, Tukey worked at the Fire Control Research Office and collaborated wi ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Robust Measures Of Scale
In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the ''interquartile range'' (IQR) and the ''median absolute deviation'' (MAD). These are contrasted with conventional or non-robust measures of scale, such as sample variance or standard deviation, which are greatly influenced by outliers. These robust statistics are particularly used as estimators of a scale parameter, and have the advantages of both robustness and superior efficiency on contaminated data, at the cost of inferior efficiency on clean data from distributions such as the normal distribution. To illustrate robustness, the standard deviation can be made arbitrarily large by increasing exactly one observation (it has a breakdown point of 0, as it can be contaminated by a single point), a defect that is not shared by robust statistics. IQR and MAD One of the most common robus ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Median Absolute Deviation
In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample. For a univariate data set ''X''1, ''X''2, ..., ''Xn'', the MAD is defined as the median of the absolute deviations from the data's median \tilde=\operatorname(X) : : \operatorname = \operatorname( , X_i - \tilde, ) that is, starting with the residuals (deviations) from the data's median, the MAD is the median of their absolute values. Example Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7) which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1. Uses The median absolute deviation is a measure of statistical dispersion. Moreover, the MAD is ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Robust Measures Of Scale
In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the ''interquartile range'' (IQR) and the ''median absolute deviation'' (MAD). These are contrasted with conventional or non-robust measures of scale, such as sample variance or standard deviation, which are greatly influenced by outliers. These robust statistics are particularly used as estimators of a scale parameter, and have the advantages of both robustness and superior efficiency on contaminated data, at the cost of inferior efficiency on clean data from distributions such as the normal distribution. To illustrate robustness, the standard deviation can be made arbitrarily large by increasing exactly one observation (it has a breakdown point of 0, as it can be contaminated by a single point), a defect that is not shared by robust statistics. IQR and MAD One of the most common robus ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
R Package
R packages are extensions to the R statistical programming language. R packages contain code, data, and documentation in a standardised collection format that can be installed by users of R, typically via a centralised software repository such as CRAN (the Comprehensive R Archive Network). The large number of packages available for R, and the ease of installing and using them, has been cited as a major factor driving the widespread adoption of the language in data science. Compared to libraries in other programming language, R packages must conform to a relatively strict specification. The ''Writing R Extensions'' manual specifies a standard directory structure for R source code, data, documentation, and package metadata, which enables them to be installed and loaded using R's in-built package management tools. Packages distributed on CRAN must meet additional standards. According to John Chambers, whilst these requirements "impose considerable demands" on package developers, th ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |