Non-parametric
Nonparametric statistics is a type of statistical analysis that makes minimal assumptions about the underlying distribution of the data being studied. Often these models are infinite-dimensional, rather than finite dimensional, as in parametric statistics. Nonparametric statistics can be used for descriptive statistics or statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are evidently violated. Definitions The term "nonparametric statistics" has been defined imprecisely in the following two ways, among others: The first meaning of ''nonparametric'' involves techniques that do not rely on data belonging to any particular parametric family of probability distributions. These include, among others: * Methods which are ''distribution-free'', which do not rely on assumptions that the data are drawn from a given parametric family of probability distributions. * Statistics defined to be a function on a sample, without dependency on a pa ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Statistical Inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population. Inferential statistics can be contrasted with descriptive statistics. Descriptive statistics is solely concerned with properties of the observed data, and it does not rest on the assumption that the data come from a larger population. In machine learning, the term ''inference'' is sometimes used instead to mean "make a prediction, by evaluating an already trained model"; in this context inferring properties of the model is referred to as ''training'' or ''learning'' (rather than ''inference''), and using a model for prediction is referred to as ''inference'' (instead of ''prediction''); se ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
K-nearest Neighbors Algorithm
In statistics, the ''k''-nearest neighbors algorithm (''k''-NN) is a Non-parametric statistics, non-parametric supervised learning method. It was first developed by Evelyn Fix and Joseph Lawson Hodges Jr., Joseph Hodges in 1951, and later expanded by Thomas M. Cover, Thomas Cover. Most often, it is used for statistical classification, classification, as a ''k''-NN classifier, the output of which is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its ''k'' nearest neighbors (''k'' is a positive integer, typically small). If ''k'' = 1, then the object is simply assigned to the class of that single nearest neighbor. The ''k''-NN algorithm can also be generalized for regression analysis, regression. In ''-NN regression'', also known as ''nearest neighbor smoothing'', the output is the property value for the object. This value is the average of the values of ''k'' nearest neighbo ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Semiparametric Regression
In statistics, a semiparametric model is a statistical model that has parametric and nonparametric components. A statistical model is a parameterized family of distributions: \ indexed by a parameter \theta. * A parametric model is a model in which the indexing parameter \theta is a vector in k-dimensional Euclidean space Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces ..., for some nonnegative integer k.. Thus, \theta is finite-dimensional, and \Theta \subseteq \mathbb^k. * With a nonparametric model, the set of possible values of the parameter \theta is a subset of some space V, which is not necessarily finite-dimensional. For example, we might consider the set of all distributions with mean 0. Such spaces are vector spaces with topological structure, but may not be finit ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Kernel Density Estimation
In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on '' kernels'' as weights. KDE answers a fundamental data smoothing problem where inferences about the population are made based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. One of the famous applications of kernel density estimation is in estimating the class-conditional marginal densities of data when using a naive Bayes classifier, which can improve its prediction accuracy. Definition Let be independent and identically distributed samples drawn from some univariate distribution with an unknown density at any given point . We are in ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Dirichlet Process
In probability theory, Dirichlet processes (after the distribution associated with Peter Gustav Lejeune Dirichlet) are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution. As an example, a bag of 100 real-world dice is a ''random probability mass function (random pmf)''—to sample this random pmf you put your hand in the bag and draw out a die, that is, you draw a pmf. A bag of dice manufactured using a crude process 100 years ago will likely have probabilities that deviate wildly from the uniform pmf, whereas a bag of state-of-the-art dice used by Las Vegas casinos may have barely perceptible imperfe ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Parametric Statistics
Parametric statistics is a branch of statistics which leverages models based on a fixed (finite) set of parameters. Conversely nonparametric statistics does not assume explicit (finite-parametric) mathematical forms for distributions when modeling data. However, it may make some assumptions about that distribution, such as continuity or symmetry, or even an explicit mathematical shape but have a model for a distributional parameter that is not itself finite-parametric. Most well-known statistical methods are parametric. Regarding nonparametric (and semiparametric) models, Sir David Cox has said, "These typically involve fewer assumptions of structure and distributional form but usually contain strong assumptions about independencies". Example The normal family of distributions all have the same general shape and are ''parameterized'' by mean and standard deviation. That means that if the mean and standard deviation are known and if the distribution is normal, the probability o ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Nonparametric Regression
Nonparametric regression is a form of regression analysis where the predictor does not take a predetermined form but is completely constructed using information derived from the data. That is, no parametric equation is assumed for the relationship between predictors and dependent variable. A larger sample size is needed to build a nonparametric model having a level of uncertainty as a parametric model because the data must supply both the model structure and the parameter estimates. Definition Nonparametric regression assumes the following relationship, given the random variables X and Y: : \mathbb \mid X=x= m(x), where m(x) is some deterministic function. Linear regression is a restricted case of nonparametric regression where m(x) is assumed to be a linear function of the data. Sometimes a slightly stronger assumption of additive noise is used: : Y = m(X) + U, where the random variable U is the `noise term', with mean 0. Without the assumption that m belongs to a specific ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Order Statistic
In statistics, the ''k''th order statistic of a statistical sample is equal to its ''k''th-smallest value. Together with Ranking (statistics), rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and non-parametric inference, inference. Important special cases of the order statistics are the minimum and maximum value of a sample, and (with some qualifications discussed below) the sample median and other quantile, sample quantiles. When using probability theory to analyze order statistics of random samples from a continuous probability distribution, continuous distribution, the cumulative distribution function is used to reduce the analysis to the case of order statistics of the uniform distribution (continuous), uniform distribution. Notation and examples For example, suppose that four numbers are observed or recorded, resulting in a sample of size 4. If the sample values are :6, 9, 3, 7, the order statistics would be denoted : ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Data Envelopment Analysis
Data envelopment analysis (DEA) is a nonparametric method in operations research and economics for the estimation of production frontiers.Charnes et al (1978) DEA has been applied in a large range of fields including international banking, economic sustainability, police department operations, and logistical applicationsCharnes et al (1995)Emrouznejad et al (2016)Thanassoulis (1995) Additionally, DEA has been used to assess the performance of natural language processing models, and it has found other applications within machine learning.Zhou et al (2022)Guerrero et al (2022) Description DEA is used to empirically measure productive efficiency of decision-making units (DMUs). Although DEA has a strong link to production theory in economics, the method is also used for benchmarking in operations management, whereby a set of measures is selected to benchmark the performance of manufacturing and service operations. In benchmarking, the efficient DMUs, as defined by DEA, may not nece ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Histogram
A histogram is a visual representation of the frequency distribution, distribution of quantitative data. To construct a histogram, the first step is to Data binning, "bin" (or "bucket") the range of values— divide the entire range of values into a series of intervals—and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping interval (mathematics), intervals of a variable. The bins (intervals) are adjacent and are typically (but not required to be) of equal size. Histograms give a rough sense of the density of the underlying distribution of the data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the length of the intervals on the ''x''-axis are all 1, then a histogram is identical to a relative frequency plot. Histograms are sometimes confused with bar charts. In a his ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Multivariate Analysis
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., '' multivariate random variables''. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied. In addition, multivariate statistics is concerned with multivariate probability distributions, in terms of both :*how these can be used to represent the distributions of observed data; :*how they can be used as part of statistical inference, particularly where several different quantities are of interest to the same analysis. Certain types of problems involving multivariate da ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |