Summary Statistic

In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in terms of
* a measure of location, or central tendency, such as the arithmetic mean
* a measure of statistical dispersion, such as the standard deviation or mean absolute deviation
* a measure of the shape of the distribution, such as skewness or kurtosis
* if more than one variable is measured, a measure of statistical dependence, such as a correlation coefficient
A common collection of order statistics used as summary statistics is the five-number summary, sometimes extended to a seven-number summary, and the associated box plot. Entries in an analysis of variance table can also be regarded as summary statistics.
Examples
Location
Common measures of location, or central tendency, are the arithmetic mean, median, mode, and interquartile mean.
Spread
Common measures of stati ...
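
The five-number summary mentioned above can be sketched in a few lines of Python. The function name `five_number_summary` and the sample data are hypothetical, and the quartiles use the "inclusive" interpolation of the standard library's `statistics.quantiles`, which is only one of several quartile conventions.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers."""
    ordered = sorted(data)
    # Quartiles by linear interpolation, treating the data as the whole population.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

sample = [2, 4, 4, 5, 7, 9, 10]  # illustrative data only
summary = five_number_summary(sample)
print(summary)  # (2, 4.0, 5.0, 8.0, 10)
```

A box plot draws exactly these five values: the box spans Q1 to Q3, with a line at the median.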

Mode (statistics)
In statistics, the mode is the value that appears most often in a set of data values. If X is a discrete random variable, the mode is the value x at which the probability mass function takes its maximum value (i.e., x = \operatorname{argmax}_{x_i} P(X = x_i)). In other words, it is the value that is most likely to be sampled. Like the statistical mean and median, the mode is a way of expressing, in a (usually) single number, important information about a random variable or a population. The numerical value of the mode is the same as that of the mean and median in a normal distribution, and it may be very different in highly skewed distributions. The mode is not necessarily unique in a given discrete distribution, since the probability mass function may take the same maximum value at several points x_1, x_2, etc. The most extreme case occurs in uniform distributions, where all values occur equally frequently. A mode of a continuous probability distribution is often conside ...
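
Because the mode need not be unique, it can help to see how a tie is reported in practice. The following minimal Python sketch uses the standard library's `statistics.multimode`; the data sets are made up for illustration.

```python
import statistics

# Two values (2 and 3) tie for the highest frequency, so there are two modes.
values = [1, 2, 2, 3, 3, 4]
modes = statistics.multimode(values)
print(modes)  # [2, 3]

# In a discrete uniform sample every value occurs equally often,
# so every value is a mode.
uniform_modes = statistics.multimode([1, 2, 3])
print(uniform_modes)  # [1, 2, 3]
```

`multimode` returns the tied values in the order they first appear in the data.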

Skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal distribution (a distribution with a single peak), negative skew commonly indicates that the tail is on the left side of the distribution, and positive skew indicates that the tail is on the right. In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule. For example, a zero value in skewness means that the tails on both sides of the mean balance out overall; this is the case for a symmetric distribution but can also be true for an asymmetric distribution where one tail is long and thin, and the other is short but fat. Thus, judging the symmetry of a given distribution from its skewness alone is risky; the distribution shape must be taken into account.
Introduction
Consider the two d ...
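
The warning that zero skewness does not imply symmetry can be checked numerically. The sketch below assumes the moment-based (population) skewness, the third central moment divided by the 1.5 power of the second; the function name and all data sets are hypothetical.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using population (1/n) moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]
# Asymmetric data whose cubed deviations cancel: mean 0, skewness 0.
lopsided = [-2] * 4 + [1] * 5 + [3]

print(skewness(symmetric))         # 0.0
print(skewness(right_tailed) > 0)  # True
print(skewness(lopsided))          # 0.0
```

The last data set is clearly not symmetric about its mean, yet its skewness is exactly zero.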

Percentiles
In statistics, a k-th percentile, also known as percentile score or centile, is a score (e.g., a data point) below which a given percentage k of all scores in its frequency distribution falls ("exclusive" definition), or a score at or below which a given percentage of all scores falls ("inclusive" definition); i.e., a score in the k-th percentile would be above approximately k% of all scores in its set. For example, the 97th percentile of data is a data point below which 97% of all data points fall (by the exclusive definition). Percentiles depend on how scores are arranged. Percentiles are a type of quantile, obtained by adopting a subdivision into 100 groups. The 25th percentile is also known as the first quartile (Q1), the 50th percentile as the median or second quartile (Q2), and the 75th percentile as the third quartile (Q3). For example, the 50th percentile (median) is the score below (or at or below, depending on the definition) which 50% of the scores in the distribution are ...
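
The difference between the exclusive and inclusive definitions is easiest to see by computing the percentage of scores below, or at or below, a given score. This is a minimal Python sketch; the helper name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(scores, score, inclusive=False):
    """Percentage of scores strictly below `score` (exclusive definition)
    or at or below `score` (inclusive definition)."""
    if inclusive:
        count = sum(1 for s in scores if s <= score)
    else:
        count = sum(1 for s in scores if s < score)
    return 100.0 * count / len(scores)

scores = list(range(1, 101))  # illustrative scores 1, 2, ..., 100

exclusive_rank = percentile_rank(scores, 98)                  # 97 scores lie below 98
inclusive_rank = percentile_rank(scores, 98, inclusive=True)  # 98 scores are <= 98
print(exclusive_rank, inclusive_rank)  # 97.0 98.0
```

Under the exclusive definition the score 98 is the 97th percentile of this set; under the inclusive definition it is the 98th.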

L-moment
In statistics, L-moments are a sequence of statistics used to summarize the shape of a probability distribution. They are linear combinations of order statistics (L-statistics) analogous to conventional moments, and can be used to calculate quantities analogous to standard deviation, skewness and kurtosis, termed the L-scale, L-skewness and L-kurtosis respectively (the L-mean is identical to the conventional mean). Standardised L-moments are called L-moment ratios and are analogous to standardized moments. Just as for conventional moments, a theoretical distribution has a set of population L-moments. Sample L-moments can be defined for a sample from the population, and can be used as estimators of the population L-moments.
Population L-moments
For a random variable X, the r-th population L-moment is \lambda_r = \frac{1}{r} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} \operatorname{E} X_{r-k:r}, where X_{k:n} denotes the k-th order statistic (k-th smallest value) in an independent sample of size n from the distribution of a ...
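
One way to estimate the r-th L-moment from data is to average the defining linear combination of order statistics over every subsample of size r. The Python sketch below takes that brute-force route for clarity; it assumes the definition \lambda_r = (1/r) \sum_k (-1)^k \binom{r-1}{k} E X_{r-k:r}, the function name is hypothetical, and the approach is only practical for small samples.

```python
from itertools import combinations
from math import comb

def sample_l_moment(data, r):
    """Estimate the r-th L-moment by averaging over all size-r subsamples."""
    if not 1 <= r <= len(data):
        raise ValueError("r must be between 1 and the sample size")
    ordered = sorted(data)
    total = 0.0
    count = 0
    # combinations() of a sorted list yields tuples that are themselves sorted,
    # so subset[j - 1] is the j-th order statistic of that subsample.
    for subset in combinations(ordered, r):
        total += sum((-1) ** k * comb(r - 1, k) * subset[r - k - 1]
                     for k in range(r))
        count += 1
    return total / (r * count)

data = [1, 2, 3, 4, 5]  # illustrative, symmetric sample
l1 = sample_l_moment(data, 1)  # L-mean: equals the ordinary mean
l2 = sample_l_moment(data, 2)  # L-scale: half the mean pairwise difference
l3 = sample_l_moment(data, 3)  # zero here because the sample is symmetric
print(l1, l2, l3)  # 3.0 1.0 0.0
```

The L-skewness would then be the ratio `l3 / l2`.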

Gini Coefficient
In economics, the Gini coefficient, also known as the Gini index or Gini ratio, is a measure of statistical dispersion intended to represent the income inequality, the wealth inequality, or the consumption inequality within a nation or a social group. It was developed by Italian statistician and sociologist Corrado Gini. The Gini coefficient measures the inequality among the values of a frequency distribution, such as income levels. A Gini coefficient of 0 reflects perfect equality, where all income or wealth values are the same. In contrast, a Gini coefficient of 1 (or 100%) reflects maximal inequality among values, where a single individual has all the income while all others have none. Corrado Gini proposed the Gini coefficient as a measure of inequality of income or wealth. For Organisation for Economic Co-operatio ...
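
The two extreme cases described above can be reproduced with a short Python sketch. It assumes the common pairwise-difference form, the sum of |x_i - x_j| over all ordered pairs divided by 2 n^2 times the mean; the function name and the incomes are hypothetical.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference over all ordered pairs."""
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    # 2 * n**2 * mean simplifies to 2 * n * total.
    return abs_diff_sum / (2 * n * total)

equal_incomes = [5, 5, 5, 5]
one_has_all = [0, 0, 0, 10]

print(gini(equal_incomes))  # 0.0
print(gini(one_has_all))    # 0.75
```

With this formula a finite group of n people in which one person holds everything scores (n - 1) / n, which approaches 1 as the group grows.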

Coefficient Of Variation
In probability theory and statistics, the coefficient of variation (CV), also known as normalized root-mean-square deviation (NRMSD), percent RMS, and relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is defined as the ratio of the standard deviation \sigma to the mean \mu (or its absolute value, |\mu|), and often expressed as a percentage ("%RSD"). The CV or RSD is widely used in analytical chemistry to express the precision and repeatability of an assay. It is also commonly used in fields such as engineering or physics when doing quality assurance studies and ANOVA gauge R&R, by economists and investors in economic models, in epidemiology, and in psychology/neuroscience.
Definition
The coefficient of variation (CV) is defined as the ratio of the standard deviation \sigma to the mean \mu, CV = \frac{\sigma}{\mu}. It shows the extent of variability in relation to the mean of the population. The coefficien ...
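
As a quick illustration of the ratio \sigma / |\mu|, the Python sketch below computes the CV of a small data set. It assumes the population standard deviation (`statistics.pstdev`); a sample-based estimate would use `statistics.stdev` instead. The function name and data are hypothetical.

```python
import statistics

def coefficient_of_variation(data):
    """Population standard deviation divided by the absolute value of the mean."""
    mean = statistics.fmean(data)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.pstdev(data) / abs(mean)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population standard deviation 2
cv = coefficient_of_variation(data)
print(f"{cv:.1%}")  # 40.0%
```

Because the CV is a ratio of two quantities in the same units, it is dimensionless, which is why it is often quoted as a percentage.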

Distance Standard Deviation
In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables. Distance correlation can be used to perform a statistical test of dependence with a permutation test. One first co ...
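
The contrast with Pearson's correlation can be made concrete with a small Python sketch of the sample distance correlation for one-dimensional data. It assumes the usual construction from double-centered pairwise distance matrices; the function names and data are hypothetical, and the O(n^2) loops are meant for illustration rather than large samples.

```python
import math

def _centered_distances(values):
    """Pairwise |a - b| matrix with row, column and grand means removed."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]  # equal to column means by symmetry
    grand_mean = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_correlation(xs, ys):
    """Sample distance correlation of two paired one-dimensional samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)                       # squared distance covariance
    dvar2 = mean_product(a, a) * mean_product(b, b)  # product of squared distance variances
    if dvar2 <= 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2))

x = [-2, -1, 0, 1, 2]
y_quadratic = [v * v for v in x]   # depends on x, but not linearly
y_linear = [2 * v + 1 for v in x]

# The Pearson covariance numerator vanishes for the quadratic relationship.
mean_y = sum(y_quadratic) / len(y_quadratic)
pearson_numerator = sum(v * (w - mean_y) for v, w in zip(x, y_quadratic))

d_quadratic = distance_correlation(x, y_quadratic)
d_linear = distance_correlation(x, y_linear)
print(pearson_numerator, round(d_quadratic, 3), round(d_linear, 3))
```

Here Pearson's correlation is zero for the quadratic relationship while the distance correlation is clearly positive, and an exact linear relationship gives a distance correlation of 1.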

Mean Absolute Difference
The mean absolute difference (univariate) is a measure of statistical dispersion equal to the average absolute difference of two independent values drawn from a probability distribution. A related statistic is the relative mean absolute difference, which is the mean absolute difference divided by the arithmetic mean, and equal to twice the Gini coefficient. The mean absolute difference is also known as the absolute mean difference (not to be confused with the absolute value of the mean signed difference) and the Gini mean difference (GMD). The mean absolute difference is sometimes denoted by Δ or as MD.
Definition
The mean absolute difference is defined as the "average" or "mean", formally the expected value, of the absolute difference of two random variables X and Y independently and identically distribut ...
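
For a finite data set, the relationship between the mean absolute difference, its relative version, and the Gini coefficient can be sketched in Python as follows. The sketch assumes the version that averages |x_i - x_j| over all n^2 ordered pairs, including i = j; an estimator that divides by n(n - 1) instead would give slightly larger values. Function names and data are hypothetical.

```python
def mean_absolute_difference(values):
    """Average of |a - b| over all ordered pairs, including a paired with itself."""
    n = len(values)
    return sum(abs(a - b) for a in values for b in values) / (n * n)

def relative_mean_absolute_difference(values):
    """Mean absolute difference divided by the arithmetic mean."""
    mean = sum(values) / len(values)
    return mean_absolute_difference(values) / mean

incomes = [0, 0, 0, 10]  # illustrative data
md = mean_absolute_difference(incomes)             # 60 / 16
rmd = relative_mean_absolute_difference(incomes)   # 3.75 / 2.5
gini_from_rmd = rmd / 2                            # RMD is twice the Gini coefficient
print(md, rmd, gini_from_rmd)  # 3.75 1.5 0.75
```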

Absolute Deviation
In mathematics and statistics, deviation serves as a measure to quantify the disparity between an observed value of a variable and another designated value, frequently the mean of that variable. Deviations with respect to the sample mean and the population mean (or "true value") are called residuals and errors, respectively. The sign of the deviation reports the direction of that difference: the deviation is positive when the observed value exceeds the reference value. The absolute value of the deviation indicates the size or magnitude of the difference. In a given sample, there are as many deviations as sample points. Summary statistics can be derived from a set of deviations, such as the standard deviation and the mean absolute deviation, which are measures of dispersion, and the mean signed deviation, a measure of bias. The deviation of each data point is calculated by subtracting the mean of the data set from the individual data point. Mathematically, the de ...
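
The summary statistics derived from deviations can be illustrated with a minimal Python sketch. The readings, the reference value of 100 (standing in for a known "true value"), and the helper name are all hypothetical.

```python
import math

def deviations(data, reference):
    """Signed deviation of each observation from a reference value."""
    return [x - reference for x in data]

readings = [98, 101, 103, 102, 101]  # illustrative measurements
true_value = 100                     # assumed known reference

errors = deviations(readings, true_value)                     # [-2, 1, 3, 2, 1]
mean_signed = sum(errors) / len(errors)                       # bias: 1.0
mean_absolute = sum(abs(e) for e in errors) / len(errors)     # 1.8
root_mean_square = math.sqrt(sum(e * e for e in errors) / len(errors))

# Deviations from the sample mean (residuals) always sum to zero.
sample_mean = sum(readings) / len(readings)
residuals = deviations(readings, sample_mean)

print(mean_signed, mean_absolute, round(root_mean_square, 3), sum(residuals))
```

The positive mean signed deviation indicates that the readings run high on average, while the mean absolute and root-mean-square values describe their spread around the reference.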

Interquartile Range
In descriptive statistics, the interquartile range (IQR) is a measure of statistical dispersion, which is the spread of the data. The IQR may also be called the midspread, middle 50%, fourth spread, or H‑spread. It is defined as the difference between the 75th and 25th percentiles of the data. To calculate the IQR, the data set is divided into quartiles, or four rank-ordered even parts, via linear interpolation. These quartiles are denoted by Q1 (also called the lower quartile), Q2 (the median), and Q3 (also called the upper quartile). The lower quartile corresponds with the 25th percentile and the upper quartile corresponds with the 75th percentile, so IQR = Q3 − Q1. The IQR is an example of a trimmed estimator, defined as the 25% trimmed range, which enhances the accuracy of dataset statistics by dropping lower contribution, outlying points. It is also used as a robust measure of scale. It can be clearly visualized by the box on a box plot.
Use ...
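
The robustness of the IQR to outlying points can be seen by comparing it with the full range on data with and without an extreme value. This Python sketch assumes the "inclusive" linear-interpolation quartiles of `statistics.quantiles`; the function name and data are hypothetical.

```python
import statistics

def interquartile_range(data):
    """IQR = Q3 - Q1, with quartiles found by linear interpolation."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

clean = [2, 4, 4, 5, 7, 9, 10]
with_outlier = [2, 4, 4, 5, 7, 9, 1000]  # largest value replaced by an outlier

iqr_clean = interquartile_range(clean)
iqr_outlier = interquartile_range(with_outlier)
range_clean = max(clean) - min(clean)
range_outlier = max(with_outlier) - min(with_outlier)

print(iqr_clean, iqr_outlier)      # 4.0 4.0
print(range_clean, range_outlier)  # 8 998
```

The outlier leaves the IQR unchanged but inflates the range by more than two orders of magnitude.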

Range (statistics)
In descriptive statistics, the range of a set of data is the size of the narrowest interval which contains all the data. It is calculated as the difference between the largest and smallest values (also known as the sample maximum and minimum). It is expressed in the same units as the data. The range provides an indication of statistical dispersion. Closely related alternative measures are the interdecile range and the interquartile range.
Range of continuous IID random variables
For n independent and identically distributed continuous random variables X_1, X_2, ..., X_n with the cumulative distribution function G(x) and a probability density function g(x), let T denote the range of them, that is, T = max(X_1, X_2, ..., X_n) − min(X_1, X_2, ..., X_n).
Distribution
The range, T, has the cumulative distribution function
F(t) = n \int_{-\infty}^{\infty} g(x) [G(x+t) - G(x)]^{n-1} \, \text{d}x.
Gumbel notes that the "beauty of this formula is com ...
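
As a worked special case of the distribution of the range, suppose (as an assumption made only for this illustration) that X_1, ..., X_n are uniform on [0, 1], so that g(x) = 1 and G(x) = x on that interval. For 0 ≤ t ≤ 1 the integral splits at x = 1 − t, because G(x + t) = 1 once x + t exceeds 1:

```latex
% Assumed special case: X_1, ..., X_n uniform on [0, 1], so g(x) = 1 and G(x) = x there.
\begin{aligned}
F(t) &= n \int_{-\infty}^{\infty} g(x)\,[G(x+t) - G(x)]^{n-1}\,\mathrm{d}x \\
     &= n \int_{0}^{1-t} t^{\,n-1}\,\mathrm{d}x + n \int_{1-t}^{1} (1-x)^{n-1}\,\mathrm{d}x \\
     &= n(1-t)\,t^{\,n-1} + t^{\,n} \\
     &= n\,t^{\,n-1} - (n-1)\,t^{\,n}, \qquad 0 \le t \le 1 .
\end{aligned}
```

For n ≥ 2 this gives F(0) = 0 and F(1) = 1, as a cumulative distribution function on [0, 1] should.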