In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hypothetical and potentially infinite group of objects conceived as a generalization from experience (e.g. the set of all possible hands in a game of poker). A common aim of statistical analysis is to produce information about some chosen population.

In statistical inference, a subset of the population (a statistical ''sample'') is chosen to represent the population in a statistical analysis. The sample should be unbiased and should accurately model the population; in simple random sampling, for example, every unit of the population has an equal chance of selection. The ratio of the size of the sample to the size of the population is called the ''sampling fraction''. It is then possible to estimate the ''population parameters'' using the appropriate sample statistics.
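As a rough illustration of the relationship between a population, a sample, and the sampling fraction, the following Python sketch draws a simple random sample from a made-up finite population and uses the sample mean as an estimate of the population mean. The simulated heights, the seed, and the helper name `sampling_fraction` are assumptions chosen for the example and do not come from the article.

```python
import random
import statistics


def sampling_fraction(sample_size, population_size):
    """Ratio of sample size to population size."""
    return sample_size / population_size


# Hypothetical finite population: 10,000 simulated heights in cm (illustrative only).
rng = random.Random(0)
population = [rng.gauss(170.0, 10.0) for _ in range(10_000)]

# Simple random sample without replacement: every unit is equally likely to be chosen.
sample = rng.sample(population, 500)

f = sampling_fraction(len(sample), len(population))
population_mean = statistics.fmean(population)  # the parameter (usually unknown in practice)
sample_mean = statistics.fmean(sample)          # the statistic used to estimate it

print(f"sampling fraction: {f}")
print(f"population mean:   {population_mean:.2f}")
print(f"sample mean:       {sample_mean:.2f}")
```

In practice the full population is rarely available, which is why the sample statistic stands in for the parameter; here both are computed only so that they can be compared.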

Mean

The population mean, or population expected value, is a measure of the central tendency either of a probability distribution or of a random variable characterized by that distribution. In a discrete probability distribution of a random variable ''X'', the mean is equal to the sum over every possible value weighted by the probability of that value; that is, it is computed by taking the product of each possible value ''x'' of ''X'' and its probability ''p''(''x''), and then adding all these products together, giving

\mu = \sum_x x \, p(x).

An analogous formula applies to the case of a continuous probability distribution, with the sum replaced by an integral over the probability density. Not every probability distribution has a defined mean (see the Cauchy distribution for an example). Moreover, the mean can be infinite for some distributions.

For a finite population, the population mean of a property is equal to the arithmetic mean of that property taken over every member of the population. For example, the population mean height is equal to the sum of the heights of all individuals divided by the total number of individuals. The ''sample mean'' may differ from the population mean, especially for small samples. The law of large numbers states that the larger the size of the sample, the more likely it is that the sample mean will be close to the population mean (Seymour Lipschutz and Marc Lipson, ''Schaum's Outline of Theory and Problems of Probability'', p. 141).
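The discrete-mean formula and the law of large numbers can be sketched together in a few lines of Python. The example below assumes a fair six-sided die as the distribution; the function names `discrete_mean` and `sample_mean_of_rolls` and the seed are hypothetical choices for illustration only.

```python
import random
from fractions import Fraction


def discrete_mean(pmf):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def sample_mean_of_rolls(n, rng):
    """Arithmetic mean of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


# Fair die (an assumed example): each face 1..6 has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
mu = discrete_mean(die_pmf)  # exact population mean, 7/2

# Sample means for increasing sample sizes tend to settle near mu.
rng = random.Random(42)
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_rolls(n, rng))
```

Using `Fraction` keeps the population mean exact, so any gap between it and the printed sample means reflects sampling variation rather than rounding.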

Subpopulation

A subset of a population that shares one or more additional properties is called a ''subpopulation''. For example, if the population is all Egyptian people, a subpopulation is all Egyptian males; if the population is all pharmacies in the world, a subpopulation is all pharmacies in Egypt. By contrast, a sample is a subset of a population that is not chosen to share any additional property.

Descriptive statistics may yield different results for different subpopulations. For instance, a particular medicine may have different effects on different subpopulations, and these effects may be obscured or dismissed if such special subpopulations are not identified and examined in isolation. Similarly, one can often estimate parameters more accurately if one separates out subpopulations: the distribution of heights among people is better modeled by considering men and women as separate subpopulations, for instance.

Populations consisting of subpopulations can be modeled by mixture models, which combine the distributions within subpopulations into an overall population distribution. Even if subpopulations are well modeled by given simple models, the overall population may be poorly fit by a given simple model; poor fit may be evidence for the existence of subpopulations. For example, given two equal subpopulations, both normally distributed, if they have the same standard deviation but different means, the overall distribution will exhibit low kurtosis relative to a single normal distribution: the means of the subpopulations fall on the shoulders of the overall distribution. If sufficiently separated, these form a bimodal distribution; otherwise, the overall distribution simply has a wide peak. Further, it will exhibit overdispersion relative to a single normal distribution with the given variation. Alternatively, given two subpopulations with the same mean but different standard deviations, the overall population will exhibit high kurtosis, with a sharper peak and heavier tails (and correspondingly shallower shoulders) than a single distribution.
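The two kurtosis claims above can be checked with a short calculation. The sketch below computes the mean, variance, and excess kurtosis of a mixture of normal subpopulations from the standard central-moment formulas for a normal distribution; the function name `mixture_moments` and the specific means and standard deviations are assumptions picked for illustration.

```python
def mixture_moments(components):
    """Moments of a mixture of normal subpopulations.

    components: list of (weight, mean, sd) tuples whose weights sum to 1.
    Returns (mean, variance, excess_kurtosis) of the overall distribution.
    """
    mean = sum(w * m for w, m, _ in components)
    # For X ~ N(m, s^2) and d = m - mean:
    #   E[(X - mean)^2] = s^2 + d^2
    #   E[(X - mean)^4] = 3 s^4 + 6 s^2 d^2 + d^4
    var = sum(w * (s**2 + (m - mean) ** 2) for w, m, s in components)
    mu4 = sum(
        w * (3 * s**4 + 6 * s**2 * (m - mean) ** 2 + (m - mean) ** 4)
        for w, m, s in components
    )
    return mean, var, mu4 / var**2 - 3.0  # a single normal has excess kurtosis 0


# Two equal subpopulations, same standard deviation, different means (assumed values).
shifted = mixture_moments([(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)])

# Two equal subpopulations, same mean, different standard deviations (assumed values).
spread = mixture_moments([(0.5, 0.0, 1.0), (0.5, 0.0, 3.0)])

print("different means:", shifted)  # negative excess kurtosis, variance above 1
print("different sds:  ", spread)   # positive excess kurtosis
```

With these values the first mixture has variance 2 (overdispersed relative to either subpopulation's variance of 1) and excess kurtosis -0.5, while the second has excess kurtosis of about 1.92, matching the low and high kurtosis described in the text.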

External links

Statistical Terms Made Simple