Law of statistics

An empirical statistical law or (in popular terminology) a law of statistics represents a type of behaviour that has been found across a number of datasets and, indeed, across a range of types of data sets. Many of these observances have been formulated and proved as statistical or probabilistic theorems, and the term "law" has been carried over to these theorems. There are other statistical and probabilistic theorems that also have "law" as part of their names but that are not obviously derived from empirical observations. However, both types of "law" may be considered instances of a scientific law in the field of statistics. What distinguishes an empirical statistical law from a formal statistical theorem is the way these patterns simply appear in natural distributions, without prior theoretical reasoning about the data.


Examples

There are several such popular "laws of statistics". The Pareto principle is a well-known example. It states that roughly 80% of the effects come from 20% of the causes, and is thus also known as the 80/20 rule. In business, the rule says that 80% of a company's revenue comes from just 20% of its customers. In software engineering, it is often said that 80% of the errors and crashes come from just 20% of the bugs. Roughly 20% of the world's population produces about 80% of worldwide GDP, and 80% of healthcare expenses in the United States are incurred by 20% of the population.
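The 80/20 split can be illustrated with a short simulation. The sketch below is not part of the principle itself: the sample size is arbitrary, and the Pareto shape parameter (about 1.16) is simply the value for which the top 20% of draws tends to hold roughly 80% of the total.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Draw synthetic "customer revenues" from a Pareto distribution.
# With shape alpha ≈ 1.16, the top 20% of draws typically accounts
# for roughly 80% of the total (the classic 80/20 split).
alpha = 1.16
revenues = sorted((random.paretovariate(alpha) for _ in range(100_000)),
                  reverse=True)

top_20_percent = revenues[: len(revenues) // 5]
share = sum(top_20_percent) / sum(revenues)
print(f"Share of total held by the top 20%: {share:.0%}")
```

Because the distribution is heavy-tailed, individual runs fluctuate around the 80% figure; the pattern is a statistical regularity, not an exact identity.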
Zipf's law, described as an "empirical statistical law" of linguistics, is another example. According to the "law", given some dataset of text, the frequency of a word is inversely proportional to its frequency rank. In other words, the second most common word should appear about half as often as the most common word, and the fifth most common word about one-fifth as often. However, what sets Zipf's law apart as an "empirical statistical law", rather than just a theorem of linguistics, is that it applies to phenomena outside its original field. For example, a ranked list of US metropolitan populations also follows Zipf's law, and even forgetting follows Zipf's law. This act of summarizing several natural data patterns with simple rules is a defining characteristic of these "empirical statistical laws".

Examples of empirically inspired statistical laws that have a firm theoretical basis include:
* Statistical regularity
* Law of large numbers
* Law of truly large numbers
* Central limit theorem
* Regression toward the mean

Examples of "laws" with a weaker foundation include:
* Safety in numbers
* Benford's law

Examples of "laws" which are more general observations than having a theoretical background include:
* Rank–size distribution

Examples of supposed "laws" which are incorrect include:
* Law of averages
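The Zipf pattern described above can be checked on any word-frequency table. The sketch below uses a small made-up frequency table (the words and counts are illustrative, not real corpus data) and compares each observed count with the count Zipf's law would predict from the top-ranked word:

```python
from collections import Counter

# Illustrative (synthetic) word counts, roughly following 1000 / rank
# as Zipf's law predicts; real corpora fit only approximately.
word_counts = Counter({
    "the": 1000, "of": 520, "and": 330, "to": 240, "a": 205,
})

most_common = word_counts.most_common()  # sorted by descending count
top_freq = most_common[0][1]

for rank, (word, freq) in enumerate(most_common, start=1):
    # Under Zipf's law, freq ≈ top_freq / rank.
    predicted = top_freq / rank
    print(f"{rank}. {word!r}: observed {freq}, Zipf predicts {predicted:.0f}")
```

On real text the fit improves with corpus size, and the deviations from the ideal 1/rank curve are themselves a subject of study.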


See also

* Laws of chance
* Law (mathematics)

