HOME

TheInfoList



OR:

In
probability theory Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...
and
statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
, the Zipf–Mandelbrot law is a
discrete probability distribution In probability theory and statistics, a probability distribution is a function that gives the probabilities of occurrence of possible events for an experiment. It is a mathematical description of a random phenomenon in terms of its sample spa ...
. Also known as the Pareto–Zipf law, it is a
power-law In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a relative change in the other quantity proportional to the change raised to a constant exponent: one quantity var ...
distribution on ranked data, named after the
linguist Linguistics is the scientific study of language. The areas of linguistic analysis are syntax (rules governing the structure of sentences), semantics (meaning), Morphology (linguistics), morphology (structure of words), phonetics (speech sounds ...
George Kingsley Zipf, who suggested a simpler distribution called
Zipf's law Zipf's law (; ) is an empirical law stating that when a list of measured values is sorted in decreasing order, the value of the -th entry is often approximately inversely proportional to . The best known instance of Zipf's law applies to the ...
, and the mathematician
Benoit Mandelbrot Benoit B. Mandelbrot (20 November 1924 – 14 October 2010) was a Polish-born French-American mathematician and polymath with broad interests in the practical sciences, especially regarding what he labeled as "the art of roughness" of phy ...
, who subsequently generalized it. The
probability mass function In probability and statistics, a probability mass function (sometimes called ''probability function'' or ''frequency function'') is a function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes i ...
is given by : f(k; N, q, s) = \frac \frac, where H_ is given by : H_ = \sum_^N \frac, which may be thought of as a generalization of a
harmonic number In mathematics, the -th harmonic number is the sum of the reciprocals of the first natural numbers: H_n= 1+\frac+\frac+\cdots+\frac =\sum_^n \frac. Starting from , the sequence of harmonic numbers begins: 1, \frac, \frac, \frac, \frac, \dot ...
. In the formula, k is the rank of the data, and q and s are parameters of the distribution. In the limit as N approaches infinity, this becomes the Hurwitz zeta function \zeta(s, q). For finite N and q = 0 the Zipf–Mandelbrot law becomes
Zipf's law Zipf's law (; ) is an empirical law stating that when a list of measured values is sorted in decreasing order, the value of the -th entry is often approximately inversely proportional to . The best known instance of Zipf's law applies to the ...
. For infinite N and q = 0 it becomes a zeta distribution.


Applications

The distribution of words ranked by their
frequency Frequency is the number of occurrences of a repeating event per unit of time. Frequency is an important parameter used in science and engineering to specify the rate of oscillatory and vibratory phenomena, such as mechanical vibrations, audio ...
in a random
text corpus In linguistics and natural language processing, a corpus (: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated. Annotated, they have been used in corp ...
is approximated by a
power-law In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a relative change in the other quantity proportional to the change raised to a constant exponent: one quantity var ...
distribution, known as
Zipf's law Zipf's law (; ) is an empirical law stating that when a list of measured values is sorted in decreasing order, the value of the -th entry is often approximately inversely proportional to . The best known instance of Zipf's law applies to the ...
. If one plots the frequency rank of words contained in a moderately sized corpus of text data versus the number of occurrences or actual frequencies, one obtains a
power-law In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a relative change in the other quantity proportional to the change raised to a constant exponent: one quantity var ...
distribution, with
exponent In mathematics, exponentiation, denoted , is an operation involving two numbers: the ''base'', , and the ''exponent'' or ''power'', . When is a positive integer, exponentiation corresponds to repeated multiplication of the base: that is, i ...
close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001). Zipf's law implicitly assumes a fixed vocabulary size, but the Harmonic series with ''s'' = 1 does not converge, while the Zipf–Mandelbrot generalization with ''s'' > 1 does. Furthermore, there is evidence that the closed class of functional words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of contentive words that vary by topic, field and register. In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law. Within music, many metrics of measuring "pleasing" music conform to Zipf–Mandelbrot distributions.


Notes


References

* Reprinted as ** * * * Van Droogenbroeck F. J. (2019)
"An essential rephrasing of the Zipf–Mandelbrot law to solve authorship attribution applications by Gaussian statistics"


External links


Z. K. Silagadze: Citations and the Zipf–Mandelbrot's law






* ttps://github.com/gkohri/discreteRNG C++ Library for generating random Zipf–Mandelbrot deviates. {{DEFAULTSORT:Zipf-Mandelbrot Law Discrete distributions Power laws Computational linguistics Quantitative linguistics Corpus linguistics