The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics.

History

The Pearson system was originally devised in an effort to model visibly skewed observations. It was well known at the time how to adjust a theoretical model to fit the first two cumulants or moments of observed data: any probability distribution can be extended straightforwardly to form a location-scale family. Except in pathological cases, a location-scale family can be made to fit the observed mean (first cumulant) and variance (second cumulant) arbitrarily well. However, it was not known how to construct probability distributions in which the skewness (standardized third cumulant) and kurtosis (standardized fourth cumulant) could be adjusted equally freely. This need became apparent when trying to fit known theoretical models to observed data that exhibited skewness. Pearson's examples include survival data, which are usually asymmetric.

In his original paper, Pearson (1895, p. 360) identified four types of distributions (numbered I through IV) in addition to the normal distribution (which was originally known as type V). The classification depended on whether the distributions were supported on a bounded interval, on a half-line, or on the whole real line; and whether they were potentially skewed or necessarily symmetric. A second paper (Pearson 1901) fixed two omissions: it redefined the type V distribution (originally just the normal distribution, but now the inverse-gamma distribution) and introduced the type VI distribution. Together the first two papers cover the five main types of the Pearson system (I, III, IV, V, and VI). In a third paper, Pearson (1916) introduced further special cases and subtypes (VII through XII).

Rhind (1909, pp. 430–432) devised a simple way of visualizing the parameter space of the Pearson system, which was subsequently adopted by Pearson (1916, plate 1 and pp. 430ff., 448ff.). The Pearson types are characterized by two quantities, commonly referred to as β₁ and β₂. The first is the square of the skewness: β₁ = γ₁² where γ₁ is the skewness, or third standardized moment. The second is the traditional kurtosis, or fourth standardized moment: β₂ = γ₂ + 3. (Modern treatments define kurtosis γ₂ in terms of cumulants instead of moments, so that for a normal distribution we have γ₂ = 0 and β₂ = 3. Here we follow the historical precedent and use β₂.) The resulting diagram shows which Pearson type a given concrete distribution (identified by a point (β₁, β₂)) belongs to.

Many of the skewed or non-mesokurtic distributions familiar to statisticians today were still unknown in the early 1890s. What is now known as the beta distribution had been used by Thomas Bayes as a posterior distribution of the parameter of a Bernoulli distribution in his 1763 work on inverse probability. The beta distribution gained prominence due to its membership in Pearson's system and was known until the 1940s as the Pearson type I distribution. (Pearson's type II distribution is a special case of type I, but is usually no longer singled out.) The gamma distribution originated from Pearson's work (Pearson 1893, p. 331; Pearson 1895, pp. 357, 360, 373–376) and was known as the Pearson type III distribution, before acquiring its modern name in the 1930s and 1940s. Pearson's 1895 paper introduced the type IV distribution, which contains Student's ''t''-distribution as a special case, predating William Sealy Gosset's subsequent use by several years. His 1901 paper introduced the inverse-gamma distribution (type V) and the beta prime distribution (type VI).
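As an illustration of the two quantities β₁ and β₂ described above, the following minimal sketch estimates them from a data sample. The function name `pearson_betas` is hypothetical, and the plain moment estimators with divisor ''n'' (no bias correction) are an assumption made here for simplicity rather than anything prescribed by Pearson's papers.

```python
def pearson_betas(data):
    """Return (beta1, beta2) estimated from plain sample moments.

    beta1 is the squared skewness and beta2 the traditional
    (non-excess) kurtosis. Divisor n is used throughout, which is
    an illustrative choice, not a recommendation.
    """
    n = len(data)
    mean = sum(data) / n
    # Central sample moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    return beta1, beta2


if __name__ == "__main__":
    print(pearson_betas([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A symmetric sample gives β₁ = 0, and every sample satisfies β₂ ≥ β₁ + 1, which bounds the region of the (β₁, β₂) plane that can be occupied.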

Definition

A Pearson density ''p'' is defined to be any valid solution to the differential equation (cf. Pearson 1895, p. 381) :

\frac{p'(x)}{p(x)} + \frac{a + (x-\mu)}{b_2 (x-\mu)^2 + b_1 (x-\mu) + b_0} = 0. \qquad (1)

with: :

\begin{align} b_0 &= \frac{4 \beta_2 - 3 \beta_1}{10 \beta_2 - 12 \beta_1 - 18} \mu_2, \\ a = b_1 &= \sqrt{\mu_2} \sqrt{\beta_1} \frac{\beta_2 + 3}{10 \beta_2 - 12 \beta_1 - 18}, \\ b_2 &= \frac{2 \beta_2 - 3 \beta_1 - 6}{10 \beta_2 - 12 \beta_1 - 18}. \end{align}

Here ''μ'' is a location parameter and μ₂ denotes the variance (second central moment).
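The coefficients of Equation (1) can be evaluated directly from the variance μ₂ and the pair (β₁, β₂). The sketch below assumes the standard moment expressions for the Pearson system (b₀, b₁ = a, and b₂ all share the denominator 10β₂ − 12β₁ − 18); the function name `pearson_coefficients` is hypothetical, and the positive square root of β₁ is taken, so it only covers non-negative skewness as written.

```python
import math


def pearson_coefficients(mu2, beta1, beta2):
    """Return (b0, b1, b2) of the Pearson differential equation.

    Assumes the standard moment formulas, in which a equals b1.
    The positive root of beta1 is used (non-negative skewness).
    """
    denom = 10.0 * beta2 - 12.0 * beta1 - 18.0
    if denom == 0.0:
        raise ValueError("10*beta2 - 12*beta1 - 18 must be non-zero")
    b0 = mu2 * (4.0 * beta2 - 3.0 * beta1) / denom
    b1 = math.sqrt(mu2) * math.sqrt(beta1) * (beta2 + 3.0) / denom
    b2 = (2.0 * beta2 - 3.0 * beta1 - 6.0) / denom
    return b0, b1, b2


if __name__ == "__main__":
    # Normal distribution: beta1 = 0, beta2 = 3.
    print(pearson_coefficients(2.0, 0.0, 3.0))
    # Gamma distribution with shape 4 and scale 1:
    # variance 4, beta1 = 4/4 = 1, beta2 = 3 + 6/4 = 4.5.
    print(pearson_coefficients(4.0, 1.0, 4.5))
```

For the normal case the quadratic collapses to the constant b₀ = μ₂, and for the gamma case b₂ vanishes, which is the situation treated under type III.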

According to Ord, Pearson devised the underlying form of Equation (1) on the basis of, firstly, the formula for the derivative of the logarithm of the density function of the normal distribution (which gives a linear function) and, secondly, from a recurrence relation for values in the probability mass function of the hypergeometric distribution (which yields the linear-divided-by-quadratic structure).

In Equation (1), the parameter ''a'' determines a stationary point, and hence under some conditions a mode of the distribution, since :

p'(\mu-a) = 0

follows directly from the differential equation. Since we are confronted with a first-order linear differential equation with variable coefficients, its solution is straightforward. Writing ''x'' for the deviation ''x'' − ''μ'' to lighten the notation, :

p(x) \propto \exp\left( -\int\frac{x+a}{b_2 x^2 + b_1 x + b_0} \,dx \right).

The integral in this solution simplifies considerably when certain special cases of the integrand are considered. Pearson (1895, p. 367) distinguished two main cases, determined by the sign of the discriminant (and hence the number of real roots) of the quadratic function :

f(x) = b_2x^2 + b_1x + b_0. \qquad (2)

Particular types of distribution

Case 1, negative discriminant

The Pearson type IV distribution

If the discriminant of the quadratic function (2) is negative (''b''₁² − 4''b''₂''b''₀ < 0), it has no real roots. Then define :

\begin{align} y &= x + \frac{b_1}{2 b_2}, \\ \alpha &= \frac{\sqrt{4 b_2 b_0 - b_1^2}}{2 b_2}. \end{align}

Observe that ''α'' is a well-defined real number and ''α'' ≠ 0, because by assumption 4''b''₂''b''₀ − ''b''₁² > 0 and therefore ''b''₂ ≠ 0. Applying these substitutions, the quadratic function (2) is transformed into :

f(x) = b_2(y^2 + \alpha^2).

The absence of real roots is obvious from this formulation, because α² is necessarily positive. We now express the solution to the differential equation (1) as a function of ''y'': :

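p(y) \propto \exp\left(- \frac{1}{b_2} \int\frac{y - \frac{b_1}{2 b_2} + a}{y^2+\alpha^2} \,dy \right).

Pearson (1895, p. 362) called this the "trigonometrical case", because the integral :

\int\frac{y - \frac{b_1 - 2 b_2 a}{2 b_2}}{y^2+\alpha^2} \,dy = \frac{1}{2} \ln(y^2 + \alpha^2) - \frac{b_1 - 2 b_2 a}{2 b_2 \alpha}\arctan\left(\frac{y}{\alpha}\right) + C_0

involves the inverse trigonometric arctan function. Then :

p(y) \propto \exp\left[ -\frac{1}{2 b_2} \ln\left(1+\frac{y^2}{\alpha^2}\right) -\frac{\ln\alpha}{b_2} +\frac{b_1 - 2 b_2 a}{2 b_2^2 \alpha} \arctan\left(\frac{y}{\alpha}\right) + C_1 \right].

Finally, let :

\begin{align} m &= \frac{1}{2 b_2}, \\ \nu &= -\frac{b_1 - 2 b_2 a}{2 b_2^2 \alpha}. \end{align}

Applying these substitutions, we obtain the parametric function: :

p(y) \propto \left[1 + \frac{y^2}{\alpha^2}\right]^{-m} \exp\left[-\nu \arctan\left(\frac{y}{\alpha}\right) \right].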

This unnormalized density has support on the entire real line. It depends on a scale parameter ''α'' > 0 and shape parameters ''m'' > 1/2 and ''ν''. One parameter was lost when we chose to find the solution to the differential equation (1) as a function of ''y'' rather than ''x''. We therefore reintroduce a fourth parameter, namely the location parameter ''λ''. We have thus derived the density of the Pearson type IV distribution: :

p(x) = \frac{\left|\frac{\Gamma\left(m+\frac{\nu}{2}i\right)}{\Gamma(m)}\right|^2}{\alpha\operatorname{B}\left(m-\frac12, \frac12\right)} \left[1 + \left(\frac{x-\lambda}{\alpha}\right)^2 \right]^{-m} \exp\left[-\nu \arctan\left(\frac{x-\lambda}\alpha \right)\right].

The normalizing constant involves the complex Gamma function (Γ) and the Beta function (B). Notice that the location parameter ''λ'' here is not the same as the original location parameter introduced in the general formulation, but is related via :

\lambda = \lambda_{\text{original}} + \frac{\alpha\nu}{2(m-1)}.
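Because the type IV normalizing constant involves a Gamma function of complex argument, a numerical check can be convenient. The sketch below assumes the unnormalized form [1 + ((x − λ)/α)²]^(−m) · exp(−ν · arctan((x − λ)/α)) and integrates it over the real line with Simpson's rule after the substitution x = λ + α·tan(t). The function names are hypothetical, and the restriction m ≥ 1 is an assumption of this sketch (it keeps the transformed integrand bounded), not a property of the distribution.

```python
import math


def type4_unnormalized(x, m, nu, alpha, lam=0.0):
    """Unnormalized type IV density (assumed sign convention: exp(-nu*atan))."""
    z = (x - lam) / alpha
    return (1.0 + z * z) ** (-m) * math.exp(-nu * math.atan(z))


def type4_norm_integral(m, nu, alpha, n=2000):
    """Integral of the unnormalized density over the real line.

    With x = lam + alpha*tan(t) the integral becomes
    alpha * int_{-pi/2}^{pi/2} cos(t)**(2m-2) * exp(-nu*t) dt,
    evaluated here by composite Simpson's rule with n (even) intervals.
    """
    if m < 1.0:
        raise ValueError("this sketch assumes m >= 1")
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    lo, hi = -math.pi / 2.0, math.pi / 2.0
    h = (hi - lo) / n

    def g(t):
        c = max(math.cos(t), 0.0)  # guard against rounding at the endpoints
        return c ** (2.0 * m - 2.0) * math.exp(-nu * t)

    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return alpha * total * h / 3.0


def type4_pdf(x, m, nu, alpha, lam=0.0):
    """Numerically normalized type IV density."""
    return type4_unnormalized(x, m, nu, alpha, lam) / type4_norm_integral(m, nu, alpha)
```

For ν = 0 the integral reduces to α·B(m − 1/2, 1/2), and for m = 1 it equals 2α·sinh(πν/2)/ν, which gives two closed forms against which the routine can be compared.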

The Pearson type VII distribution

The shape parameter ''ν'' of the Pearson type IV distribution controls its skewness. If we fix its value at zero, we obtain a symmetric three-parameter family. This special case is known as the Pearson type VII distribution (cf. Pearson 1916, p. 450). Its density is :

p(x) = \frac{1}{\alpha\operatorname{B}\left(m-\frac12, \frac12\right)} \left[1 + \left(\frac{x-\lambda}\alpha \right)^2 \right]^{-m},

where B is the Beta function.

An alternative parameterization (and slight specialization) of the type VII distribution is obtained by letting :

\alpha = \sigma\sqrt{2m-3},

which requires ''m'' > 3/2. This entails a minor loss of generality but ensures that the variance of the distribution exists and is equal to σ². Now the parameter ''m'' only controls the kurtosis of the distribution. If ''m'' approaches infinity as ''λ'' and ''σ'' are held constant, the normal distribution arises as a special case: :

\begin{align} &\lim_{m\to\infty}\frac{1}{\sigma\sqrt{2m-3}\,\operatorname{B}\left(m-\frac12, \frac12\right)} \left[1 + \left(\frac{x-\lambda}{\sigma\sqrt{2m-3}}\right)^2 \right]^{-m} \\ ={} &\frac{1}{\sigma\sqrt{2}\,\Gamma\left(\frac12\right)} \cdot \lim_{m\to\infty} \frac{\Gamma(m)}{\Gamma\left(m-\frac12\right) \sqrt{m-\frac32}} \cdot \lim_{m\to\infty} \left[1 + \frac{\left(\frac{x-\lambda}{\sigma}\right)^2}{2m-3} \right]^{-m} \\ ={} &\frac{1}{\sigma\sqrt{2\pi}} \cdot 1 \cdot \exp\left[-\frac12 \left(\frac{x-\lambda}{\sigma}\right)^2 \right]. \end{align}

This is the density of a normal distribution with mean ''λ'' and standard deviation ''σ''.

It is convenient to require that ''m'' > 5/2 and to let :

m = \frac52 + \frac{3}{\gamma_2}.

This is another specialization, and it guarantees that the first four moments of the distribution exist. More specifically, the Pearson type VII distribution parameterized in terms of (λ, σ, γ₂) has a mean of ''λ'', standard deviation of ''σ'', skewness of zero, and positive excess kurtosis of γ₂.
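The (λ, σ, γ₂) parameterization and the normal limit can be checked numerically. The following sketch assumes the type VII density [1 + ((x − λ)/α)²]^(−m) / (α·B(m − 1/2, 1/2)) with α = σ·√(2m − 3) and m = 5/2 + 3/γ₂; the function names are hypothetical.

```python
import math


def type7_pdf(x, lam, sigma, m):
    """Type VII density in the variance-preserving form (requires m > 3/2)."""
    alpha = sigma * math.sqrt(2.0 * m - 3.0)
    # log B(m - 1/2, 1/2), computed with log-gamma for stability at large m.
    log_beta = math.lgamma(m - 0.5) + math.lgamma(0.5) - math.lgamma(m)
    z = (x - lam) / alpha
    return math.exp(-m * math.log1p(z * z) - log_beta) / alpha


def m_from_excess_kurtosis(gamma2):
    """Shape parameter m for a given positive excess kurtosis gamma2."""
    if gamma2 <= 0.0:
        raise ValueError("gamma2 must be positive")
    return 2.5 + 3.0 / gamma2


def normal_pdf(x, lam, sigma):
    """Normal density, used only as the reference for the large-m limit."""
    z = (x - lam) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    for m in (3.0, 30.0, 3000.0):
        print(m, type7_pdf(0.7, 0.0, 1.0, m), normal_pdf(0.7, 0.0, 1.0))
```

As ''m'' grows (equivalently, as γ₂ shrinks towards zero) the printed type VII values approach the normal reference value.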

Student's ''t''-distribution

The Pearson type VII distribution is equivalent to the non-standardized Student's ''t''-distribution with parameters ν > 0, μ, σ² by applying the following substitutions to its original parameterization: :

\begin{align} \lambda &= \mu, \\ \alpha &= \sqrt{\nu\sigma^2}, \\ m &= \frac{\nu+1}{2}. \end{align}

Here ν denotes the degrees of freedom, not the type IV shape parameter of the same name (which is zero for type VII). Observe that the constraint ''m'' > 1/2 is satisfied.

The resulting density is :

p(x\mid\mu,\sigma^2,\nu) = \frac{1}{\sqrt{\nu\sigma^2}\,\operatorname{B}\left(\frac\nu2, \frac12\right)} \left(1+\frac1\nu\frac{(x-\mu)^2}{\sigma^2}\right)^{-\frac{\nu+1}{2}},

which is easily recognized as the density of a Student's ''t''-distribution.

This implies that the Pearson type VII distribution subsumes the standard Student's ''t''-distribution and also the standard Cauchy distribution. In particular, the standard Student's ''t''-distribution arises as a subcase, when ''μ'' = 0 and ''σ''² = 1, equivalent to the following substitutions: :

\begin{align} \lambda &= 0, \\ \alpha &= \sqrt{\nu}, \\ m &= \frac{\nu+1}{2}. \end{align}

The density of this restricted one-parameter family is a standard Student's ''t'': :

p(x) = \frac{1}{\sqrt{\nu}\,\operatorname{B}\left(\frac\nu2, \frac12\right)} \left(1 + \frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}}.
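The correspondence between type VII and Student's ''t'' can be verified numerically. The sketch below assumes the substitutions λ = μ, α = √(νσ²), m = (ν + 1)/2 and compares the resulting type VII density with the textbook Gamma-function form of the ''t'' density; all function names are hypothetical.

```python
import math


def type7_pdf(x, lam, alpha, m):
    """Type VII density in its original (lam, alpha, m) parameterization."""
    log_beta = math.lgamma(m - 0.5) + math.lgamma(0.5) - math.lgamma(m)
    z = (x - lam) / alpha
    return math.exp(-m * math.log1p(z * z) - log_beta) / alpha


def type7_from_student(df, mu=0.0, sigma=1.0):
    """Map Student's t parameters to type VII parameters (lam, alpha, m)."""
    return mu, sigma * math.sqrt(df), (df + 1.0) / 2.0


def student_t_pdf(x, df, mu=0.0, sigma=1.0):
    """Location-scale Student's t density written with Gamma functions."""
    log_c = (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi) - math.log(sigma))
    z = (x - mu) / sigma
    return math.exp(log_c - (df + 1.0) / 2.0 * math.log1p(z * z / df))


if __name__ == "__main__":
    lam, alpha, m = type7_from_student(3.0, mu=1.0, sigma=2.0)
    print(type7_pdf(0.3, lam, alpha, m), student_t_pdf(0.3, 3.0, 1.0, 2.0))
```

With one degree of freedom the same mapping gives ''m'' = 1 and α = σ, which is the Cauchy case mentioned above.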

Case 2, non-negative discriminant

If the quadratic function (2) has a non-negative discriminant (''b''₁² − 4''b''₂''b''₀ ≥ 0), it has real roots ''a''₁ and ''a''₂ (not necessarily distinct): :

\begin{align} a_1 &= \frac{-b_1 - \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2}, \\ a_2 &= \frac{-b_1 + \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2}. \end{align}

In the presence of real roots the quadratic function (2) can be written as :

f(x) = b_2(x-a_1)(x-a_2),

and the solution to the differential equation is therefore :

p(x) \propto \exp\left( -\frac{1}{b_2} \int\frac{x+a}{(x - a_1) (x - a_2)} \,dx \right).

Pearson (1895, p. 362) called this the "logarithmic case", because the integral :

\int\frac{x+a}{(x - a_1) (x - a_2)} \,dx = \frac{(a_1+a)\ln(x-a_1) - (a_2+a)\ln(x-a_2)}{a_1-a_2} + C

involves only the logarithm function and not the arctan function as in the previous case. Using the substitution :

\nu = \frac{1}{b_2 (a_1-a_2)},

we obtain the following solution to the differential equation (1): :

p(x) \propto (x-a_1)^{-\nu (a_1+a)} (x-a_2)^{\nu (a_2+a)}.

Since this density is only known up to a hidden constant of proportionality, that constant can be changed and the density written as follows: :

p(x) \propto \left(1-\frac{x}{a_1}\right)^{-\nu (a_1+a)} \left(1-\frac{x}{a_2}\right)^{\nu (a_2+a)}.
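A quick way to check a closed-form solution of the logarithmic case is to compare its logarithmic derivative with the right-hand side of the differential equation. The sketch below assumes the sign convention in which the numerator of Equation (1) is ''a'' + ''x'' (with ''x'' measured from the location, so that the stationary point sits at −''a''), and it assumes the solution is written with the factors (1 − x/a₁) and (1 − x/a₂). Function names and the sample parameter values are hypothetical.

```python
import math


def log_p(x, a, b2, a1, a2):
    """Log of the unnormalized density for distinct real roots a1, a2.

    Valid for x strictly between the roots when a1 < 0 < a2.
    """
    nu = 1.0 / (b2 * (a1 - a2))
    return (-nu * (a1 + a) * math.log(1.0 - x / a1)
            + nu * (a2 + a) * math.log(1.0 - x / a2))


def rhs(x, a, b2, a1, a2):
    """Right-hand side p'(x)/p(x) demanded by the differential equation."""
    return -(x + a) / (b2 * (x - a1) * (x - a2))


def log_derivative(x, a, b2, a1, a2, h=1e-6):
    """Central finite-difference estimate of d/dx log p(x)."""
    return (log_p(x + h, a, b2, a1, a2) - log_p(x - h, a, b2, a1, a2)) / (2.0 * h)


if __name__ == "__main__":
    # Parameters chosen so that x + 0.6 follows a Beta(3, 2) law on (0, 1).
    params = (-1.0 / 15.0, -1.0 / 3.0, -0.6, 0.4)
    for x in (-0.4, 0.0, 0.1, 0.3):
        print(x, log_derivative(x, *params), rhs(x, *params))
```

The two printed columns should agree up to finite-difference error at every point inside the support.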

The Pearson type I distribution

The Pearson type I distribution (a generalization of the beta distribution to a more general finite region of support) arises when the roots of the quadratic equation (2) are of opposite sign, that is, ''a''₁ < 0 < ''a''₂. Then the solution ''p'' is supported on the interval (''a''₁, ''a''₂). Apply the substitution :

x = a_1 + y (a_2 - a_1),

where 0 < ''y'' < 1, which yields a solution in terms of ''y'' that is supported on the interval (0, 1): :

p(y) \propto \left(\frac{a_1-a_2}{a_1}\,y\right)^{-\nu (a_1+a)} \left(\frac{a_2-a_1}{a_2}\,(1-y)\right)^{\nu (a_2+a)}.

One may define: :

\begin{align} m_1 &= \frac{a+a_1}{b_2 (a_2-a_1)}, \\ m_2 &= \frac{a+a_2}{b_2 (a_1-a_2)}. \end{align}

Regrouping constants and parameters, this simplifies to :

p(y) \propto y^{m_1} (1-y)^{m_2}.

Thus, once the location ''λ'' is reintroduced, :

\frac{x-\lambda-a_1}{a_2-a_1}

follows a beta distribution :

\operatorname{B}(m_1+1,m_2+1)

with :

\lambda=\mu_1-(a_2-a_1) \frac{m_1+1}{m_1+m_2+2}-a_1.

It turns out that ''m''₁, ''m''₂ > −1 is necessary and sufficient for ''p'' to be a proper probability density function.
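The mapping from the Pearson coefficients to beta parameters can be sketched in a few lines. The code assumes the sign convention in which the numerator of Equation (1) is ''a'' + ''x'', giving m₁ = (a + a₁)/(b₂(a₂ − a₁)) and m₂ = (a + a₂)/(b₂(a₁ − a₂)), and it takes the mean μ₁ as an input to recover the location λ. The function name `type1_beta_params` is hypothetical.

```python
def type1_beta_params(a, b2, a1, a2, mu1):
    """Return (shape1, shape2, lam) for the type I case.

    (x - lam - a1) / (a2 - a1) is then Beta(shape1, shape2) distributed,
    with shape1 = m1 + 1 and shape2 = m2 + 1. Assumes a1 < 0 < a2.
    """
    m1 = (a + a1) / (b2 * (a2 - a1))
    m2 = (a + a2) / (b2 * (a1 - a2))
    if m1 <= -1.0 or m2 <= -1.0:
        raise ValueError("m1 and m2 must exceed -1 for a proper density")
    # Location from the mean of the beta law, (m1 + 1) / (m1 + m2 + 2).
    lam = mu1 - (a2 - a1) * (m1 + 1.0) / (m1 + m2 + 2.0) - a1
    return m1 + 1.0, m2 + 1.0, lam


if __name__ == "__main__":
    # Coefficients of a Beta(3, 2) law on (0, 1) measured from its mean 0.6.
    print(type1_beta_params(-1.0 / 15.0, -1.0 / 3.0, -0.6, 0.4, 0.6))
```

In this example the roots are the support endpoints 0 and 1 shifted by the mean, and the recovered shapes are 3 and 2.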

The Pearson type II distribution

The Pearson type II distribution is a special case of the Pearson type I family restricted to symmetric distributions. Using formulae from the type I section, with ''m''₁ = ''m''₂ = ''m'' and −''a''₁ = ''a''₂ = ''a'' (here ''a'' denotes the half-width of the support, not the parameter ''a'' of Equation (1)), on the interval (−''a'', ''a'') it can be written as: :

p(x) \propto \left(1-\frac{x^2}{a^2}\right)^m.

Or with :

x = -a + 2 y a,

''y'' is distributed according to the beta distribution on the interval (0, 1), :

p(y) \propto \left(1 - 4 \left(y - \frac 1 2\right)^2\right)^m \propto y^{m} (1 - y)^{m}.

With the appropriate constant of proportionality the PDF becomes :

p(y) = y^{m} (1-y)^{m} \frac{\Gamma(2m+2)}{\Gamma(m+1)^2}.

The Pearson type III distribution

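The Pearson type III distribution arises when ''b''₂ = 0, so that the quadratic (2) reduces to the linear function ''b''₀ + ''b''₁''x''. With ''m'' = (''b''₀ − ''a'' ''b''₁)/''b''₁², a choice consistent with the stationary point at ''μ'' − ''a'', the solution of the differential equation is ''p''(''x'') ∝ (''b''₀ + ''b''₁''x'')^''m'' exp(−''x''/''b''₁). Defining :

\lambda= \mu_1 + \frac{b_0}{b_1} - (m+1) b_1,

the shifted and rescaled variable follows a gamma distribution with shape ''m'' + 1 and scale ''b''₁²: :

b_0+b_1 (x-\lambda) \sim \operatorname{Gamma}(m+1,b_1^2).

The Pearson type III distribution is a gamma distribution or chi-squared distribution.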
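The type III correspondence can be turned into a small helper. The sketch assumes b₂ = 0, the sign convention in which the numerator of Equation (1) is ''a'' + ''x'', and hence m = (b₀ − a·b₁)/b₁²; the second gamma parameter is read as a scale. The function name `type3_gamma_params` is hypothetical.

```python
def type3_gamma_params(a, b0, b1, mu1):
    """Return (shape, scale, lam) for the type III case (b2 = 0).

    b0 + b1 * (x - lam) is then Gamma(shape, scale) distributed.
    mu1 is the mean of x; b1 must be non-zero.
    """
    if b1 == 0.0:
        raise ValueError("b1 must be non-zero (b1 = 0 is the normal case)")
    m = (b0 - a * b1) / (b1 * b1)
    shape = m + 1.0
    scale = b1 * b1
    lam = mu1 + b0 / b1 - (m + 1.0) * b1
    return shape, scale, lam


if __name__ == "__main__":
    # Gamma law with shape 4 and scale 1: mean 4, b0 = 4, a = b1 = 1.
    print(type3_gamma_params(1.0, 4.0, 1.0, 4.0))
```

In the example the helper returns shape 4, scale 1 and λ = 4, so that b₀ + b₁(x − λ) is simply ''x'' itself.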

The Pearson type V distribution

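The Pearson type V distribution arises when the discriminant of the quadratic function (2) vanishes, so that the quadratic has a double root. Defining new parameters: :

\begin{align} C_1 &= \frac{b_1}{2 b_2}, \\ \lambda &= \mu_1-\frac{C_1-a}{1-2 b_2}, \end{align}

the shifted variable follows an inverse-gamma distribution (signs chosen consistently with the stationary point at ''μ'' − ''a''): :

x-\lambda \sim \operatorname{InverseGamma}\left(\frac{1}{b_2}-1,\frac{C_1-a}{b_2}\right).

The Pearson type V distribution is an inverse-gamma distribution.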
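A corresponding helper for type V is sketched below. It assumes a double root of the quadratic at −C₁ with C₁ = b₁/(2b₂), the sign convention in which the numerator of Equation (1) is ''a'' + ''x'', and therefore an inverse-gamma law with shape 1/b₂ − 1 and scale (C₁ − a)/b₂. These signs follow from that assumed convention and may differ from other presentations; the function name is hypothetical.

```python
def type5_invgamma_params(a, b1, b2, mu1):
    """Return (shape, scale, lam) for the type V case.

    x - lam is then InverseGamma(shape, scale) distributed.
    Requires 0 < b2 < 1/2 so that the mean exists.
    """
    if not 0.0 < b2 < 0.5:
        raise ValueError("this sketch assumes 0 < b2 < 1/2")
    c1 = b1 / (2.0 * b2)
    shape = 1.0 / b2 - 1.0
    scale = (c1 - a) / b2
    # The inverse-gamma mean scale / (shape - 1) equals (c1 - a) / (1 - 2*b2).
    lam = mu1 - (c1 - a) / (1.0 - 2.0 * b2)
    return shape, scale, lam


if __name__ == "__main__":
    # Inverse-gamma law with shape 5 and scale 2 (mean 0.5).
    print(type5_invgamma_params(1.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0, 0.5))
```

The example reproduces shape 5, scale 2 and λ = 0 for an inverse-gamma law whose mean is 0.5.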

The Pearson type VI distribution

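With ''m''₁ and ''m''₂ defined as for type I, and defining :

\lambda=\mu_1 + (a_2-a_1) \frac{m_2+1}{m_2+m_1+2} - a_2,

the rescaled variable follows a beta prime distribution: :

\frac{x-\lambda-a_2}{a_2-a_1} \sim \beta^{\prime}(m_2+1,-m_2-m_1-1).

The Pearson type VI distribution is a beta prime distribution or ''F''-distribution.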
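The type VI parameters can be computed in the same spirit. The sketch takes the exponents m₁ and m₂ as given, assumes a₁ < a₂ with support to the right of a₂, and uses the beta prime mean α/(β − 1) to recover the location λ from the mean μ₁. The function name is hypothetical.

```python
def type6_betaprime_params(m1, m2, a1, a2, mu1):
    """Return (alpha, beta, lam) for the type VI case.

    (x - lam - a2) / (a2 - a1) is then beta prime (alpha, beta) distributed,
    with alpha = m2 + 1 and beta = -m2 - m1 - 1.
    """
    alpha = m2 + 1.0
    beta = -m2 - m1 - 1.0
    if alpha <= 0.0 or beta <= 1.0:
        raise ValueError("need alpha > 0 and beta > 1 for a finite mean")
    lam = mu1 + (a2 - a1) * (m2 + 1.0) / (m2 + m1 + 2.0) - a2
    return alpha, beta, lam


if __name__ == "__main__":
    # Density proportional to (x + 1)**-6 * (x - 1) for x > 1.
    print(type6_betaprime_params(-6.0, 1.0, -1.0, 1.0, 7.0 / 3.0))
```

In the example the beta prime law has parameters (2, 4) and mean 2/3, so a mean of 7/3 for ''x'' corresponds to λ = 0.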

Relation to other distributions

The Pearson family subsumes the following distributions, among others:
* Beta distribution (types I and II)
* Beta prime distribution (type VI)
* Cauchy distribution (type IV)
* Chi-squared distribution (type III)
* Continuous uniform distribution (limit of type I)
* Exponential distribution (type III)
* Gamma distribution (type III)
* ''F''-distribution (type VI)
* Inverse-chi-squared distribution (type V)
* Inverse-gamma distribution (type V)
* Normal distribution (limit of type I, III, IV, V, or VI)
* Student's ''t''-distribution (type VII, which is the non-skewed subtype of type IV)

Alternatives to the Pearson system of distributions for the purpose of fitting distributions to data are the quantile-parameterized distributions (QPDs) and the metalog distributions. QPDs and metalogs can provide greater shape and bounds flexibility than the Pearson system. Instead of fitting moments, QPDs are typically fit to an empirical CDF or other data with linear least squares.

Examples of modern alternatives to the Pearson skewness-vs-kurtosis diagram are: (i) https://github.com/SchildCode/PearsonPlot and (ii) the "Cullen and Frey graph" in the statistical application R.

Applications

Pearson distributions are used in financial markets, given their ability to be parametrized in a way that has intuitive meaning for market traders. A number of models are in current use that capture the stochastic nature of the volatility of rates, stocks, etc., and this family of distributions may prove to be one of the more important. In the United States, the Log-Pearson III is the default distribution for flood frequency analysis. More recently, alternatives to the Pearson distributions have been developed that are more flexible and easier to fit to data; see the metalog distributions.

Sources

Secondary sources

* Milton Abramowitz and Irene A. Stegun (1964). ''Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables''. National Bureau of Standards.
* Eric W. Weisstein et al. Pearson Type III Distribution. From MathWorld.

References

* Elderton, Sir W.P, Johnson, N.L. (1969) ''Systems of Frequency Curves''. Cambridge University Press.
* Ord J.K. (1972) ''Families of Frequency Distributions''. Griffin, London.