In probability theory and statistics, the Gumbel distribution (also known as the type-I generalized extreme value distribution) is used to model the distribution of the maximum (or the minimum) of a number of samples of various distributions.

This distribution might be used to represent the distribution of the maximum level of a river in a particular year if there was a list of maximum values for the past ten years. It is useful in predicting the chance that an extreme earthquake, flood or other natural disaster will occur. The potential applicability of the Gumbel distribution to represent the distribution of maxima relates to extreme value theory, which indicates that it is likely to be useful if the distribution of the underlying sample data is of the normal or exponential type. ''This article uses the Gumbel distribution to model the distribution of the maximum value. To model the minimum value, use the negative of the original values.''

The Gumbel distribution is a particular case of the generalized extreme value distribution (also known as the Fisher-Tippett distribution). It is also known as the ''log-Weibull distribution'' and the ''double exponential distribution'' (a term that is alternatively sometimes used to refer to the Laplace distribution). It is related to the Gompertz distribution: when its density is first reflected about the origin and then restricted to the positive half line, a Gompertz density is obtained, up to normalization.

In the latent variable formulation of the multinomial logit model, common in discrete choice theory, the errors of the latent variables follow a Gumbel distribution. This is useful because the difference of two Gumbel-distributed random variables has a logistic distribution.

The Gumbel distribution is named after Emil Julius Gumbel (1891–1966), based on his original papers describing the distribution.

Definitions

The cumulative distribution function of the Gumbel distribution is

$$F(x;\mu,\beta) = e^{-e^{-(x-\mu)/\beta}},$$

where $\mu$ is the location parameter and $\beta > 0$ is the scale parameter.
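As a minimal sketch, the distribution function $F(x;\mu,\beta) = e^{-e^{-(x-\mu)/\beta}}$ can be evaluated directly in Python. The function names `gumbel_cdf` and `gumbel_pdf` are illustrative and not taken from any library; the density used here, $f(x) = \frac{1}{\beta} e^{-(z + e^{-z})}$ with $z = (x-\mu)/\beta$, is the derivative of the distribution function.

```python
import math


def gumbel_cdf(x, mu=0.0, beta=1.0):
    """F(x; mu, beta) = exp(-exp(-(x - mu) / beta)), for beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (x - mu) / beta
    try:
        return math.exp(-math.exp(-z))
    except OverflowError:
        # exp(-z) overflows only for very negative z, where F is effectively 0
        return 0.0


def gumbel_pdf(x, mu=0.0, beta=1.0):
    """f(x; mu, beta) = exp(-(z + exp(-z))) / beta, with z = (x - mu) / beta."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (x - mu) / beta
    try:
        return math.exp(-(z + math.exp(-z))) / beta
    except OverflowError:
        # same far-left-tail situation as in gumbel_cdf
        return 0.0


# At x = mu the distribution function equals 1/e, whatever the value of beta.
print(gumbel_cdf(5.0, mu=5.0, beta=3.0))
```

The overflow guard is a design choice of this sketch: far in the left tail the inner exponential exceeds the floating-point range, and returning 0 there matches the limiting value.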

Standard Gumbel distribution

The standard Gumbel distribution is the case where $\mu = 0$ and $\beta = 1$, with cumulative distribution function

$$F(x) = e^{-e^{-x}}$$

and probability density function

$$f(x) = e^{-(x+e^{-x})}.$$

In this case the mode is 0, the median is $-\ln(\ln(2)) \approx 0.3665$, the mean is $\gamma \approx 0.5772$ (the Euler–Mascheroni constant), and the standard deviation is $\pi/\sqrt{6} \approx 1.2825$.

The cumulants, for $n>1$, are given by

$$\kappa_n = (n-1)!\,\zeta(n),$$

where $\zeta$ is the Riemann zeta function.
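The density, median and mode of the standard case follow from the distribution function $F(x) = e^{-e^{-x}}$ in a few short steps. The derivation below is a worked sketch added for illustration, with $m$ denoting the median:

```latex
\begin{align*}
f(x) &= \frac{d}{dx}\, e^{-e^{-x}} = e^{-e^{-x}} \cdot e^{-x} = e^{-(x + e^{-x})}, \\
F(m) = \tfrac{1}{2} &\iff e^{-m} = \ln 2 \iff m = -\ln(\ln 2) \approx 0.3665, \\
f'(x) &= f(x)\,\bigl(e^{-x} - 1\bigr) = 0 \iff x = 0 \quad \text{(the mode)}.
\end{align*}
```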

Properties

The mode is $\mu$, while the median is $\mu-\beta \ln\left(\ln 2\right)$, and the mean is given by

$$\operatorname{E}(X)=\mu+\gamma\beta,$$

where $\gamma$ is the Euler–Mascheroni constant.

The standard deviation $\sigma$ is $\beta \pi/\sqrt{6}$, hence $\beta = \sigma \sqrt{6} / \pi \approx 0.78 \sigma$.

At the mode, where $x = \mu$, the value of $F(x;\mu,\beta)$ becomes $e^{-1} \approx 0.37$, irrespective of the value of $\beta$.
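The relations between the parameters and the summary statistics can be collected in a short Python sketch. The helper names `gumbel_summary` and `params_from_moments` are hypothetical; the second one simply inverts the mean and standard deviation formulas (a method-of-moments estimate, which is an assumption of this example and not a claim about how the parameters should be fitted in practice).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_summary(mu, beta):
    """Mode, median, mean and standard deviation of Gumbel(mu, beta)."""
    return {
        "mode": mu,
        "median": mu - beta * math.log(math.log(2.0)),
        "mean": mu + EULER_GAMMA * beta,
        "std": beta * math.pi / math.sqrt(6.0),
    }


def params_from_moments(mean, std):
    """Invert mean = mu + gamma * beta and std = beta * pi / sqrt(6)."""
    beta = std * math.sqrt(6.0) / math.pi  # roughly 0.78 * std
    mu = mean - EULER_GAMMA * beta
    return mu, beta


summary = gumbel_summary(mu=10.0, beta=4.0)
print(summary)
print(params_from_moments(summary["mean"], summary["std"]))  # recovers (10, 4) up to rounding
```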

Related distributions

* If $X$ has a Gumbel distribution, then the conditional distribution of ''Y=−X'' given that ''Y'' is positive, or equivalently given that ''X'' is negative, is a Gompertz distribution. The cdf ''G'' of ''Y'' is related to ''F'', the cdf of ''X'', by the formula $G(y) = P(Y \le y) = P(X \ge -y \mid X \le 0) = (F(0)-F(-y))/F(0)$ for ''y''>0. Consequently, the densities are related by $g(y) = f(-y)/F(0)$: the Gompertz density is proportional to a reflected Gumbel density, restricted to the positive half-line.
* If ''X'' is an exponentially distributed variable with mean 1, then −log(''X'') has a standard Gumbel distribution.
* If $X \sim \mathrm{Gumbel}(\alpha_X, \beta)$ and $Y \sim \mathrm{Gumbel}(\alpha_Y, \beta)$ are independent, then $X-Y \sim \mathrm{Logistic}(\alpha_X-\alpha_Y,\beta)$ (see Logistic distribution).
* If $X, Y \sim \mathrm{Gumbel}(\alpha, \beta)$ are independent, then $X+Y \nsim \mathrm{Logistic}(2 \alpha,\beta)$. Note that $E(X+Y) = 2\alpha+2\beta\gamma \neq 2\alpha = E\left(\mathrm{Logistic}(2 \alpha,\beta) \right)$. More generally, the distribution of linear combinations of independent Gumbel random variables can be approximated by GNIG and GIG distributions.

Theory related to the generalized multivariate log-gamma distribution provides a multivariate version of the Gumbel distribution.
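Two of the relations listed in this section can be checked by simulation. The sketch below draws Gumbel variates as the negative logarithm of a unit-mean exponential variate, then compares the empirical distribution of the difference of two independent Gumbel variables with the logistic distribution function. The helper names, the seed, the sample size and the parameter values are all arbitrary choices made for illustration.

```python
import math
import random


def sample_gumbel(rng, mu=0.0, beta=1.0):
    """Draw Gumbel(mu, beta) as mu - beta * log(E), where E ~ Exp(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against log(0); practically never triggers
        e = rng.expovariate(1.0)
    return mu - beta * math.log(e)


def logistic_cdf(x, loc=0.0, scale=1.0):
    """Distribution function of Logistic(loc, scale)."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)


rng = random.Random(0)
n = 50_000
alpha_x, alpha_y, beta = 1.0, -0.5, 2.0
diffs = [
    sample_gumbel(rng, alpha_x, beta) - sample_gumbel(rng, alpha_y, beta)
    for _ in range(n)
]

# Empirical and theoretical values should agree to about two decimals.
for x in (-2.0, 1.5, 4.0):
    print(x, empirical_cdf(diffs, x), logistic_cdf(x, alpha_x - alpha_y, beta))
```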

Occurrence and applications

Gumbel has shown that the maximum value (or last order statistic) in a sample of random variables following an exponential distribution, minus the natural logarithm of the sample size, approaches the Gumbel distribution as the sample size increases.

Concretely, let $\rho(x)=e^{-x}$ be the probability density of $x$ and $Q(x)=1- e^{-x}$ its cumulative distribution. Then the maximum value out of $N$ realizations of $x$ is smaller than $X$ if and only if all realizations are smaller than $X$. So the cumulative distribution of the maximum value $\tilde{x}$ satisfies

$$P(\tilde{x}-\log(N)\le X)=P(\tilde{x}\le X+\log(N))=[Q(X+\log(N))]^N=\left(1- \frac{e^{-X}}{N}\right)^N,$$

and, for large $N$, the right-hand side converges to $e^{-e^{-X}}$.
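The convergence of $\left(1 - e^{-X}/N\right)^N$ to $e^{-e^{-X}}$ can be inspected numerically. The following sketch compares the exact distribution function of the shifted maximum of $N$ unit-rate exponential variables with the standard Gumbel distribution function on a grid; the grid and the values of $N$ are arbitrary choices for illustration.

```python
import math


def shifted_max_cdf(x, n):
    """P(max of n Exp(1) draws - log(n) <= x) = (1 - exp(-x) / n) ** n."""
    if x < -math.log(n):
        return 0.0  # the maximum itself cannot be negative
    return (1.0 - math.exp(-x) / n) ** n


def standard_gumbel_cdf(x):
    return math.exp(-math.exp(-x))


grid = [-3.0 + 0.1 * k for k in range(91)]  # x from -3 to 6
gaps = {}
for n in (10, 100, 10_000):
    gaps[n] = max(abs(shifted_max_cdf(x, n) - standard_gumbel_cdf(x)) for x in grid)
    print(n, gaps[n])  # the largest gap shrinks as n grows
```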

In hydrology, therefore, the Gumbel distribution is used to analyze such variables as monthly and annual maximum values of daily rainfall and river discharge volumes, and also to describe droughts.

Gumbel has also shown that the estimator ''r''/(''n''+1) for the probability of an event, where ''r'' is the rank number of the observed value in the data series and ''n'' is the total number of observations, is an unbiased estimator of the cumulative probability around the mode of the distribution. Therefore, this estimator is often used as a plotting position.

In number theory, the Gumbel distribution approximates the number of terms in a random partition of an integer as well as the trend-adjusted sizes of maximal prime gaps and maximal gaps between prime constellations.

Gumbel reparametrization tricks

In machine learning, the Gumbel distribution is sometimes employed to generate samples from the categorical distribution. This technique is called the "Gumbel-max trick" and is a special example of "reparametrization tricks".

In detail, let $(\pi_1, \ldots, \pi_n)$ be nonnegative and not all zero, and let $g_1, \ldots, g_n$ be independent samples of Gumbel(0, 1). Then, by routine integration,

$$\Pr\left(j = \arg\max_i (g_i + \log\pi_i)\right) = \frac{\pi_j}{\sum_i \pi_i}.$$

That is,

$$\arg\max_i (g_i + \log\pi_i) \sim \text{Categorical}\left(\frac{\pi_j}{\sum_i \pi_i}\right)_j.$$

Equivalently, given any $x_1, \ldots, x_n\in \mathbb{R}$, we can sample from its Boltzmann distribution by

$$\Pr\left(j = \arg\max_i (g_i + x_i)\right) = \frac{e^{x_j}}{\sum_i e^{x_i}}.$$

Related equations include:
* If $x\sim \operatorname{Exp}(\lambda)$, then $(-\ln x - \gamma)\sim \text{Gumbel}(-\gamma + \ln\lambda, 1)$.
* $\max_i (g_i + \log\pi_i) \sim \text{Gumbel}\left(\log\left(\sum_i \pi_i \right), 1\right)$. That is, the Gumbel distribution is a max-stable distribution family.
* $\operatorname{E}\left[\max_i (g_i + x_i)\right] = \log \left(\sum_i e^{x_i}\right) + \gamma$.
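The Gumbel-max trick described in this section can be sketched in a few lines of Python: add independent Gumbel(0, 1) noise to the log-weights and return the index of the largest perturbed value. The function names, the example weights and the seed are assumptions of this illustration; zero weights are skipped because their log-weight of minus infinity can never be the maximum.

```python
import math
import random
from collections import Counter


def sample_standard_gumbel(rng):
    """Gumbel(0, 1) draw via -log(-log(U)) with U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(weights, rng):
    """Return index j with probability weights[j] / sum(weights)."""
    if any(w < 0 for w in weights) or not any(w > 0 for w in weights):
        raise ValueError("weights must be nonnegative and not all zero")
    best_index, best_value = None, -math.inf
    for j, w in enumerate(weights):
        if w == 0:
            continue  # log(0) = -inf never wins the argmax
        value = math.log(w) + sample_standard_gumbel(rng)
        if value > best_value:
            best_index, best_value = j, value
    return best_index


rng = random.Random(1)
weights = [1.0, 2.0, 0.0, 5.0]
draws = 40_000
counts = Counter(gumbel_max_sample(weights, rng) for _ in range(draws))
total = sum(weights)
for j, w in enumerate(weights):
    print(j, counts[j] / draws, w / total)  # empirical vs. target probability
```

To sample from a Boltzmann distribution over values $x_1, \ldots, x_n$ with the same sketch, pass the weights $e^{x_i}$, or add the Gumbel noise to the $x_i$ directly.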

Random variate generation

Since the quantile function (inverse cumulative distribution function), $Q(p)$, of a Gumbel distribution is given by

$$Q(p)=\mu-\beta\ln(-\ln(p)),$$

the variate $Q(U)$ has a Gumbel distribution with parameters $\mu$ and $\beta$ when the random variate $U$ is drawn from the uniform distribution on the interval $(0,1)$.
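This inverse-transform recipe translates almost literally into code. The sketch below uses only the Python standard library; the names `gumbel_quantile` and `sample_gumbel` and the parameter values are illustrative assumptions.

```python
import math
import random


def gumbel_quantile(p, mu=0.0, beta=1.0):
    """Q(p) = mu - beta * ln(-ln(p)) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu - beta * math.log(-math.log(p))


def sample_gumbel(n, mu=0.0, beta=1.0, seed=None):
    """Draw n Gumbel(mu, beta) variates by applying Q to uniform variates."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:       # skip the endpoint 0, where Q is undefined
            draws.append(gumbel_quantile(u, mu, beta))
    return draws


draws = sample_gumbel(50_000, mu=1.0, beta=2.0, seed=42)
# The sample mean should be close to mu + gamma * beta, about 2.154 here.
print(sum(draws) / len(draws))
```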

Probability paper

In pre-software times probability paper was used to picture the Gumbel distribution (see illustration). The paper is based on linearization of the cumulative distribution function $F$:

$$-\ln[-\ln(F)] = (x-\mu)/\beta$$

In the paper the horizontal axis is constructed at a double log scale. The vertical axis is linear. By plotting $F$ on the horizontal axis of the paper and the $x$-variable on the vertical axis, the distribution is represented by a straight line with slope $\beta$ and intercept $\mu$, since $x = \mu + \beta\,(-\ln[-\ln(F)])$. When distribution fitting software like CumFreq became available, the task of plotting the distribution was made easier.
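The probability-paper construction can be imitated numerically: transform empirical cumulative probabilities with $-\ln[-\ln(F)]$ and fit a straight line to the sorted observations. The sketch below assumes the plotting position $r/(n+1)$ for the $r$-th smallest of $n$ observations and uses ordinary least squares; the function name and the synthetic data are hypothetical, and this is not a description of how CumFreq or any other package works.

```python
import math


def fit_gumbel_probability_paper(data):
    """Least-squares line x = mu + beta * t through the points (t_r, x_r),
    where t_r = -ln(-ln(F_r)) and F_r = r / (n + 1) is the plotting
    position of the r-th smallest observation."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = [-math.log(-math.log(r / (n + 1))) for r in range(1, n + 1)]
    t_mean = sum(ts) / n
    x_mean = sum(xs) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    s_tx = sum((t - t_mean) * (x - x_mean) for t, x in zip(ts, xs))
    beta = s_tx / s_tt            # slope of the line on the paper
    mu = x_mean - beta * t_mean   # intercept of the line
    return mu, beta


# Synthetic data lying exactly on the line for mu = 3, beta = 2.
n = 25
synthetic = [3.0 + 2.0 * -math.log(-math.log(r / (n + 1))) for r in range(1, n + 1)]
print(fit_gumbel_probability_paper(synthetic))  # close to (3.0, 2.0)
```

With real observations the points scatter around the line, and the fitted slope and intercept serve as graphical estimates of $\beta$ and $\mu$.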

