
In probability theory and statistics, the Gumbel distribution (also known as the type-I generalized extreme value distribution) is used to model the distribution of the maximum (or the minimum) of a number of samples of various distributions. For example, given a list of annual maximum river levels for the past ten years, it might be used to represent the distribution of the maximum level of the river in a particular year. It is useful in predicting the chance that an extreme earthquake, flood or other natural disaster will occur. The potential applicability of the Gumbel distribution to represent the distribution of maxima relates to extreme value theory, which indicates that it is likely to be useful if the distribution of the underlying sample data is of the normal or exponential type. ''This article uses the Gumbel distribution to model the distribution of the maximum value''. ''To model the minimum value, use the negative of the original values.''

The Gumbel distribution is a particular case of the generalized extreme value distribution (also known as the Fisher-Tippett distribution). It is also known as the ''log-Weibull distribution'' and the ''double exponential distribution'' (a term that is alternatively sometimes used to refer to the Laplace distribution). It is related to the Gompertz distribution: when its density is first reflected about the origin and then restricted to the positive half line, the density of a Gompertz distribution is obtained, up to normalization.

In the latent variable formulation of the multinomial logit model, common in discrete choice theory, the errors of the latent variables follow a Gumbel distribution. This is useful because the difference of two Gumbel-distributed random variables has a logistic distribution.

The Gumbel distribution is named after Emil Julius Gumbel (1891–1966), based on his original papers describing the distribution.


Definitions

The cumulative distribution function of the Gumbel distribution is
:F(x;\mu,\beta) = e^{-e^{-(x-\mu)/\beta}}.
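As a minimal sketch, the cumulative distribution function can be evaluated directly, assuming the usual form F(x;\mu,\beta) = e^{-e^{-(x-\mu)/\beta}} with \beta > 0. The function names `gumbel_cdf` and `gumbel_pdf` are illustrative choices, and the density shown is simply the derivative of that CDF with respect to x rather than a formula quoted from this section.

```python
import math

def _exp_neg(z):
    # exp(-z), returning inf instead of raising when the result overflows a float
    try:
        return math.exp(-z)
    except OverflowError:
        return math.inf

def gumbel_cdf(x, mu=0.0, beta=1.0):
    """F(x; mu, beta) = exp(-exp(-(x - mu) / beta)), assuming beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (x - mu) / beta
    return math.exp(-_exp_neg(z))

def gumbel_pdf(x, mu=0.0, beta=1.0):
    """Derivative of the CDF: (1 / beta) * exp(-(z + exp(-z))), z = (x - mu) / beta."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (x - mu) / beta
    return math.exp(-(z + _exp_neg(z))) / beta

# At the location parameter the CDF equals exp(-1) for any scale.
print(gumbel_cdf(2.0, mu=2.0, beta=3.0))
```

The overflow guard only matters far in the lower tail, where both functions are numerically zero.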


Standard Gumbel distribution

The standard Gumbel distribution is the case where \mu = 0 and \beta = 1, with cumulative distribution function
:F(x) = e^{-e^{-x}}
and probability density function
:f(x) = e^{-(x+e^{-x})}.
In this case the mode is 0, the median is -\ln(\ln(2)) \approx 0.3665, the mean is \gamma \approx 0.5772 (the Euler–Mascheroni constant), and the standard deviation is \pi/\sqrt{6} \approx 1.2825. The cumulants, for n>1, are given by
:\kappa_n = (n-1)! \zeta(n).
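The summary values quoted for the standard case can be checked numerically. The sketch below integrates the standard density f(x) = e^{-(x+e^{-x})} with a plain trapezoid rule; the integration range and step size are arbitrary choices made for this illustration, not part of any standard procedure.

```python
import math

def standard_gumbel_pdf(x):
    # f(x) = exp(-(x + exp(-x))), the mu = 0, beta = 1 case
    return math.exp(-(x + math.exp(-x)))

def trapezoid_moments(lo=-10.0, hi=40.0, step=0.001):
    """Return (total mass, mean, standard deviation) by trapezoid integration."""
    n = int(round((hi - lo) / step))
    total = first = second = 0.0
    for i in range(n + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0   # half weight at both end points
        p = weight * standard_gumbel_pdf(x) * step
        total += p
        first += x * p
        second += x * x * p
    mean = first / total
    variance = second / total - mean * mean
    return total, mean, math.sqrt(variance)

total, mean, sd = trapezoid_moments()
median = -math.log(math.log(2.0))   # solves exp(-exp(-m)) = 1/2
print(total, mean, sd, median)
```

The computed mean and standard deviation should agree with \gamma and \pi/\sqrt{6} to several decimal places, since the density is negligible outside the chosen range.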


Properties

The mode is \mu, while the median is \mu-\beta \ln\left(\ln 2\right), and the mean is given by
:\operatorname{E}(X)=\mu+\gamma\beta,
where \gamma is the Euler–Mascheroni constant. The standard deviation \sigma is \beta \pi/\sqrt{6}, hence \beta = \sigma \sqrt{6} / \pi \approx 0.78 \sigma. At the mode, where x = \mu, the value of F(x;\mu,\beta) becomes e^{-1} \approx 0.37, irrespective of the value of \beta.
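The relations between the mean, the standard deviation and the two parameters can be inverted to recover \mu and \beta from a given mean and standard deviation. The following is a small sketch of that arithmetic, assuming \sigma = \beta\pi/\sqrt{6} and E(X) = \mu + \gamma\beta; the function names and the input values are hypothetical, and using the result as a parameter estimate for real data would be a method-of-moments choice that this section does not itself prescribe.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

def gumbel_from_mean_sd(mean, sd):
    """Invert sd = beta * pi / sqrt(6) and mean = mu + gamma * beta."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    beta = sd * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta

def gumbel_median(mu, beta):
    # median = mu - beta * ln(ln 2)
    return mu - beta * math.log(math.log(2.0))

# Hypothetical inputs: mean 12.0 and standard deviation 3.0 in arbitrary units.
mu, beta = gumbel_from_mean_sd(12.0, 3.0)
print(mu, beta, gumbel_median(mu, beta))
```

Note that the ratio \beta/\sigma is the constant \sqrt{6}/\pi, which is where the factor of roughly 0.78 comes from.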


Related distributions

* If ''X'' has a Gumbel distribution, then the conditional distribution of ''Y=−X'' given that ''Y'' is positive, or equivalently given that ''X'' is negative, has a Gompertz distribution. The cdf ''G'' of ''Y'' is related to ''F'', the cdf of ''X'', by the formula G(y) = P(Y \le y) = P(X \ge -y \mid X \le 0) = (F(0)-F(-y))/F(0) for ''y''>0. Consequently, the densities are related by g(y) = f(-y)/F(0): the Gompertz density is proportional to a reflected Gumbel density, restricted to the positive half-line.
* If ''X'' is an exponentially distributed variable with mean 1, then −log(''X'') has a standard Gumbel distribution.
* If X \sim \mathrm{Gumbel}(\alpha_X, \beta) and Y \sim \mathrm{Gumbel}(\alpha_Y, \beta) are independent, then X-Y \sim \mathrm{Logistic}(\alpha_X-\alpha_Y,\beta) (see Logistic distribution).
* If X, Y \sim \mathrm{Gumbel}(\alpha, \beta) are independent, then X+Y \nsim \mathrm{Logistic}(2 \alpha,\beta). Note that E(X+Y) = 2\alpha+2\beta\gamma \neq 2\alpha = E\left(\mathrm{Logistic}(2 \alpha,\beta) \right).

More generally, the distribution of linear combinations of independent Gumbel random variables can be approximated by GNIG and GIG distributions. Theory related to the generalized multivariate log-gamma distribution provides a multivariate version of the Gumbel distribution.
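Two of the relations above lend themselves to a quick simulation check: that −log of a unit-mean exponential variate is standard Gumbel, and that the difference of two independent Gumbel variates with a common scale is logistic. The sketch below assumes the location-scale form Gumbel(\alpha, \beta) = \alpha + \beta \cdot Gumbel(0, 1); the seed, sample size, parameter values and helper names are arbitrary choices for illustration.

```python
import math
import random

def unit_exponential(rng):
    """Draw from Exp(1), rejecting the (vanishingly rare) exact zero."""
    e = rng.expovariate(1.0)
    while e == 0.0:
        e = rng.expovariate(1.0)
    return e

def standard_gumbel(rng):
    # -log of a unit-mean exponential variate
    return -math.log(unit_exponential(rng))

def empirical_cdf(samples, x):
    return sum(1 for s in samples if s <= x) / len(samples)

def logistic_cdf(x, loc, scale):
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))

rng = random.Random(0)                     # fixed seed, chosen arbitrarily
n = 100_000
alpha_x, alpha_y, beta = 1.5, 0.5, 2.0     # hypothetical parameters

gumbel_draws = [standard_gumbel(rng) for _ in range(n)]
diff_draws = [
    (alpha_x + beta * standard_gumbel(rng)) - (alpha_y + beta * standard_gumbel(rng))
    for _ in range(n)
]

points = (-1.0, 0.0, 1.0, 2.0)
err_gumbel = max(
    abs(empirical_cdf(gumbel_draws, x) - math.exp(-math.exp(-x))) for x in points
)
err_logistic = max(
    abs(empirical_cdf(diff_draws, x) - logistic_cdf(x, alpha_x - alpha_y, beta))
    for x in points
)
print(err_gumbel, err_logistic)
```

Both discrepancies should be of the order of the Monte Carlo error, roughly 1/\sqrt{n}; this is a sanity check, not a proof.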


Occurrence and applications

Gumbel has shown that the maximum value (or last order statistic) in a sample of random variables following an exponential distribution, minus the natural logarithm of the sample size, approaches the Gumbel distribution as the sample size increases.

Concretely, let \rho(x)=e^{-x} be the probability density of x and Q(x)=1-e^{-x} its cumulative distribution function. The maximum value out of N realizations of x is smaller than X if and only if all realizations are smaller than X. So the cumulative distribution of the maximum value \tilde{x} satisfies
:P(\tilde{x}-\log(N)\le X)=P(\tilde{x}\le X+\log(N))=[Q(X+\log(N))]^N=\left(1-\frac{e^{-X}}{N}\right)^N,
and, for large N, the right-hand side converges to e^{-e^{-X}}.

In hydrology, therefore, the Gumbel distribution is used to analyze such variables as monthly and annual maximum values of daily rainfall and river discharge volumes, and also to describe droughts.

Gumbel has also shown that the estimator ''r''/(''n''+1) for the probability of an event, where ''r'' is the rank number of the observed value in the data series and ''n'' is the total number of observations, is an unbiased estimator of the cumulative probability around the mode of the distribution. Therefore, this estimator is often used as a plotting position.

In number theory, the Gumbel distribution approximates the number of terms in a random partition of an integer as well as the trend-adjusted sizes of maximal prime gaps and maximal gaps between prime constellations.
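The convergence of the maximum of exponential variables to the Gumbel distribution can be seen numerically by comparing the exact finite-N expression (1 - e^{-X}/N)^N with the limit e^{-e^{-X}}. The sketch below assumes unit-mean exponential variables, as in the derivation; the function names and the grid of evaluation points are illustrative.

```python
import math

def max_of_exponentials_cdf(x, n):
    """Exact P(max of n unit-mean exponentials - log(n) <= x)."""
    t = x + math.log(n)
    if t <= 0:
        return 0.0                       # the maximum is never negative
    return (1.0 - math.exp(-t)) ** n     # equals (1 - exp(-x) / n) ** n

def gumbel_limit(x):
    return math.exp(-math.exp(-x))

grid = [k / 10 for k in range(-30, 61)]  # x from -3.0 to 6.0
gaps = {}
for n in (10, 100, 10_000):
    gaps[n] = max(abs(max_of_exponentials_cdf(x, n) - gumbel_limit(x)) for x in grid)
    print(n, gaps[n])
```

The largest gap on the grid shrinks roughly in proportion to 1/N, which reflects the leading correction term in the expansion of (1 - a/N)^N around e^{-a}.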


Gumbel reparametrization tricks

In machine learning, the Gumbel distribution is sometimes employed to generate samples from the categorical distribution. This technique is called the "Gumbel-max trick" and is a special example of "reparametrization tricks".

In detail, let (\pi_1, ..., \pi_n) be nonnegative, and not all zero, and let g_1, ..., g_n be independent samples of Gumbel(0, 1). Then, by routine integration,
:Pr(j = \arg\max_i (g_i + \log\pi_i)) = \frac{\pi_j}{\sum_i \pi_i}.
That is,
:\arg\max_i (g_i + \log\pi_i) \sim \text{Categorical}\left(\frac{\pi_j}{\sum_i \pi_i}\right)_j.
Equivalently, given any x_1, ..., x_n\in \R, we can sample from its Boltzmann distribution by
:Pr(j = \arg\max_i (g_i + x_i)) = \frac{e^{x_j}}{\sum_i e^{x_i}}.

Related equations include:
* If x\sim \text{Exp}(\lambda), then (-\ln x - \gamma)\sim \text{Gumbel}(-\gamma + \ln\lambda, 1).
* \arg\max_i (g_i + \log\pi_i) \sim \text{Categorical}\left(\frac{\pi_j}{\sum_i \pi_i}\right)_j.
* \max_i (g_i + \log\pi_i) \sim \text{Gumbel}\left(\log\left(\sum_i \pi_i \right), 1\right). That is, the Gumbel distribution is a max-stable distribution family.
* \mathbb E[\max_i (g_i + \beta x_i)] = \gamma + \log \left(\sum_i e^{\beta x_i}\right).
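The Gumbel-max trick can be sketched in a few lines: add independent standard Gumbel noise to the log-weights and take the index of the largest perturbed value. In the sketch below the function names, the weights and the seed are hypothetical; the standard Gumbel draw uses the inverse-CDF form -\ln(-\ln U) for U uniform on (0, 1).

```python
import math
import random

def sample_standard_gumbel(rng):
    u = rng.random()
    while u == 0.0:          # keep u strictly inside (0, 1) so both logs are defined
        u = rng.random()
    return -math.log(-math.log(u))

def gumbel_max_sample(weights, rng):
    """Return index j with probability weights[j] / sum(weights)."""
    best_index, best_score = None, -math.inf
    for i, w in enumerate(weights):
        if w < 0:
            raise ValueError("weights must be nonnegative")
        if w == 0:
            continue         # log(0) = -inf: this index can never win
        score = math.log(w) + sample_standard_gumbel(rng)
        if score > best_score:
            best_index, best_score = i, score
    if best_index is None:
        raise ValueError("at least one weight must be positive")
    return best_index

rng = random.Random(42)      # fixed seed, chosen arbitrarily
weights = [1.0, 2.0, 7.0]    # unnormalized, hypothetical
draws = 50_000
counts = [0] * len(weights)
for _ in range(draws):
    counts[gumbel_max_sample(weights, rng)] += 1
frequencies = [c / draws for c in counts]
print(frequencies)
```

The weights never need to be normalized, which is one reason the trick is convenient when only log-weights (for example, unnormalized scores) are available.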


Random variate generation

Since the quantile function (inverse cumulative distribution function), Q(p), of a Gumbel distribution is given by
:Q(p)=\mu-\beta\ln(-\ln(p)),
the variate Q(U) has a Gumbel distribution with parameters \mu and \beta when the random variate U is drawn from the uniform distribution on the interval (0,1).
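A minimal sketch of this inverse-transform approach follows, assuming the quantile function Q(p) = \mu - \beta\ln(-\ln(p)). The function names, seed and parameter values are illustrative; the only care needed is to keep the uniform draw strictly inside (0, 1).

```python
import math
import random

def gumbel_quantile(p, mu=0.0, beta=1.0):
    """Q(p) = mu - beta * ln(-ln(p)) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu - beta * math.log(-math.log(p))

def gumbel_variate(rng, mu=0.0, beta=1.0):
    u = rng.random()         # uniform on [0, 1)
    while u == 0.0:          # exclude the end point where ln(p) is undefined
        u = rng.random()
    return gumbel_quantile(u, mu, beta)

rng = random.Random(7)       # fixed seed, chosen arbitrarily
mu, beta = 1.0, 2.0          # hypothetical parameters
sample = [gumbel_variate(rng, mu, beta) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)
print(sample_mean, gumbel_quantile(0.5))
```

The sample mean should be close to \mu + \gamma\beta, and Q(0.5) reproduces the median -\ln(\ln 2) of the standard case.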


Probability paper

In pre-software times probability paper was used to picture the Gumbel distribution. The paper is based on linearization of the cumulative distribution function F:
:-\ln(-\ln(F)) = (x-\mu)/\beta
On the paper the horizontal axis is constructed at a double log scale and the vertical axis is linear. By plotting F on the horizontal axis of the paper and the x-variable on the vertical axis, the distribution is represented by a straight line: x increases with slope \beta in the transformed coordinate -\ln(-\ln(F)), or equivalently -\ln(-\ln(F)) increases with slope 1/\beta in x. When distribution fitting software like CumFreq became available, the task of plotting the distribution was made easier.
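The idea behind probability paper can be imitated in code: transform empirical cumulative probabilities with z = -\ln(-\ln(F)) and fit a straight line x = \mu + \beta z. The sketch below is one possible reading of that procedure, assuming the plotting position r/(n+1) for the observation of rank r and an ordinary least-squares line; the synthetic data, seed and function names are hypothetical, and this is not presented as the method used by any particular software package.

```python
import math
import random

def gumbel_plot_fit(data):
    """Least-squares line x = mu + beta * z through the probability-plot points."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    # Plotting position r / (n + 1) for rank r = 1..n, then the double-log transform.
    zs = [-math.log(-math.log(r / (n + 1))) for r in range(1, n + 1)]
    z_mean = sum(zs) / n
    x_mean = sum(xs) / n
    szz = sum((z - z_mean) ** 2 for z in zs)
    szx = sum((z - z_mean) * (x - x_mean) for z, x in zip(zs, xs))
    beta = szx / szz              # slope of x against z
    mu = x_mean - beta * z_mean   # intercept: value of x where z = 0, i.e. F = exp(-1)
    return mu, beta

def draw_gumbel(rng, mu, beta):
    u = rng.random()
    while u == 0.0:               # keep u strictly inside (0, 1)
        u = rng.random()
    return mu - beta * math.log(-math.log(u))

rng = random.Random(1)            # fixed seed, chosen arbitrarily
true_mu, true_beta = 10.0, 2.0    # parameters of the synthetic data
sample = [draw_gumbel(rng, true_mu, true_beta) for _ in range(2000)]
mu_hat, beta_hat = gumbel_plot_fit(sample)
print(mu_hat, beta_hat)
```

The intercept corresponds to the mode, because z = 0 is exactly where F = e^{-1}. A least-squares fit on the plot is only a rough estimator; it is used here because it mirrors drawing a line by eye on the paper.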


See also

* Type-2 Gumbel distribution
* Extreme value theory
* Generalized extreme value distribution
* Fisher–Tippett–Gnedenko theorem
* Emil Julius Gumbel

