
Extreme value theory or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is widely used in many disciplines, such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event, such as the 100-year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly.
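As a worked illustration of the return-period language used here: a T-year event has an annual exceedance probability of 1/T, so, assuming independent years and a constant annual probability, the chance of seeing at least one such event over an n-year horizon is 1 - (1 - 1/T)^n. The function name below is hypothetical; this is a minimal sketch of that arithmetic, not a design procedure.

```python
def prob_at_least_one_exceedance(return_period_years: float, horizon_years: int) -> float:
    """Probability that a T-year event is equaled or exceeded at least once in n years.

    Assumes independent years with a constant annual exceedance probability 1/T.
    """
    if return_period_years < 1:
        raise ValueError("return period must be at least one year")
    annual_p = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_p) ** horizon_years


# A 100-year flood over a 30-year horizon, and a 50-year wave over 50 years.
print(round(prob_at_least_one_exceedance(100, 30), 3))  # 0.26
print(round(prob_at_least_one_exceedance(50, 50), 3))   # 0.636
```

Under these assumptions a "100-year" flood has roughly a 26% chance of occurring at least once in 30 years, and a 50-year wave roughly a 64% chance of occurring within 50 years.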


Data analysis

Two main approaches exist for practical extreme value analysis. The first method relies on deriving block maxima (minima) series as a preliminary step. In many situations it is customary and convenient to extract the annual maxima (minima), generating an "Annual Maxima Series" (AMS). The second method relies on extracting, from a continuous record, the peak values reached for any period during which values exceed a certain threshold (or fall below a certain threshold). This method is generally referred to as the "Peak Over Threshold" method (POT).

For AMS data, the analysis may partly rely on the results of the Fisher–Tippett–Gnedenko theorem, leading to the generalized extreme value distribution being selected for fitting. However, in practice, various procedures are applied to select between a wider range of distributions. The theorem here relates to the limiting distributions for the minimum or the maximum of a very large collection of independent random variables from the same distribution. Given that the number of relevant random events within a year may be rather limited, it is unsurprising that analyses of observed AMS data often lead to distributions other than the generalized extreme value distribution (GEVD) being selected.

For POT data, the analysis may involve fitting two distributions: one for the number of events in a time period considered and a second for the size of the exceedances. A common assumption for the first is the Poisson distribution, with the generalized Pareto distribution being used for the exceedances. Tail fitting can be based on the Pickands–Balkema–de Haan theorem. Novak reserves the term “POT method” for the case where the threshold is non-random, and distinguishes it from the case where one deals with exceedances of a random threshold.
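The two ways of building an extreme-value sample can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record is a list of (year, value) pairs, each run of consecutive values above the threshold counts as a single event (a simple declustering rule chosen here for illustration; other rules exist), the data are made-up illustrative values, and the function names are hypothetical. The Poisson rate is estimated as the number of events divided by the record length in years, and the excesses over the threshold are the sizes a generalized Pareto distribution would then be fitted to.

```python
def annual_maxima(records):
    """Build an Annual Maxima Series (AMS) from (year, value) pairs."""
    maxima = {}
    for year, value in records:
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return dict(sorted(maxima.items()))


def peaks_over_threshold(values, threshold):
    """Return the peak of each run of consecutive values strictly above threshold."""
    peaks = []
    current_peak = None
    for v in values:
        if v > threshold:
            # Inside an exceedance run: track its highest value.
            current_peak = v if current_peak is None else max(current_peak, v)
        elif current_peak is not None:
            # The run just ended: record its peak as one event.
            peaks.append(current_peak)
            current_peak = None
    if current_peak is not None:
        # The record ended while still above the threshold.
        peaks.append(current_peak)
    return peaks


def pot_summary(values, threshold, n_years):
    """Poisson rate of events per year and the excesses over the threshold."""
    peaks = peaks_over_threshold(values, threshold)
    rate = len(peaks) / n_years
    excesses = [p - threshold for p in peaks]
    return rate, excesses


records = [(2001, 3.0), (2001, 5.5), (2002, 4.1), (2002, 2.0), (2003, 7.2)]
print(annual_maxima(records))  # {2001: 5.5, 2002: 4.1, 2003: 7.2}
print(pot_summary([1, 3, 6, 4, 2, 5, 1, 7], threshold=2.5, n_years=2))  # (1.5, [3.5, 2.5, 4.5])
```

Note how the AMS keeps exactly one value per year even when a year has several large events, whereas the POT series can keep several events from one year and none from another.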


Applications

Applications of extreme value theory include predicting the probability distribution of:
* Extreme floods; the size of freak waves
* Tornado outbreaks
* Maximum sizes of ecological populations
* Side effects of drugs (e.g., ximelagatran)
* The magnitudes of large insurance losses
* Equity risks; day-to-day market risk
* Mutational events during evolution
* Large wildfires
* Environmental loads on structures
* Fastest time humans are capable of running the 100 metres sprint and performances in other athletic disciplines
* Pipeline failures due to pitting corrosion
* Anomalous IT network traffic, to prevent attackers from reaching important data
* Road safety analysis
* Wireless communications
* Epidemics
* Neurobiology


History

The field of extreme value theory was pioneered by Leonard Tippett (1902–1985). Tippett was employed by the British Cotton Industry Research Association, where he worked to make cotton thread stronger. In his studies, he realized that the strength of a thread was controlled by the strength of its weakest fibres. With the help of R. A. Fisher, Tippett obtained three asymptotic limits describing the distributions of extremes assuming independent variables. Emil Julius Gumbel codified this theory in his 1958 book ''Statistics of Extremes'', including the Gumbel distributions that bear his name.

These results can be extended to allow for slight correlations between variables, but the classical theory does not extend to strong correlations of the order of the variance. One universality class of particular interest is that of log-correlated fields, where the correlations decay logarithmically with the distance.


Univariate theory

Let X_1, \dots, X_n be a sequence of independent and identically distributed random variables with cumulative distribution function ''F'' and let M_n = \max(X_1, \dots, X_n) denote the maximum. In theory, the exact distribution of the maximum can be derived:
: \Pr(M_n \le z) = \Pr(X_1 \le z, \dots, X_n \le z) = \Pr(X_1 \le z) \cdots \Pr(X_n \le z) = (F(z))^n.
The associated indicator function I_n = I(M_n > z) is a Bernoulli process with a success probability p(z) = 1 - (F(z))^n that depends on the magnitude z of the extreme event. The number of extreme events within n trials thus follows a binomial distribution, and the number of trials until an event occurs follows a geometric distribution with expected value and standard deviation of the same order O(1/p(z)).

In practice, we might not have the distribution function F, but the Fisher–Tippett–Gnedenko theorem provides an asymptotic result. If there exist sequences of constants a_n > 0 and b_n \in \mathbb{R} such that
: \Pr\left\{ \frac{M_n - b_n}{a_n} \le z \right\} \rightarrow G(z) \quad \text{as } n \rightarrow \infty,
then
: G(z) \propto \exp\left[ -(1 + \zeta z)^{-1/\zeta} \right],
where \zeta depends on the tail shape of the distribution. When normalized, ''G'' belongs to one of the following non-degenerate distribution families:
* Weibull law: G(z) = \begin{cases} \exp\left\{ -\left( -\frac{z-b}{a} \right)^{\alpha} \right\} & z < b \\ 1 & z \ge b \end{cases} when the distribution of M_n has a light tail with a finite upper bound. Also known as Type 3.
* Gumbel law: G(z) = \exp\left\{ -\exp\left( -\frac{z-b}{a} \right) \right\} for z \in \mathbb{R}, when the distribution of M_n has an exponential tail. Also known as Type 1.
* Fréchet law: G(z) = \begin{cases} 0 & z \le b \\ \exp\left\{ -\left( \frac{z-b}{a} \right)^{-\alpha} \right\} & z > b \end{cases} when the distribution of M_n has a heavy tail (including polynomial decay). Also known as Type 2.
For the Weibull and Fréchet laws, \alpha > 0.
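The exact and asymptotic results of this section can be checked numerically. The sketch below (function and parameter names are illustrative, not taken from any library) evaluates (F(z))^n, the exceedance probability p(z), and the mean geometric waiting time 1/p(z), together with a GEV distribution function with location mu, scale sigma and shape zeta. In this parametrization zeta > 0 gives the Fréchet type, zeta < 0 the Weibull type, and the limit zeta → 0 the Gumbel type. As a check of the theorem, for n i.i.d. standard exponential variables, choosing a_n = 1 and b_n = ln n gives Pr(M_n - ln n ≤ z) = (1 - e^{-z}/n)^n, which tends to the Gumbel law exp(-e^{-z}).

```python
import math


def max_cdf(F_z: float, n: int) -> float:
    """Exact Pr(M_n <= z) = F(z)**n for n i.i.d. variables, given F_z = F(z)."""
    return F_z ** n


def exceedance_prob(F_z: float, n: int) -> float:
    """p(z) = 1 - F(z)**n: probability that the maximum of n variables exceeds z."""
    return 1.0 - max_cdf(F_z, n)


def mean_trials_until_exceedance(F_z: float, n: int) -> float:
    """Mean of the geometric number of trials up to and including the first exceedance."""
    return 1.0 / exceedance_prob(F_z, n)


def gev_cdf(z: float, mu: float = 0.0, sigma: float = 1.0, zeta: float = 0.0) -> float:
    """GEV CDF exp(-(1 + zeta*s)**(-1/zeta)) with s = (z - mu)/sigma; Gumbel when zeta == 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if zeta == 0.0:
        return math.exp(-math.exp(-s))
    t = 1.0 + zeta * s
    if t <= 0.0:
        # Outside the support: below the lower endpoint (Frechet type, zeta > 0)
        # or above the upper endpoint (Weibull type, zeta < 0).
        return 0.0 if zeta > 0 else 1.0
    return math.exp(-t ** (-1.0 / zeta))


def exp_max_normalized_cdf(z: float, n: int) -> float:
    """Exact Pr(M_n - ln n <= z) for n i.i.d. Exp(1) variables (a_n = 1, b_n = ln n)."""
    x = z + math.log(n)
    F_x = 1.0 - math.exp(-x) if x > 0 else 0.0
    return max_cdf(F_x, n)


# The exact normalized CDF approaches the Gumbel limit as n grows.
for n in (10, 100, 10_000):
    print(n, exp_max_normalized_cdf(1.0, n), gev_cdf(1.0))
```

The zeta == 0 branch is handled separately because the general formula divides by zeta; for very small nonzero zeta the general formula already agrees closely with the Gumbel branch.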


Multivariate theory

Extreme value theory in more than one variable introduces additional issues that have to be addressed. One problem that arises is that one must specify what constitutes an extreme event. Although this is straightforward in the univariate case, there is no unambiguous way to do this in the multivariate case. The fundamental problem is that although it is possible to order a set of real-valued numbers, there is no natural way to order a set of vectors. As an example, in the univariate case, given a set of observations x_i it is straightforward to find the most extreme event simply by taking the maximum (or minimum) of the observations. However, in the bivariate case, given a set of observations (x_i, y_i), it is not immediately clear how to find the most extreme event. Suppose that one has measured the values (3, 4) at a specific time and the values (5, 2) at a later time. Which of these events would be considered more extreme? There is no universal answer to this question.

Another issue in the multivariate case is that the limiting model is not as fully prescribed as in the univariate case. In the univariate case, the model (the GEV distribution) contains three parameters whose values are not predicted by the theory and must be obtained by fitting the distribution to the data. In the multivariate case, the model not only contains unknown parameters, but also a function whose exact form is not prescribed by the theory. However, this function must obey certain constraints. It is not straightforward to devise estimators that obey such constraints, although some have been constructed recently. As an example of an application, bivariate extreme value theory has been applied to ocean research.
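The ordering problem can be made concrete. One common convention in multivariate extreme value theory, assumed here for illustration, is to work with componentwise maxima, taking the maximum of each coordinate separately. The sketch below (hypothetical helper names) shows that neither (3, 4) nor (5, 2) dominates the other in every coordinate, and that their componentwise maximum (5, 4) need not coincide with any observed event.

```python
def dominates(a, b):
    """True if a >= b in every coordinate and a > b in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def componentwise_max(observations):
    """Coordinate-by-coordinate maximum of equal-length tuples."""
    return tuple(max(column) for column in zip(*observations))


first, later = (3, 4), (5, 2)
print(dominates(first, later), dominates(later, first))  # False False
print(componentwise_max([first, later]))                 # (5, 4)
```

Because the two observations are incomparable under this partial order, any single "most extreme" choice requires an extra modelling decision, such as a norm, a weighting of the coordinates, or the componentwise convention used above.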


Nonstationary extremes

Statistical modeling for nonstationary time series was developed in the 1990s. Methods for nonstationary multivariate extremes have been introduced more recently. The latter can be used for tracking how the dependence between extreme values changes over time, or over another covariate.
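One simple way to let extremes change over time, assumed here for illustration and only one option among several, is to let a parameter of the annual-maximum distribution depend on a covariate. The sketch below uses a Gumbel distribution whose location drifts linearly, mu(t) = mu0 + mu1 * t; the parameter names and values are hypothetical and not fitted to any data. With a positive trend the annual probability of exceeding a fixed level grows year by year, so the chance of at least one exceedance over a horizon is computed by multiplying the yearly non-exceedance probabilities (assuming independent years) rather than by raising a single probability to a power.

```python
import math


def gumbel_cdf(z: float, mu: float, sigma: float) -> float:
    """Gumbel distribution function exp(-exp(-(z - mu)/sigma))."""
    return math.exp(-math.exp(-(z - mu) / sigma))


def yearly_exceedance(level, years, mu0, mu1, sigma):
    """Pr(annual maximum > level) for each year t, with location mu0 + mu1 * t."""
    return [1.0 - gumbel_cdf(level, mu0 + mu1 * t, sigma) for t in years]


def prob_any_exceedance(level, years, mu0, mu1, sigma):
    """Chance of at least one exceedance over the given years, assuming independent years."""
    no_exceedance = 1.0
    for t in years:
        no_exceedance *= gumbel_cdf(level, mu0 + mu1 * t, sigma)
    return 1.0 - no_exceedance


years = range(30)
stationary = prob_any_exceedance(5.0, years, mu0=2.0, mu1=0.0, sigma=1.0)
trending = prob_any_exceedance(5.0, years, mu0=2.0, mu1=0.05, sigma=1.0)
print(round(stationary, 3), round(trending, 3))
```

Setting mu1 = 0 recovers the stationary case, in which the product reduces to the familiar 1 - G(z)^n.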


See also

* Extreme risk
* Extreme weather
* Fisher–Tippett–Gnedenko theorem
* Generalized extreme value distribution
* Large deviation theory
* Outlier
* Pareto distribution
* Pickands–Balkema–de Haan theorem
* Rare events
* Weibull distribution
* Redundancy principle


Notes


References

* Burry K.V. (1975) ''Statistical Methods in Applied Science''. John Wiley & Sons.
* Castillo E. (1988) ''Extreme value theory in engineering''. Academic Press, Inc., New York.
* Castillo E., Hadi A.S., Balakrishnan N. and Sarabia J.M. (2005) ''Extreme Value and Related Models with Applications in Engineering and Science''. Wiley Series in Probability and Statistics, Wiley, Hoboken, New Jersey.
* Coles S. (2001) ''An Introduction to Statistical Modeling of Extreme Values''. Springer, London.
* Embrechts P., Klüppelberg C. and Mikosch T. (1997) ''Modelling extremal events for insurance and finance''. Springer-Verlag, Berlin.
* Leadbetter M.R., Lindgren G. and Rootzen H. (1982) ''Extremes and related properties of random sequences and processes''. Springer-Verlag, New York.
* Novak S.Y. (2011) ''Extreme Value Methods with Applications to Finance''. Chapman & Hall/CRC Press, London.


Software


Extreme Value Statistics in R - Packages for extreme value statistics in R
ExtremeStats.jl and Extremes.jl - Extreme Value Statistics in Julia


External links


''Extreme Value Theory can save your neck'' Easy non-mathematical introduction (pdf)

''Source Code for Stationary and Nonstationary Extreme Value Analysis'' University of California, Irvine

''Steps in Applying Extreme Value Theory to Finance: A Review''

''Les valeurs extrêmes des distributions statistiques'' Full-text access to lectures given by E. J. Gumbel in 1933–34, in French (pdf)