
Extreme value theory or extreme value analysis (EVA) is the study of extremes in statistical distributions.
It is widely used in many disciplines, such as structural engineering, finance, economics, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event, such as the 100-year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly.
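A T-year event is one whose probability of being equalled or exceeded in any single year is 1/T. For example, the 100-year flood has a 1% annual exceedance probability. The short sketch below shows why such an event is far from rare over a long design life. It assumes that years are independent and that the climate is stationary. The function name is illustrative.

```python
def prob_at_least_one_exceedance(return_period_years: float, horizon_years: int) -> float:
    """Probability that a T-year event is equalled or exceeded at least once
    within `horizon_years` years.

    Assumption: each year is independent and has the same exceedance
    probability 1 / T (a stationary climate).
    """
    if return_period_years < 1:
        raise ValueError("return period must be at least 1 year")
    annual_p = 1.0 / return_period_years       # e.g. 0.01 for the 100-year flood
    return 1.0 - (1.0 - annual_p) ** horizon_years


if __name__ == "__main__":
    for horizon in (1, 30, 100):
        p = prob_at_least_one_exceedance(100, horizon)
        print(f"100-year flood within {horizon:>3} years: {p:.3f}")
```

Under these assumptions, the chance of at least one 100-year flood is about 26% over 30 years and about 63% over 100 years.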
Data analysis
Two main approaches exist for practical extreme value analysis.
The first method relies on deriving block maxima (minima) series as a preliminary step. In many situations it is customary and convenient to extract the annual maxima (minima), generating an ''annual maxima series'' (AMS).
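A minimal sketch of building an annual maxima series from dated observations follows. It assumes blocks are calendar years. Other block definitions, such as a hydrological water year, would only change how the block key is computed. The function name is hypothetical.

```python
from datetime import date
from typing import Iterable


def annual_maxima(records: Iterable[tuple[date, float]]) -> dict[int, float]:
    """Block maxima with one block per calendar year (an annual maxima series).

    Assumption: the block key is the calendar year of each observation.
    """
    maxima: dict[int, float] = {}
    for day, value in records:
        year = day.year
        # Keep the largest value seen so far in this year's block.
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return dict(sorted(maxima.items()))
```

Annual minima follow the same pattern with `<` in place of `>`.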
The second method relies on extracting, from a continuous record, the peak values reached during any period in which values exceed a certain threshold (or fall below a certain threshold, when minima are of interest). This method is generally referred to as the ''peak over threshold'' method (POT).
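The following sketch extracts peaks over a threshold. It treats each uninterrupted run of values above the threshold as one event and keeps that run's peak. This is an illustrative simplification: in practice, a minimum separation between events (declustering) is often imposed as well. The function name is hypothetical.

```python
from typing import Sequence


def peaks_over_threshold(series: Sequence[float], threshold: float) -> list[float]:
    """Return the peak of each contiguous run of values strictly above `threshold`.

    Assumption: one run = one event (no extra declustering rule).
    """
    peaks: list[float] = []
    current_peak = None                  # peak of the run in progress, if any
    for value in series:
        if value > threshold:
            current_peak = value if current_peak is None else max(current_peak, value)
        elif current_peak is not None:
            peaks.append(current_peak)   # run has ended; record its peak
            current_peak = None
    if current_peak is not None:         # series ended while still above threshold
        peaks.append(current_peak)
    return peaks
```

The exceedances used in later fitting are then `peak - threshold` for each extracted peak.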
For AMS data, the analysis may partly rely on the results of the Fisher–Tippett–Gnedenko theorem, leading to the generalized extreme value distribution being selected for fitting. However, in practice, various procedures are applied to select between a wider range of distributions. The theorem here relates to the limiting distributions for the minimum or the maximum of a very large collection of independent random variables from the same distribution. Given that the number of relevant random events within a year may be rather limited, it is unsurprising that analyses of observed AMS data often lead to distributions other than the ''generalized extreme value distribution'' (GEVD) being selected.
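As a deliberately simple sketch of fitting an annual maxima series, the code below uses the Gumbel distribution, the special case of the GEV with zero shape parameter. It estimates the parameters by the method of moments and then computes a T-year return level. Both choices are assumptions made for brevity. Maximum likelihood fitting of the full three-parameter GEV is a common alternative. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def fit_gumbel_moments(annual_max: Sequence[float]) -> tuple[float, float]:
    """Method-of-moments estimates (location mu, scale beta) for a Gumbel model.

    Gumbel mean = mu + gamma * beta and standard deviation = pi * beta / sqrt(6).
    """
    beta = statistics.stdev(annual_max) * math.sqrt(6) / math.pi
    mu = statistics.fmean(annual_max) - EULER_GAMMA * beta
    return mu, beta


def gumbel_return_level(mu: float, beta: float, return_period: float) -> float:
    """Level x_T with G(x_T) = 1 - 1/T, where G(x) = exp(-exp(-(x - mu) / beta))."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / return_period))
```

For example, with mu = 100 and beta = 15, the 100-year level is roughly 169.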
For POT data, the analysis may involve fitting two distributions: one for the number of events in the time period considered and a second for the size of the exceedances. A common assumption for the first is the Poisson distribution, with the generalized Pareto distribution being used for the exceedances.
Tail fitting can be based on the Pickands–Balkema–de Haan theorem.
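Combining the two fitted distributions gives a return level. The sketch below rests on several assumptions:

* Exceedance counts are Poisson with a known yearly rate, for example the number of extracted peaks divided by the record length in years.
* Excesses over the threshold u follow a generalized Pareto distribution with scale σ and shape ξ.
* The GPD parameters are estimated by the method of moments. This is simple but only valid for ξ < 1/2; maximum likelihood is often preferred in practice.

The function names are hypothetical. The N-year level x_N solves λN · [1 + ξ(x_N − u)/σ]^(−1/ξ) = 1.

```python
import math
import statistics
from typing import Sequence


def fit_gpd_moments(excesses: Sequence[float]) -> tuple[float, float]:
    """Method-of-moments estimates (sigma, xi) for a generalized Pareto model.

    Uses mean = sigma / (1 - xi) and variance = sigma**2 / ((1 - xi)**2 * (1 - 2*xi)),
    which exist only for xi < 1/2.
    """
    m = statistics.fmean(excesses)
    ratio = m * m / statistics.variance(excesses)  # equals 1 - 2*xi
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return sigma, xi


def pot_return_level(threshold: float, sigma: float, xi: float,
                     events_per_year: float, years: float) -> float:
    """Level exceeded on average once every `years` years.

    Assumes Poisson event counts (rate `events_per_year`) and GPD excesses.
    """
    n_exceed = events_per_year * years  # expected threshold exceedances in `years`
    if abs(xi) < 1e-12:                 # exponential-tail limit (xi -> 0)
        return threshold + sigma * math.log(n_exceed)
    return threshold + sigma / xi * (n_exceed ** xi - 1.0)
```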
Novak (2011) reserves the term "POT method" for the case where the threshold is non-random, and distinguishes it from the case where one deals with exceedances of a random threshold.
Applications
Applications of extreme value theory include predicting the probability distribution of:
* Extreme floods; the size of freak waves
* Tornado outbreaks
* Maximum sizes of ecological populations
* Side effects of drugs (e.g., ximelagatran)
* The magnitudes of large insurance losses
* Equity risks; day-to-day market risk
* Mutation events during evolution
* Large wildfires
* Environmental loads on structures
* The fastest time in which humans could ever run the 100 metres sprint, and performances in other athletic disciplines
* Pipeline failures due to pitting corrosion
* Anomalous IT network traffic, to help prevent attackers from reaching important data
* Road safety analysis
* Wireless communications
* Epidemics
* Neurobiology
* Solar energy
* Extreme space weather
History
The field of extreme value theory was pioneered by L. Tippett (1902–1985). Tippett was employed by the British Cotton Industry Research Association, where he worked to make cotton thread stronger. In his studies, he realized that the strength of a thread was controlled by the strength of its weakest fibres. With the help of R.A. Fisher, Tippett obtained three asymptotic limits describing the distributions of extremes assuming independent variables.
E.J. Gumbel (1958) codified this theory. These results can be extended to allow for slight correlations between variables, but the classical theory does not extend to strong correlations of the order of the variance. One universality class of particular interest is that of ''log-correlated'' fields, where the correlations decay logarithmically with the distance.
Univariate theory
The theory for extreme values of a single variable is governed by the ''extreme value theorem'', also called the ''Fisher–Tippett–Gnedenko theorem'', which describes which of the three possible distributions for extreme values applies for a particular statistical variable.
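For reference, a standard formulation of the theorem follows. The notation M_n, a_n, b_n, μ, σ and ξ is introduced here for illustration and is not taken from the text above. The three limiting types correspond to the sign of the shape parameter ξ. Minima are handled by applying the same result to the maxima of −X_i.

```latex
% M_n = \max(X_1, \dots, X_n) for i.i.d. X_i with common distribution function F.
% If there exist constants a_n > 0 and b_n such that, for a non-degenerate G,
\Pr\!\left(\frac{M_n - b_n}{a_n} \le x\right) \longrightarrow G(x) \qquad (n \to \infty),
% then G belongs to the generalized extreme value (GEV) family
G(x) = \exp\!\left\{-\left[1 + \xi\,\frac{x - \mu}{\sigma}\right]^{-1/\xi}\right\},
\qquad 1 + \xi\,\frac{x - \mu}{\sigma} > 0, \quad \sigma > 0,
% whose limit as \xi \to 0 is the Gumbel distribution
G(x) = \exp\!\left\{-\exp\!\left(-\frac{x - \mu}{\sigma}\right)\right\}.
% \xi > 0: Fréchet type;  \xi = 0: Gumbel type;  \xi < 0: (reversed) Weibull type.
```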
Multivariate theory
Extreme value theory in more than one variable introduces additional issues that have to be addressed. One problem that arises is that one must specify what constitutes an extreme event.
Although this is straightforward in the univariate case, there is no unambiguous way to do this in the multivariate case. The fundamental problem is that although it is possible to order a set of real-valued numbers, there is no natural way to order a set of vectors.
As an example, in the univariate case, given a set of observations $x_i$, it is straightforward to find the most extreme event simply by taking the maximum (or minimum) of the observations. However, in the bivariate case, given a set of observations $(x_i, y_i)$, it is not immediately clear how to find the most extreme event. Suppose that one has measured the values $(x_1, y_1)$ at a specific time and the values $(x_2, y_2)$ at a later time, where $x_2 > x_1$ but $y_2 < y_1$. Which of these events would be considered more extreme? There is no universal answer to this question.
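One common convention in multivariate extremes is to take maxima separately in each coordinate (componentwise maxima). The sketch below shows that the result need not coincide with any observed vector, and that two observations may be incomparable because neither dominates the other. The data values and function names are illustrative assumptions.

```python
from typing import Sequence


def componentwise_maximum(points: Sequence[Sequence[float]]) -> tuple[float, ...]:
    """Maximum taken separately in each coordinate."""
    return tuple(max(coords) for coords in zip(*points))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as large as `b` in every coordinate and larger in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
```

For the hypothetical observations (3, 4) and (5, 2), the componentwise maximum is (5, 4), which was never observed. Neither observation dominates the other.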
Another issue in the multivariate case is that the limiting model is not as fully prescribed as in the univariate case. In the univariate case, the model (GEV distribution) contains three parameters whose values are not predicted by the theory and must be obtained by fitting the distribution to the data. In the multivariate case, the model not only contains unknown parameters, but also a function whose exact form is not prescribed by the theory. However, this function must obey certain constraints.
It is not straightforward to devise estimators that obey such constraints, though some have recently been constructed.
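In the bivariate case, one common representation of that function is the Pickands dependence function A(t) on [0, 1]. It must be convex and satisfy max(t, 1 − t) ≤ A(t) ≤ 1. The sketch below evaluates the symmetric logistic model, used here only as one example of a valid A, and checks the constraints numerically on a grid. The function names and the grid-based check are illustrative assumptions, not an estimator.

```python
def logistic_dependence(t: float, alpha: float) -> float:
    """Pickands dependence function of the symmetric logistic model, 0 < alpha <= 1.

    alpha = 1 gives independence (A = 1); alpha -> 0 approaches complete dependence.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    r = 1.0 / alpha
    return (t ** r + (1.0 - t) ** r) ** alpha


def satisfies_constraints(A, n: int = 200, tol: float = 1e-9) -> bool:
    """Check max(t, 1 - t) <= A(t) <= 1 and convexity on an evenly spaced grid."""
    ts = [i / n for i in range(n + 1)]
    vals = [A(t) for t in ts]
    bounded = all(max(t, 1 - t) - tol <= v <= 1 + tol for t, v in zip(ts, vals))
    # Non-negative second differences indicate convexity on the grid.
    convex = all(vals[i - 1] - 2 * vals[i] + vals[i + 1] >= -tol for i in range(1, n))
    return bounded and convex
```

A candidate estimate that fails this check, such as the constant A(t) = 0.9, cannot be a valid dependence function.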
As an example of an application, bivariate extreme value theory has been applied to ocean research.
Non-stationary extremes
Statistical modeling for nonstationary time series was developed in the 1990s. Methods for nonstationary multivariate extremes have been introduced more recently. The latter can be used for tracking how the dependence between extreme values changes over time, or over another covariate.
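A common way to represent nonstationarity in a single variable is to let the distribution's parameters depend on a covariate such as time. The sketch below assumes a Gumbel model whose location drifts linearly, mu(t) = mu0 + mu1 * t, with a constant scale beta. All parameters are assumed to be already fitted. The linear-trend form and the function name are illustrative assumptions.

```python
import math


def nonstationary_gumbel_return_level(mu0: float, mu1: float, beta: float,
                                      return_period: float, t: float) -> float:
    """Block-maximum level with exceedance probability 1/return_period at time t.

    Assumption: Gumbel distribution with location mu(t) = mu0 + mu1 * t and fixed beta.
    """
    mu_t = mu0 + mu1 * t
    return mu_t - beta * math.log(-math.log(1.0 - 1.0 / return_period))
```

Because only the location moves in this model, the "100-year" level shifts by mu1 per unit time. The notion of a single fixed return level therefore no longer applies.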
See also
* Extreme risk
* Extreme weather
* Fisher–Tippett–Gnedenko theorem
* Generalized extreme value distribution
* Large deviation theory
* Outlier
* Pareto distribution
* Pickands–Balkema–de Haan theorem
* Rare events
* Redundancy principle
; Extreme value distributions
* Fréchet distribution
* Gumbel distribution
* Weibull distribution
Software
* — Package for extreme value statistics in R.
* — Package for extreme value statistics in Julia.
External links
* — Easy non-mathematical introduction.
* — Full-text access to conferences held by in 1933–1934.