
Exponential tilting (ET), exponential twisting, or exponential change of measure (ECM) is a distribution-shifting technique used in many parts of mathematics. The different exponential tiltings of a random variable X are known as the natural exponential family of X. Exponential tilting is used in Monte Carlo estimation for rare-event simulation, and in rejection and importance sampling in particular. In mathematical finance, exponential tilting is also known as Esscher tilting (or the Esscher transform); it is often combined with indirect Edgeworth approximation and is used in such contexts as insurance futures pricing. The earliest formalization of exponential tilting is often attributed to Esscher, with its use in importance sampling being attributed to David Siegmund.


Overview

Given a random variable X with probability distribution \mathbb{P}, density f, and moment generating function (MGF) M_X(\theta) = \mathbb{E}[e^{\theta X}] < \infty, the exponentially tilted measure \mathbb{P}_\theta is defined as follows:
:\mathbb{P}_\theta(X \in dx) = \frac{\mathbb{E}[e^{\theta X}\mathbb{I}[X \in dx]]}{M_X(\theta)} = e^{\theta x - \kappa(\theta)}\mathbb{P}(X \in dx),
where \kappa(\theta) is the cumulant generating function (CGF) defined as
:\kappa(\theta) = \log\mathbb{E}[e^{\theta X}] = \log M_X(\theta).
We call
:\mathbb{P}_\theta(X \in dx) = f_\theta(x)
the \theta-tilted density of X. It satisfies f_\theta(x) \propto e^{\theta x}f(x).

The exponential tilting of a random vector X has an analogous definition:
:\mathbb{P}_\theta(X \in dx) = e^{\theta^\top x - \kappa(\theta)}\mathbb{P}(X \in dx),
where \kappa(\theta) = \log \mathbb{E}[\exp\{\theta^\top X\}].
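As a minimal sketch of the definition, the following tilts a finite discrete distribution by weighting each mass with e^{\theta x - \kappa(\theta)}. The helper name `tilt_pmf` and the fair-die example are illustrative assumptions, not part of the source.

```python
import math

def tilt_pmf(values, probs, theta):
    """Return (tilted_probs, kappa) for a finite discrete distribution."""
    # M_X(theta) = E[exp(theta * X)]; kappa(theta) is its logarithm.
    mgf = sum(p * math.exp(theta * x) for x, p in zip(values, probs))
    kappa = math.log(mgf)
    # Tilted mass: exp(theta * x - kappa(theta)) * P(X = x).
    tilted = [p * math.exp(theta * x - kappa) for x, p in zip(values, probs)]
    return tilted, kappa

# Hypothetical example: a fair six-sided die tilted with theta = 0.5.
faces = [1, 2, 3, 4, 5, 6]
fair = [1 / 6] * 6
tilted, kappa = tilt_pmf(faces, fair, theta=0.5)
print([round(q, 4) for q in tilted])   # mass shifts toward the larger faces
print(round(sum(tilted), 12))          # the tilted masses still sum to one
```

Subtracting \kappa(\theta) in the exponent is what normalizes the tilted masses, so no separate normalization step is needed.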


Example

The exponentially tilted measure in many cases has the same parametric form as that of X. One-dimensional examples include the normal distribution, the exponential distribution, the binomial distribution and the Poisson distribution. For example, in the case of the normal distribution N(\mu, \sigma^2), the tilted density f_\theta(x) is the N(\mu + \theta\sigma^2, \sigma^2) density. The table below provides more examples of tilted density.

For some distributions, however, the exponentially tilted distribution does not belong to the same parametric family as f. An example of this is the Pareto distribution with f(x) = \alpha/(1 + x)^{\alpha+1}, x > 0, where f_\theta(x) is well defined for \theta < 0 but is not a standard distribution. In such examples, random variable generation may not always be straightforward.
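The normal case can be checked numerically. The sketch below multiplies the N(\mu, \sigma^2) density by e^{\theta x - \kappa(\theta)}, using the normal CGF \kappa(\theta) = \mu\theta + \sigma^2\theta^2/2, and compares the result with the N(\mu + \theta\sigma^2, \sigma^2) density on a grid. The parameter values and function names are arbitrary choices for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def tilted_normal_pdf(x, mu, sigma, theta):
    # CGF of N(mu, sigma^2): kappa(theta) = mu*theta + sigma^2 * theta^2 / 2.
    kappa = mu * theta + 0.5 * sigma**2 * theta**2
    return math.exp(theta * x - kappa) * normal_pdf(x, mu, sigma)

mu, sigma, theta = 1.0, 2.0, 0.3          # illustrative values
grid = [-5.0 + 0.5 * k for k in range(31)]
max_gap = max(
    abs(tilted_normal_pdf(x, mu, sigma, theta)
        - normal_pdf(x, mu + theta * sigma**2, sigma))
    for x in grid
)
print(max_gap)   # difference between the two densities, at rounding-error level
```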


Advantages

In many cases, the tilted distribution belongs to the same parametric family as the original. This is particularly true when the original density belongs to the exponential family of distributions. This simplifies random variable generation during Monte Carlo simulations. Exponential tilting may still be useful if this is not the case, though normalization must be possible and additional sampling algorithms may be needed.

In addition, there exists a simple relationship between the original and tilted CGF,
:\kappa_\theta(\eta) = \log(\mathbb{E}_\theta[e^{\eta X}]) = \kappa(\theta + \eta) - \kappa(\theta).
We can see this by observing that
:F_\theta(x) = \int\limits_{-\infty}^x \exp\{\theta y - \kappa(\theta)\}f(y)dy.
Thus,
:\begin{align} \kappa_\theta(\eta) &= \log \int e^{\eta x}dF_\theta(x) \\ &= \log \int e^{\eta x}e^{\theta x - \kappa(\theta)}dF(x) \\ &= \log\mathbb{E}[e^{(\eta + \theta)X - \kappa(\theta)}] \\ &= \log(e^{\kappa(\eta + \theta) - \kappa(\theta)}) \\ &= \kappa(\eta + \theta) - \kappa(\theta) \end{align}.
Clearly, this relationship allows for easy calculation of the CGF of the tilted distribution and thus of the distribution's moments. Moreover, it results in a simple form of the likelihood ratio. Specifically,
:\ell = \frac{d\mathbb{P}}{d\mathbb{P}_\theta} = \frac{f(x)}{f_\theta(x)} = e^{-\theta x + \kappa(\theta)}.
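The CGF relationship can be verified on a concrete case. The sketch below assumes X is Poisson with rate `lam` (an illustrative choice), computes \log\mathbb{E}_\theta[e^{\eta X}] by summing directly against the tilted mass function, and compares it with \kappa(\theta + \eta) - \kappa(\theta). The sum is truncated at 100 terms, which is far into the tail for the values used.

```python
import math

lam, theta, eta = 2.0, 0.4, 0.3    # illustrative values

def kappa(t):
    # CGF of Poisson(lam): lam * (e^t - 1).
    return lam * (math.exp(t) - 1.0)

def poisson_pmf(k):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

# E_theta[exp(eta * X)], summing against the tilted mass exp(theta*k - kappa(theta)) * pmf(k).
tilted_mgf = sum(
    math.exp(eta * k) * math.exp(theta * k - kappa(theta)) * poisson_pmf(k)
    for k in range(100)
)
kappa_tilted_direct = math.log(tilted_mgf)
kappa_tilted_formula = kappa(theta + eta) - kappa(theta)
print(kappa_tilted_direct, kappa_tilted_formula)   # the two values should agree closely
```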


Properties

* If \kappa(\eta) = \log \mathrm{E}[\exp(\eta X)] is the CGF of X, then the CGF of the \theta-tilted X is
::\kappa_\theta(\eta) = \kappa(\theta + \eta) - \kappa(\theta).
:This means that the i-th cumulant of the tilted X is \kappa^{(i)}(\theta). In particular, the expectation of the tilted distribution is
::\mathrm{E}_\theta[X] = \tfrac{d}{d\eta}\kappa_\theta(\eta)|_{\eta=0} = \kappa'(\theta).
:The variance of the tilted distribution is
::\mathrm{Var}_\theta[X] = \tfrac{d^2}{d\eta^2}\kappa_\theta(\eta)|_{\eta=0} = \kappa''(\theta).
* Repeated tilting is additive. That is, tilting first by \theta_1 and then by \theta_2 is the same as tilting once by \theta_1 + \theta_2.
* If X is the sum of independent, but not necessarily identical, random variables X_1, X_2, \dots, then the \theta-tilted distribution of X is the sum of X_1, X_2, \dots each \theta-tilted individually.
* If \mu = \mathrm{E}[X], then \kappa(\theta) - \theta\mu is the Kullback–Leibler divergence
::D_\text{KL}(P \parallel P_\theta) = \mathrm{E}\left[\log\tfrac{P}{P_\theta}\right]
:between the tilted distribution P_\theta and the original distribution P of X.
* Similarly, since \mathrm{E}_\theta[X] = \kappa'(\theta), we have the Kullback–Leibler divergence as
::D_\text{KL}(P_\theta \parallel P) = \mathrm{E}_\theta\left[\log\tfrac{P_\theta}{P}\right] = \theta\kappa'(\theta) - \kappa(\theta).
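The properties listed above can be spot-checked numerically. The sketch below assumes a fair six-sided die as X (an illustrative choice) and approximates \kappa'(\theta) and \kappa''(\theta) by central finite differences with step `h`. It then compares them with the mean and variance of the tilted distribution, and evaluates both Kullback–Leibler divergences directly from their definitions.

```python
import math

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

def kappa(theta):
    return math.log(sum(p * math.exp(theta * x) for x, p in zip(values, probs)))

def tilt(theta):
    k = kappa(theta)
    return [p * math.exp(theta * x - k) for x, p in zip(values, probs)]

theta, h = 0.4, 1e-4
q = tilt(theta)

# Mean and variance computed directly from the tilted masses.
mean_tilted = sum(x * w for x, w in zip(values, q))
var_tilted = sum((x - mean_tilted) ** 2 * w for x, w in zip(values, q))

# Finite-difference approximations of kappa'(theta) and kappa''(theta).
kappa_prime = (kappa(theta + h) - kappa(theta - h)) / (2 * h)
kappa_second = (kappa(theta + h) - 2 * kappa(theta) + kappa(theta - h)) / h**2

# Both KL divergences, straight from the definition.
mu = sum(x * p for x, p in zip(values, probs))
kl_p_q = sum(p * math.log(p / w) for p, w in zip(probs, q))   # D_KL(P || P_theta)
kl_q_p = sum(w * math.log(w / p) for p, w in zip(probs, q))   # D_KL(P_theta || P)

print(mean_tilted, kappa_prime)
print(var_tilted, kappa_second)
print(kl_p_q, kappa(theta) - theta * mu)
print(kl_q_p, theta * mean_tilted - kappa(theta))
```

Each printed pair should agree up to the finite-difference error.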


Applications


Rare-event simulation

The exponential tilting of X, assuming it exists, supplies a family of distributions that can be used as proposal distributions for acceptance-rejection sampling or as importance distributions for importance sampling. One common application is sampling from a distribution conditional on a sub-region of the domain, i.e. X \mid X \in A. With an appropriate choice of \theta, sampling from \mathbb{P}_\theta can meaningfully reduce the required amount of sampling or the variance of an estimator.
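As a hedged illustration of conditional sampling, the sketch below draws X given X > a for a standard normal X, using the tilted law N(\theta, 1) as the proposal. Taking \theta = a is an assumed choice made for this example, not a prescription from the source, and the function name is hypothetical. On the region x > a the ratio f(x)/f_\theta(x) is proportional to e^{-\theta x}, so a proposal is accepted with probability e^{-\theta(x - a)}.

```python
import math
import random

def sample_normal_tail(a, n, seed=0):
    """Draw n samples of X | X > a for X ~ N(0, 1), using the tilted proposal N(a, 1)."""
    rng = random.Random(seed)
    theta = a                      # assumed choice: centre the proposal at a (requires a > 0)
    accepted, proposals = [], 0
    while len(accepted) < n:
        x = rng.gauss(theta, 1.0)  # the theta-tilted N(0, 1) is N(theta, 1)
        proposals += 1
        # On {x > a}, f(x)/f_theta(x) is proportional to exp(-theta*x), maximised at x = a.
        if x > a and rng.random() <= math.exp(-theta * (x - a)):
            accepted.append(x)
    return accepted, proposals

a = 3.0
tail_samples, proposals = sample_normal_tail(a, n=5000)
acceptance_rate = len(tail_samples) / proposals
naive_rate = 0.5 * math.erfc(a / math.sqrt(2))   # P(X > a): acceptance rate without tilting
print(acceptance_rate, naive_rate)
```

Comparing the two printed rates shows how much less sampling the tilted proposal needs than drawing from N(0, 1) and discarding values below a.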


Saddlepoint approximation

The saddlepoint approximation method is a density approximation methodology often used for the distribution of sums and averages of independent, identically distributed random variables. It employs Edgeworth series, but generally performs better at extreme values. From the definition of the natural exponential family, it follows that
:f_\theta(\bar{x}) = f(\bar{x})\exp\{n(\theta\bar{x} - \kappa(\theta))\}.
Applying the Edgeworth expansion for f_\theta(\bar{x}), we have
:f_\theta(\bar{x}) = \psi(z)(\mathrm{Var}[\bar{X}])^{-1/2}\left\{1 + \frac{\rho_3(\theta)h_3(z)}{6} + \frac{\rho_4(\theta)h_4(z)}{24} + \cdots\right\},
where \psi(z) is the standard normal density evaluated at
:z = \frac{\bar{x} - \kappa'(\theta)}{\sqrt{\kappa''(\theta)/n}},
:\rho_n(\theta) = \kappa^{(n)}(\theta)/\kappa''(\theta)^{n/2},
and h_n are the Hermite polynomials.

When considering values of \bar{x} progressively farther from the center of the distribution, |z| \rightarrow \infty and the h_n(z) terms become unbounded. However, for each value of \bar{x}, we can choose \theta such that
::\kappa'(\theta) = \bar{x}.
This value of \theta is referred to as the saddle-point. With this choice, z = 0 and the above expansion is always evaluated at the expectation of the tilted distribution. This choice of \theta leads to the final representation of the approximation given by
:f(\bar{x}) \approx \left(\frac{n}{2\pi\kappa''(\theta)}\right)^{1/2}\exp\{n(\kappa(\theta) - \theta\bar{x})\}.
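A small worked case may help. The sketch below assumes the X_i are Exp(1), so that \kappa(\theta) = -\log(1 - \theta) and the saddle-point equation \kappa'(\theta) = \bar{x} has the closed-form solution \theta = 1 - 1/\bar{x}. The exact density of the sample mean is then Gamma with shape n and rate n, which gives something to compare against. The function names are illustrative.

```python
import math

def saddlepoint_density_exp_mean(xbar, n):
    """Saddlepoint approximation to the density of the mean of n iid Exp(1) variables."""
    theta = 1.0 - 1.0 / xbar               # solves kappa'(theta) = 1/(1 - theta) = xbar
    kappa = -math.log(1.0 - theta)         # CGF of Exp(1)
    kappa2 = 1.0 / (1.0 - theta) ** 2      # kappa''(theta)
    return math.sqrt(n / (2 * math.pi * kappa2)) * math.exp(n * (kappa - theta * xbar))

def exact_density_exp_mean(xbar, n):
    """Exact density: the mean of n iid Exp(1) is Gamma(shape=n, rate=n)."""
    log_f = n * math.log(n) + (n - 1) * math.log(xbar) - n * xbar - math.lgamma(n)
    return math.exp(log_f)

n = 10
for xbar in (0.3, 1.0, 2.5):
    approx = saddlepoint_density_exp_mean(xbar, n)
    exact = exact_density_exp_mean(xbar, n)
    print(xbar, approx / exact)   # ratio stays close to 1, including in the tails
```

In this particular example the ratio does not depend on \bar{x}: the approximation differs from the exact density only through Stirling's approximation of \Gamma(n).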


Rejection sampling

Using the tilted distribution \mathbb{P}_\theta as the proposal, the rejection sampling algorithm prescribes sampling from f_\theta(x) and accepting with probability
:\frac{1}{c}\exp(-\theta x + \kappa(\theta)),
where
:c = \sup\limits_{x}\frac{d\mathbb{P}}{d\mathbb{P}_\theta}(x).
That is, a uniformly distributed random variable p \sim \mbox{Unif}(0,1) is generated, and the sample from f_\theta(x) is accepted if
:p \leq \frac{1}{c}\exp(-\theta x + \kappa(\theta)).
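The acceptance rule can be sketched for a case where every quantity is available in closed form. Assume the target is Exp(1) and 0 < \theta < 1, both illustrative assumptions. Then \kappa(\theta) = -\log(1 - \theta), the tilted proposal is exponential with rate 1 - \theta, and the supremum of e^{-\theta x + \kappa(\theta)} over x \geq 0 is attained at x = 0, so c = e^{\kappa(\theta)}.

```python
import math
import random

def sample_exp1_via_tilting(theta, n, seed=0):
    """Draw n samples from Exp(1) by rejection, proposing from the theta-tilted law."""
    rng = random.Random(seed)
    kappa = -math.log(1.0 - theta)      # CGF of Exp(1) at theta, valid for theta < 1
    c = math.exp(kappa)                 # sup over x >= 0 of exp(-theta*x + kappa), at x = 0
    accepted, draws = [], 0
    while len(accepted) < n:
        x = rng.expovariate(1.0 - theta)            # proposal f_theta: exponential, rate 1 - theta
        draws += 1
        p = rng.random()                            # p ~ Unif(0, 1)
        if p <= math.exp(-theta * x + kappa) / c:   # accept with probability (1/c) * likelihood ratio
            accepted.append(x)
    return accepted, draws

theta = 0.5
exp_samples, draws = sample_exp1_via_tilting(theta, n=20000)
sample_mean = sum(exp_samples) / len(exp_samples)
acceptance = len(exp_samples) / draws
print(sample_mean)   # should be near the Exp(1) mean of 1
print(acceptance)    # should be near 1/c = 1 - theta
```

Here the tilted proposal has a heavier tail than the target, which is what keeps the likelihood ratio bounded; tilting with \theta < 0 would make the ratio unbounded and the scheme invalid.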


Importance sampling

Applying the exponentially tilted distribution as the importance distribution yields the equation
:\mathbb{E}(h(X)) = \mathbb{E}_\theta[\ell(X)h(X)],
where
:\ell(X) = \frac{d\mathbb{P}}{d\mathbb{P}_\theta}
is the likelihood ratio. So, one samples from f_\theta, evaluates h at each draw, and multiplies by the likelihood ratio to recover the expectation under \mathbb{P}. Moreover, the second moment of the estimator under \mathbb{P}_\theta is
:\mathbb{E}_\theta[(\ell(X)h(X))^2],
so its variance is this quantity minus (\mathbb{E}[h(X)])^2.


Example

Assume independent and identically distributed \{X_i\} such that \kappa(\theta) < \infty. In order to estimate \mathbb{P}(X_1 + \cdots + X_n > c), we can employ importance sampling by taking
:h(X) = \mathbb{I}(\sum_{i=1}^n X_i > c).
The constant c can be rewritten as na for some other constant a. Then,
:\mathbb{P}(\sum_{i=1}^n X_i > na) = \mathbb{E}_{\theta_a}\left[\exp\{-\theta_a\sum_{i=1}^n X_i + n\kappa(\theta_a)\}\mathbb{I}(\sum_{i=1}^n X_i > na)\right],
where \theta_a denotes the \theta defined by the saddle-point equation
:\kappa'(\theta_a) = a.
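The estimator can be sketched for standard normal summands, an assumption made here because everything is then explicit: \kappa(\theta) = \theta^2/2, the saddle-point equation gives \theta_a = a, the tilted summands are N(a, 1), and the exact tail probability is available from the complementary error function for comparison. The function name and parameter values are illustrative.

```python
import math
import random

def tilted_tail_estimate(n, a, num_samples, seed=0):
    """Estimate P(X_1 + ... + X_n > n*a) for iid N(0, 1) by sampling under the theta_a-tilted law."""
    rng = random.Random(seed)
    theta = a                       # kappa(theta) = theta^2 / 2, so kappa'(theta_a) = a gives theta_a = a
    kappa = 0.5 * theta * theta
    total = 0.0
    for _ in range(num_samples):
        s = sum(rng.gauss(theta, 1.0) for _ in range(n))   # each tilted summand is N(theta, 1)
        if s > n * a:
            # Likelihood ratio for the whole sample: exp(-theta * sum + n * kappa(theta)).
            total += math.exp(-theta * s + n * kappa)
    return total / num_samples

n, a = 10, 1.5
estimate = tilted_tail_estimate(n, a, num_samples=20000)
exact = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2))   # the sum is N(0, n) under the original law
print(estimate, exact)
```

Under the tilted law roughly half of the draws land in the event, whereas under the original law the event is so rare that a plain Monte Carlo run of the same size would typically record no hits at all.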


Stochastic processes

Given the tilting of a normal random variable, it is intuitive that the exponential tilting of X_t, a Brownian motion with drift \mu and variance \sigma^2, is a Brownian motion with drift \mu + \theta\sigma^2 and variance \sigma^2. Thus, any Brownian motion with drift under \mathbb{P} can be thought of as a Brownian motion without drift under \mathbb{P}_{\theta^*}, where \theta^* = -\mu/\sigma^2. To observe this, consider the process X_t = B_t + \mu t, for which \sigma^2 = 1. Then
:f(X_T) = f_{\theta^*}(X_T)\frac{d\mathbb{P}}{d\mathbb{P}_{\theta^*}} = f(B_T)\exp\{\mu B_T - \frac{1}{2}\mu^2 T\}.
The likelihood ratio term, \exp\{\mu B_T - \frac{1}{2}\mu^2 T\}, is a martingale and is commonly denoted M_T. Thus, a Brownian motion with drift process (as well as many other continuous processes adapted to the Brownian filtration) is a \mathbb{P}_{\theta^*}-martingale.
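A Monte Carlo check can make the role of the likelihood ratio concrete. The sketch below assumes unit variance and samples B_T from a driftless Brownian motion at a single time T. Weighting by M_T = \exp\{\mu B_T - \mu^2 T/2\} should give an average weight near 1, and a weighted mean of B_T near \mu T, which is the mean of a Brownian motion with drift \mu. The function name and parameter values are illustrative.

```python
import math
import random

def check_drift_removal(mu, T, num_samples, seed=0):
    """Return Monte Carlo estimates of E[M_T] and E[B_T * M_T] under the driftless measure."""
    rng = random.Random(seed)
    sum_m, sum_bm = 0.0, 0.0
    for _ in range(num_samples):
        b = rng.gauss(0.0, math.sqrt(T))             # B_T for a driftless Brownian motion
        m = math.exp(mu * b - 0.5 * mu * mu * T)     # likelihood ratio M_T
        sum_m += m
        sum_bm += b * m
    return sum_m / num_samples, sum_bm / num_samples

mu, T = 0.5, 1.0
mean_m, mean_bm = check_drift_removal(mu, T, num_samples=50000)
print(mean_m)    # should be near 1, as required of a likelihood ratio
print(mean_bm)   # should be near mu * T, the mean under the measure with drift
```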


Stochastic Differential Equations

The above leads to the alternate representation of the stochastic differential equation dX(t) = \mu(t)dt + \sigma(t)dB(t):
:dX_\theta(t) = \mu_\theta(t)dt + \sigma(t)dB(t),
where \mu_\theta(t) = \mu(t) + \theta\sigma(t). Girsanov's formula gives the likelihood ratio, with B a Brownian motion under \mathbb{P},
:\frac{d\mathbb{P}}{d\mathbb{P}_\theta} = \exp\left\{-\int\limits_0^T \frac{\mu_\theta(t) - \mu(t)}{\sigma(t)}dB(t) + \frac{1}{2}\int\limits_0^T \left(\frac{\mu_\theta(t) - \mu(t)}{\sigma(t)}\right)^2 dt\right\}.
Therefore, Girsanov's formula can be used to implement importance sampling for certain SDEs.

Tilting can also be useful for simulating a process X(t) via rejection sampling of the SDE
:dX(t) = \mu(X(t))dt + dB(t).
We may focus on the SDE since we know that X(t) can be written \int\limits_0^t dX(s) + X(0). As previously stated, a Brownian motion with drift can be tilted to a Brownian motion without drift. Therefore, we choose \mathbb{P}_{\text{proposal}} = \mathbb{P}_{\theta^*}. The likelihood ratio is
:\frac{d\mathbb{P}}{d\mathbb{P}_{\theta^*}}(dX(s): 0 \leq s \leq t) = \prod\limits_{\tau \leq t}\exp\left\{\mu(X(\tau))dX(\tau) - \frac{\mu(X(\tau))^2}{2}d\tau\right\} = \exp\left\{\int\limits_0^t \mu(X(\tau))dX(\tau) - \int\limits_0^t \frac{\mu(X(s))^2}{2}ds\right\}.
This likelihood ratio will be denoted M(t). To ensure this is a true likelihood ratio, it must be shown that \mathbb{E}_{\theta^*}[M(t)] = 1. Assuming this condition holds, it can be shown that f_{X(t)}(y) = f_{X(t)}^{\theta^*}(y)\mathbb{E}_{\theta^*}[M(t) \mid X(t) = y]. So, rejection sampling prescribes that one samples from a standard Brownian motion and accepts with probability
:\frac{f_{X(t)}(y)}{f_{X(t)}^{\theta^*}(y)}\frac{1}{c} = \frac{1}{c}\mathbb{E}_{\theta^*}[M(t) \mid X(t) = y].


Choice of tilting parameter


Siegmund's algorithm

Assume i.i.d. X's with a light-tailed distribution and \mathbb{E}[X] < 0. In order to estimate \psi(c) = \mathbb{P}(\tau(c) < \infty), where \tau(c) = \inf\{t : \sum\limits_{i=1}^t X_i > c\}, when c is large and hence \psi(c) is small, the algorithm uses exponential tilting to derive the importance distribution. The algorithm is used in many settings, such as sequential tests and G/G/1 queue waiting times, and \psi is the probability of ultimate ruin in ruin theory. In this context, it is logical to ensure that \mathbb{P}_\theta(\tau(c) < \infty) = 1. The criterion \theta > \theta_0, where \theta_0 is such that \kappa'(\theta_0) = 0, achieves this. Siegmund's algorithm uses \theta = \theta^*, if it exists, where \theta^* is the nonzero solution of
:\kappa(\theta^*) = 0.
It has been shown that \theta^* is the only tilting parameter producing bounded relative error (\underset{c \rightarrow \infty}{\lim\sup}\frac{\mathrm{Var}_{\theta^*}[Z(c)]}{\psi(c)^2} < \infty, where Z(c) denotes the importance sampling estimator of \psi(c)).
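A minimal sketch of the idea, assuming normal increments X_i ~ N(-d, 1) with d > 0 (the name `drift` below stands for d and is an illustrative choice). Then \kappa(\theta) = -d\theta + \theta^2/2, the nonzero root is \theta^* = 2d, and the \theta^*-tilted increments are N(d, 1), so the walk crosses c with probability one. Because \kappa(\theta^*) = 0, the likelihood ratio at the crossing time reduces to \exp\{-\theta^* S_{\tau(c)}\}, where S denotes the partial sum.

```python
import math
import random

def siegmund_estimate(c, drift, num_samples, seed=0):
    """Estimate psi(c) = P(a random walk with N(-drift, 1) steps ever exceeds c)."""
    rng = random.Random(seed)
    theta_star = 2.0 * drift     # nonzero root of kappa(theta) = -drift*theta + theta^2/2
    weights = []
    for _ in range(num_samples):
        s = 0.0
        while s <= c:
            # theta*-tilted step: N(-drift + theta*, 1) = N(drift, 1), so the loop ends almost surely.
            s += rng.gauss(drift, 1.0)
        # Likelihood ratio at the crossing time; the kappa term vanishes since kappa(theta*) = 0.
        weights.append(math.exp(-theta_star * s))
    return sum(weights) / num_samples, weights

c, drift = 10.0, 0.5
psi_hat, weights = siegmund_estimate(c, drift, num_samples=5000)
upper_bound = math.exp(-2.0 * drift * c)   # every weight is below exp(-theta* * c)
print(psi_hat, upper_bound)
```

Since the partial sum exceeds c at the crossing time, every weight lies below e^{-\theta^* c}. This boundedness is a concrete way to see why the estimator's relative error stays controlled as c grows.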


Black-Box algorithms

We can only see the input and output of a black box, without knowing its structure. A black-box algorithm therefore uses only minimal information about that structure. When we generate random numbers, the output may not be within a common parametric class, such as the normal or exponential distributions. An automated way may be used to perform ECM. Let X_1, X_2, \dots be i.i.d. r.v.'s with distribution G; for simplicity we assume X \geq 0. Define \mathfrak{F}_n = \sigma(X_1, \dots, X_n, U_1, \dots, U_n), where U_1, U_2, \dots are independent (0, 1) uniforms. A randomized stopping time for X_1, X_2, \dots is then a stopping time w.r.t. the filtration \{\mathfrak{F}_n\}. Let further \mathfrak{G} be a class of distributions G on [0, \infty) with k_G = \log\int_0^\infty e^{\theta x}G(dx) < \infty, and define G_\theta by \frac{dG_\theta}{dG}(x) = e^{\theta x - k_G}. We define a black-box algorithm for ECM for the given \theta and the given class \mathfrak{G} of distributions as a pair of a randomized stopping time \tau and an \mathfrak{F}_\tau-measurable r.v. Z such that Z is distributed according to G_\theta for any G \in \mathfrak{G}. Formally, we write this as \mathbb{P}_G(Z \leq x) = G_\theta(x) for all x. In other words, the rules of the game are that the algorithm may use simulated values from G and additional uniforms to produce an r.v. from G_\theta (Asmussen, Soren & Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 416–420).


See also

* Importance sampling
* Rejection sampling
* Monte Carlo method
* Exponential family
* Esscher transform

