In statistics, a power transform is a family of functions applied to create a monotonic transformation of data using power functions. It is a data transformation technique used to stabilize variance, make the data more normal distribution-like, improve the validity of measures of association (such as the Pearson correlation between variables), and for other data stabilization procedures.
Power transforms are used in multiple fields, including multi-resolution and wavelet analysis, statistical data analysis, medical research, modeling of physical processes, geochemical data analysis, epidemiology, and many other clinical, environmental, and social research areas.
Definition
The power transformation is defined as a continuously varying function of the power parameter ''λ'', written in a piecewise form that makes it continuous at the point of singularity (''λ'' = 0). For data vectors (''y''<sub>1</sub>, ..., ''y''<sub>''n''</sub>) in which each ''y''<sub>''i''</sub> > 0, the power transform is
:<math>y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^\lambda - 1}{\lambda (\operatorname{GM}(y))^{\lambda - 1}}, & \text{if } \lambda \neq 0 \\[12pt] \operatorname{GM}(y) \ln y_i, & \text{if } \lambda = 0 \end{cases}</math>
where
:<math>\operatorname{GM}(y) = \left(\prod_{i=1}^n y_i\right)^{1/n} = \sqrt[n]{y_1 y_2 \cdots y_n}</math>
is the geometric mean of the observations ''y''<sub>1</sub>, ..., ''y''<sub>''n''</sub>. The case for ''λ'' = 0 is the limit as ''λ'' approaches 0. To see this, note that <math>y_i^\lambda = \exp(\lambda \ln y_i) = 1 + \lambda \ln y_i + O((\lambda \ln y_i)^2)</math>. Then <math>(y_i^\lambda - 1)/\lambda = \ln y_i + O(\lambda)</math>, and everything but <math>\ln y_i</math> becomes negligible for ''λ'' sufficiently small.
The inclusion of the (''λ'' − 1)th power of the geometric mean in the denominator simplifies the scientific interpretation of any equation involving <math>y_i^{(\lambda)}</math>, because the units of measurement do not change as ''λ'' changes.
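The definition above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names and the sample values are illustrative assumptions. It computes the geometric-mean-normalized transform, (''y''<sup>''λ''</sup> − 1) / (''λ'' · GM<sup>''λ'' − 1</sup>) for ''λ'' ≠ 0 and GM · ln ''y'' for ''λ'' = 0, and measures how close the two branches are for a very small ''λ''.

```python
import math

def geometric_mean(ys):
    """Geometric mean of strictly positive observations."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))

def power_transform(ys, lam):
    """Geometric-mean-normalized power transform of positive data."""
    if any(y <= 0 for y in ys):
        raise ValueError("all observations must be positive")
    gm = geometric_mean(ys)
    if lam == 0:
        # Limiting case: GM(y) * ln(y_i)
        return [gm * math.log(y) for y in ys]
    scale = lam * gm ** (lam - 1)
    return [(y ** lam - 1) / scale for y in ys]

data = [0.5, 1.0, 2.0, 4.0]  # illustrative values, not from any data set

at_zero = power_transform(data, 0.0)
near_zero = power_transform(data, 1e-6)

# Largest difference between the lam = 0 branch and a nearby lam
max_gap = max(abs(a - b) for a, b in zip(at_zero, near_zero))
```

Here `max_gap` should be tiny, reflecting the continuity of the transform in ''λ'' at ''λ'' = 0.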
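Box and Cox (1964) introduced the geometric mean into this transformation by first including the Jacobian of the rescaled power transformation
:<math>\frac{y^\lambda - 1}{\lambda}</math>
with the likelihood. This Jacobian is as follows:
:<math>J(\lambda; y_1, \ldots, y_n) = \prod_{i=1}^n \left|\frac{d}{dy_i}\,\frac{y_i^\lambda - 1}{\lambda}\right| = \prod_{i=1}^n y_i^{\lambda - 1} = \operatorname{GM}(y)^{n(\lambda - 1)}</math>
This allows the normal log likelihood at its maximum to be written as follows:
:<math>\begin{align} \log(\mathcal{L}(\hat\mu, \hat\sigma)) &= -\frac{n}{2}\left(\log(2\pi\hat\sigma^2) + 1\right) + n(\lambda - 1)\log(\operatorname{GM}(y)) \\ &= -\frac{n}{2}\left(\log\left(\frac{2\pi\hat\sigma^2}{\operatorname{GM}(y)^{2(\lambda - 1)}}\right) + 1\right) \end{align}</math>
From here, absorbing <math>\operatorname{GM}(y)^{2(\lambda - 1)}</math> into the expression for <math>\hat\sigma^2</math> produces an expression that establishes that minimizing the sum of squares of residuals from <math>y_i^{(\lambda)}</math> is equivalent to maximizing the sum of the normal log likelihood of deviations from <math>(y^\lambda - 1)/\lambda</math> and the log of the Jacobian of the transformation.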
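The equivalence described above can be illustrated numerically. The sketch below assumes the simplest possible model, a constant mean fitted by the sample average; that model, the function name, and the sample values are illustrative assumptions. It evaluates the maximized normal log likelihood in two ways: from the plain transform (''y''<sup>''λ''</sup> − 1)/''λ'' plus the log Jacobian, and from the geometric-mean-normalized transform alone.

```python
import math

def profile_loglik_two_ways(ys, lam):
    """Return (log-likelihood with explicit Jacobian term,
    log-likelihood from the GM-normalized transform)."""
    n = len(ys)
    log_gm = sum(math.log(y) for y in ys) / n

    # Plain power transform; residuals are taken about the sample mean
    z = [math.log(y) if lam == 0 else (y ** lam - 1) / lam for y in ys]
    z_bar = sum(z) / n
    sigma2 = sum((v - z_bar) ** 2 for v in z) / n
    with_jacobian = (-0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
                     + n * (lam - 1) * log_gm)

    # Normalized transform: divide by GM^(lam - 1), no Jacobian term needed
    gm_pow = math.exp((lam - 1) * log_gm)
    w = [v / gm_pow for v in z]
    w_bar = sum(w) / n
    sigma2_w = sum((v - w_bar) ** 2 for v in w) / n
    absorbed = -0.5 * n * (math.log(2 * math.pi * sigma2_w) + 1)

    return with_jacobian, absorbed

observations = [1.2, 3.4, 0.7, 5.1, 2.2]  # illustrative positive values
pair = profile_loglik_two_ways(observations, 0.5)
```

The two entries of `pair` agree up to floating-point error, so ranking values of ''λ'' by the residual sum of squares of the normalized transform gives the same ordering as ranking them by the full log likelihood.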
The value at ''Y'' = 1 for any ''λ'' is 0, and the derivative with respect to ''Y'' there is 1 for any ''λ''. Sometimes ''Y'' is a version of some other variable scaled to give ''Y'' = 1 at some sort of average value.
The transformation is a power transformation, but done in such a way as to make it continuous with the parameter ''λ'' at ''λ'' = 0. It has proved popular in regression analysis, including econometrics.
Box and Cox also proposed a more general form of the transformation that incorporates a shift parameter ''α'':
:<math>\tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{(y_i + \alpha)^\lambda - 1}{\lambda (\operatorname{GM}(y + \alpha))^{\lambda - 1}}, & \text{if } \lambda \neq 0 \\[12pt] \operatorname{GM}(y + \alpha) \ln(y_i + \alpha), & \text{if } \lambda = 0 \end{cases}</math>
which holds if ''y''<sub>''i''</sub> + ''α'' > 0 for all ''i''. If τ(''Y'', ''λ'', ''α'') follows a truncated normal distribution, then ''Y'' is said to follow a Box–Cox distribution.
Bickel and Doksum eliminated the need to use a truncated distribution by extending the range of the transformation to all ''y'', as follows:
:<math>\tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{\operatorname{sgn}(y_i + \alpha)\,|y_i + \alpha|^\lambda - 1}{\lambda (\operatorname{GM}(y + \alpha))^{\lambda - 1}}, & \text{if } \lambda \neq 0 \\[12pt] \operatorname{GM}(y + \alpha) \operatorname{sgn}(y_i + \alpha) \ln|y_i + \alpha|, & \text{if } \lambda = 0 \end{cases}</math>
where sgn(.) is the sign function. This change in definition has little practical import as long as −''α'' is less than min(''y''<sub>''i''</sub>), that is, as long as every ''y''<sub>''i''</sub> + ''α'' is positive, which it usually is.
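The sign-preserving idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: it covers only ''λ'' > 0, it omits the geometric-mean scaling, and the function names are hypothetical. It shows that the extension coincides with the shifted Box–Cox form whenever ''y'' + ''α'' is positive and remains defined when it is not.

```python
def bickel_doksum(y, lam, alpha=0.0):
    """Sign-preserving power transform (sketch: lam > 0, no GM scaling)."""
    if lam <= 0:
        raise ValueError("this sketch only covers lam > 0")
    shifted = y + alpha
    sign = (shifted > 0) - (shifted < 0)  # -1, 0 or 1
    return (sign * abs(shifted) ** lam - 1) / lam

def box_cox_shifted(y, lam, alpha=0.0):
    """Shifted Box-Cox transform without GM scaling; needs y + alpha > 0."""
    return ((y + alpha) ** lam - 1) / lam

same_on_positive = bickel_doksum(4.0, 0.5) == box_cox_shifted(4.0, 0.5)
negative_value = bickel_doksum(-4.0, 0.5)  # defined, unlike the Box-Cox form
```

For positive shifted values the two functions return the same number, which is why the change in definition rarely matters in practice.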
Bickel and Doksum also proved that the parameter estimates are consistent and asymptotically normal under appropriate regularity conditions, though the standard Cramér–Rao lower bound can substantially underestimate the variance when parameter values are small relative to the noise variance. In many applications, however, this underestimation of the variance may not be a substantive problem.
Box–Cox transformation
The one-parameter Box–Cox transformations are defined as
:<math>y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^\lambda - 1}{\lambda}, & \text{if } \lambda \neq 0 \\[8pt] \ln y_i, & \text{if } \lambda = 0 \end{cases}</math>
and the two-parameter Box–Cox transformations as
:<math>y_i^{(\boldsymbol{\lambda})} = \begin{cases} \dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \text{if } \lambda_1 \neq 0 \\[8pt] \ln(y_i + \lambda_2), & \text{if } \lambda_1 = 0 \end{cases}</math>
as described in the original article. Moreover, the first transformations hold for <math>y_i > 0</math>, and the second for <math>y_i > -\lambda_2</math>.
The parameter <math>\lambda</math> is estimated using the profile likelihood function and using goodness-of-fit tests.
Confidence interval
A confidence interval for the Box–Cox transformation parameter can be constructed asymptotically by applying Wilks's theorem to the profile likelihood function, finding all the possible values of <math>\lambda</math> that fulfill the following restriction:
:<math>\ln L(\lambda) \geq \ln L(\hat\lambda) - \frac{1}{2} \chi^2_{1, 1 - \alpha}</math>
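The restriction above lends itself to a simple grid search. The sketch below is illustrative only: it assumes a constant-mean normal model for the transformed data, a fixed grid of candidate values, and made-up sample values. It drops additive constants from the log likelihood, since only differences matter, and hard-codes the 0.95 quantile of the chi-squared distribution with one degree of freedom.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-squared, 1 d.o.f.

def profile_loglik(ys, lam):
    """Profile log-likelihood of lam, up to an additive constant."""
    n = len(ys)
    z = [math.log(y) if lam == 0 else (y ** lam - 1) / lam for y in ys]
    z_bar = sum(z) / n
    sigma2 = sum((v - z_bar) ** 2 for v in z) / n
    log_jacobian = (lam - 1) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(sigma2) + log_jacobian

def lambda_interval(ys, grid):
    """Grid maximizer of the profile likelihood and the Wilks interval."""
    scores = [(profile_loglik(ys, lam), lam) for lam in grid]
    best_ll, best_lam = max(scores)
    cutoff = best_ll - 0.5 * CHI2_1_95
    inside = [lam for ll, lam in scores if ll >= cutoff]
    return best_lam, min(inside), max(inside)

grid = [k / 100 for k in range(-200, 201)]  # candidate lam in [-2, 2]
sample = [math.exp(v) for v in (-1.3, -0.4, 0.1, 0.2, 0.9, 1.5, -0.8, 0.6)]
lam_hat, lower, upper = lambda_interval(sample, grid)
```

The interval is limited to the range of the grid, so an endpoint equal to −2 or 2 would indicate that the grid should be widened.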
Example
The BUPA liver data set contains data on liver enzymes ALT and γGT. Suppose we are interested in using log(γGT) to predict ALT. A plot of the data appears in panel (a) of the figure. There appears to be non-constant variance, and a Box–Cox transformation might help.
image:BUPA BoxCox.JPG
The log-likelihood of the power parameter appears in panel (b). The horizontal reference line is at a distance of χ<sub>1</sub><sup>2</sup>/2 from the maximum and can be used to read off an approximate 95% confidence interval for ''λ''. It appears as though a value close to zero would be good, so we take logs.
Possibly, the transformation could be improved by adding a shift parameter to the log transformation. Panel (c) of the figure shows the log-likelihood. In this case, the maximum of the likelihood is close to zero suggesting that a shift parameter is not needed. The final panel shows the transformed data with a superimposed regression line.
Note that although Box–Cox transformations can make big improvements in model fit, there are some issues that the transformation cannot help with. In the current example, the data are rather heavy-tailed so that the assumption of normality is not realistic and a robust regression approach leads to a more precise model.
Econometric application
Economists often characterize production relationships by some variant of the Box–Cox transformation.
Consider a common representation of production ''Q'' as dependent on services provided by a capital stock ''K'' and by labor hours ''N'':
:<math>\tau(Q) = \alpha \tau(K) + (1 - \alpha) \tau(N)</math>
Solving for ''Q'' by inverting the Box–Cox transformation we find
:<math>Q = \big(\alpha K^\lambda + (1 - \alpha) N^\lambda\big)^{1/\lambda}</math>
which is known as the ''constant elasticity of substitution (CES)'' production function.
The CES production function is a homogeneous function of degree one.
When ''λ'' = 1, this produces the linear production function:
:<math>Q = \alpha K + (1 - \alpha) N</math>
When ''λ'' → 0 this produces the famous Cobb–Douglas production function:
:<math>Q = K^\alpha N^{1 - \alpha}</math>
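The special cases above are easy to verify numerically. The following sketch uses hypothetical names and illustrative input values. It implements the CES function obtained by inverting the Box–Cox transformation, treats ''λ'' = 0 as the Cobb–Douglas limit, and compares the results.

```python
def box_cox(x, lam):
    """One-parameter Box-Cox transform for lam != 0."""
    return (x ** lam - 1) / lam

def ces_output(K, N, alpha, lam):
    """CES production function; lam = 0 is handled as the Cobb-Douglas limit."""
    if lam == 0:
        return K ** alpha * N ** (1 - alpha)
    return (alpha * K ** lam + (1 - alpha) * N ** lam) ** (1 / lam)

K, N, alpha = 4.0, 9.0, 0.5  # illustrative capital, labor and share values

linear = ces_output(K, N, alpha, 1.0)          # alpha*K + (1 - alpha)*N
cobb_douglas = ces_output(K, N, alpha, 0.0)    # K**alpha * N**(1 - alpha)
near_limit = ces_output(K, N, alpha, 1e-6)     # approaches the Cobb-Douglas value

# Degree-one homogeneity: doubling both inputs doubles output
doubled = ces_output(2 * K, 2 * N, alpha, 0.5)
base = ces_output(K, N, alpha, 0.5)
```

With these inputs `near_limit` is very close to `cobb_douglas`, and `doubled` equals twice `base` up to rounding.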
Activities and demonstrations
The SOCR resource pages contain a number of hands-on interactive activities (Power Transform Family Graphs, SOCR webpages) demonstrating the Box–Cox (power) transformation using Java applets and charts. These directly illustrate the effects of this transform on Q–Q plots, X–Y scatterplots, time-series plots and histograms.
Yeo–Johnson transformation
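The Yeo–Johnson transformation allows also for zero and negative values of <math>y</math>. <math>\lambda</math> can be any real number, where <math>\lambda = 1</math> produces the identity transformation.
The transformation law reads:
:<math>y_i^{(\lambda)} = \begin{cases} \dfrac{(y_i + 1)^\lambda - 1}{\lambda}, & \text{if } \lambda \neq 0,\ y_i \geq 0 \\[8pt] \ln(y_i + 1), & \text{if } \lambda = 0,\ y_i \geq 0 \\[8pt] -\dfrac{(-y_i + 1)^{2 - \lambda} - 1}{2 - \lambda}, & \text{if } \lambda \neq 2,\ y_i < 0 \\[8pt] -\ln(-y_i + 1), & \text{if } \lambda = 2,\ y_i < 0 \end{cases}</math>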
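The four branches of the Yeo–Johnson transformation translate directly into code. This is a minimal sketch with a hypothetical function name and illustrative inputs. It applies the Box–Cox form to ''y'' + 1 for non-negative values and a mirrored form with power 2 − ''λ'' for negative values.

```python
import math

def yeo_johnson(y, lam):
    """Yeo-Johnson transform of a single real value y."""
    if y >= 0:
        if lam == 0:
            return math.log(y + 1)
        return ((y + 1) ** lam - 1) / lam
    # Negative values use the mirrored form with power 2 - lam
    if lam == 2:
        return -math.log(-y + 1)
    return -((-y + 1) ** (2 - lam) - 1) / (2 - lam)

values = [-3.0, -0.5, 0.0, 0.5, 3.0]  # illustrative inputs

identity = [yeo_johnson(v, 1.0) for v in values]    # lam = 1 returns the input
compressed = [yeo_johnson(v, 0.0) for v in values]  # log-like on the positive side
```

With ''λ'' = 1 the output reproduces the input, and zero is mapped to zero for every ''λ''.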
External links
* {{SpringerEOM |title=Box–Cox transformation |id=B/b110790 |first=R. |last=Nishii}}
* Sanford Weisberg, ''Yeo-Johnson Power Transformations''