Directional component analysis (DCA) is a statistical method used in climate science for identifying representative patterns of variability in space-time data sets such as historical climate observations, weather prediction ensembles or climate ensembles. The first DCA pattern is a pattern of weather or climate variability that is both likely to occur (measured using likelihood) and has a large impact (for a specified linear impact function, and given certain mathematical conditions: see below). The first DCA pattern contrasts with the first pattern from principal component analysis (PCA), which is likely to occur but may not have a large impact, and with a pattern derived from the gradient of the impact function, which has a large impact but may not be likely to occur. DCA differs from other pattern identification methods used in climate research, such as empirical orthogonal functions (EOFs), rotated EOFs and extended EOFs, in that it takes into account an external vector, the gradient of the impact.

DCA provides a way to reduce large ensembles from weather forecasts or climate models to just two patterns. The first pattern is the ensemble mean, and the second pattern is the DCA pattern, which represents variability around the ensemble mean in a way that takes impact into account. DCA contrasts with other methods that have been proposed for the reduction of ensembles in that it takes impact into account in addition to the structure of the ensemble.

Overview

Inputs

DCA is calculated from two inputs:

* a multivariate dataset of weather or climate data, such as historical climate observations, or a weather or climate ensemble
* a linear impact function.

The linear impact function defines a level of impact for every spatial pattern in the weather or climate data as a weighted sum of the values at different locations in the spatial pattern. An example is the mean value across the spatial pattern. The linear impact function can be generated as the first-order (linear) term in the multivariate Taylor series of a non-linear impact function.
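As an illustration of the second input, the following minimal sketch (assuming NumPy) evaluates a linear impact as a weighted sum and obtains weights from a non-linear impact by finite differences. The function names `linear_impact` and `linearise_impact`, the sample anomalies and the non-linear impact are all hypothetical choices made for this example.

```python
import numpy as np


def linear_impact(r, x):
    """Impact of spatial pattern x as the weighted sum r^t x."""
    return float(np.dot(r, x))


def linearise_impact(f, n, h=1e-5):
    """Approximate the gradient of a non-linear impact f at the zero pattern
    by central differences; the result can serve as the weight vector r."""
    r = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        r[i] = (f(e) - f(-e)) / (2.0 * h)
    return r


# Hypothetical anomalies at four locations.
x = np.array([1.0, -2.0, 3.0, 2.0])

# Spatial mean as a linear impact: equal weights 1/n.
r_mean = np.full(x.size, 1.0 / x.size)
mean_impact = linear_impact(r_mean, x)


# Hypothetical non-linear impact whose linear part is the spatial total.
def nonlinear_impact(p):
    return float(np.sum(p) + 0.5 * np.sum(p ** 2))


r_lin = linearise_impact(nonlinear_impact, x.size)  # close to [1, 1, 1, 1]
```

The gradient is taken at the zero pattern here because the data are treated as anomalies around a mean of zero; a different expansion point would give different weights.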

Formula

Consider a space-time data set $X$, containing individual spatial pattern vectors $x$, where the individual patterns are each considered as single samples from a multivariate normal distribution with mean zero and covariance matrix $C$. We define a linear impact function of a spatial pattern as $r^t x$, where $r$ is a vector of spatial weights. The first DCA pattern is given in terms of the covariance matrix $C$ and the weights $r$ by the proportional expression $x \propto Cr$. The pattern can then be normalized to any length as required.
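The formula can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the function name `first_dca_pattern`, the covariance matrix and the simulated data are assumptions made for the example, and the covariance is estimated with `np.cov` purely for convenience.

```python
import numpy as np


def first_dca_pattern(C, r):
    """Return the unit-length first DCA pattern, proportional to C r."""
    v = C @ r
    return v / np.linalg.norm(v)


# Hypothetical covariance matrix for two anticorrelated locations.
C = np.array([[2.0, -1.0],
              [-1.0, 1.0]])
# Weights for the total anomaly: every location counts equally.
r = np.array([1.0, 1.0])

dca1 = first_dca_pattern(C, r)  # C r = [1, 0], which is already unit length

# With data instead of a known C: rows of X are samples, columns are locations.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(2), C, size=5000)
C_hat = np.cov(X, rowvar=False)
dca1_hat = first_dca_pattern(C_hat, r)  # approximates dca1
```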

Properties

If the weather or climate data is elliptically distributed (e.g., is distributed as a multivariate normal distribution or a multivariate t-distribution) then the first DCA pattern (DCA1) is defined as the spatial pattern with the following mathematical properties:

* DCA1 maximises probability density for a given value of impact
* DCA1 maximises impact for a given value of probability density
* DCA1 maximises the product of impact and probability density
* DCA1 is the conditional expectation, conditional on exceeding a certain level of impact
* DCA1 is the impact-weighted ensemble mean
* Any modification of DCA1 will lead to a pattern that is either less extreme, or has a lower probability density.
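Two of these properties can be checked numerically. The sketch below (assuming NumPy, a synthetic mean-zero ensemble, and a sample covariance with 1/n normalisation) verifies that the impact-weighted ensemble mean equals the sample covariance times the weights, and that perturbing the pattern at fixed impact does not increase the normal log density. All data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ensemble: 200 members, 3 locations, centred to mean zero.
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(200, 3)) @ mixing
X = X - X.mean(axis=0)
r = np.array([1.0, 1.0, 1.0])   # impact = total anomaly

C_hat = X.T @ X / X.shape[0]    # sample covariance (1/n normalisation)
impacts = X @ r                 # impact of each ensemble member

# Impact-weighted ensemble mean: equals C_hat r, so it is proportional to DCA1.
weighted_mean = (impacts[:, None] * X).mean(axis=0)

# Pattern proportional to C_hat r, scaled so that its impact is exactly 1.
x_star = C_hat @ r / (r @ C_hat @ r)


def log_density_term(x):
    """Zero-mean normal log density up to constants: -x^t C^{-1} x."""
    return -float(x @ np.linalg.solve(C_hat, x))


# Perturb x_star while keeping the impact fixed (perturbations orthogonal to r).
best = log_density_term(x_star)
others = []
for _ in range(100):
    d = rng.normal(size=3)
    d -= (d @ r) / (r @ r) * r          # remove the component along r
    others.append(log_density_term(x_star + 0.1 * d))
```

None of the perturbed patterns in `others` has a higher value than `best`, which is the first property in the list applied to this sample covariance.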

Rainfall Example

For instance, in a rainfall anomaly dataset, using an impact metric defined as the total rainfall anomaly, the first DCA pattern is the spatial pattern that has the highest probability density for a given total rainfall anomaly. If the given total rainfall anomaly is chosen to have a large value, then this pattern combines being extreme in terms of the metric (i.e., representing large amounts of total rainfall) with being likely in terms of the pattern, and so is well suited as a representative extreme pattern.
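A minimal numerical version of this example, assuming NumPy and a made-up covariance matrix for three locations, rescales the DCA direction so that the total rainfall anomaly matches a chosen value. The numbers are illustrative only.

```python
import numpy as np

# Hypothetical covariance of seasonal rainfall anomalies at three locations.
C = np.array([[4.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])
r = np.ones(3)          # total rainfall anomaly = sum over locations
target_total = 10.0     # chosen (large) total rainfall anomaly

# Pattern along C r, rescaled so that r^t x equals the chosen total.
pattern = target_total * (C @ r) / (r @ C @ r)   # -> [5.0, 3.5, 1.5]
```

Under the normality assumption, this is the most likely way for the chosen total to be distributed across the three locations, with more of the anomaly placed where the variance is larger.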

Comparison with PCA

The main differences between principal component analysis (PCA) and DCA are:

* PCA is a function of just the covariance matrix, and the first PCA pattern is defined so as to maximise explained variance
* DCA is a function of the covariance matrix and a vector direction (the gradient of the impact function), and the first DCA pattern is defined so as to maximise probability density for a given value of the impact metric

As a result, for unit vector spatial patterns:

* The first PCA spatial pattern always corresponds to a higher explained variance than the first DCA pattern, but has a lower value of the impact metric (e.g., the total rainfall anomaly), except in degenerate cases
* The first DCA spatial pattern always corresponds to a higher value of the impact metric than the first PCA pattern, but has a lower value of the explained variance, except in degenerate cases

The degenerate cases occur when the PCA and DCA patterns are equal.

Also, given the first PCA pattern, the DCA pattern can be scaled so that:

* The scaled DCA pattern has the same probability density as the first PCA pattern, but higher impact, or
* The scaled DCA pattern has the same impact as the first PCA pattern, but higher probability density.
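The comparison can be illustrated with a small NumPy sketch. The covariance matrix and weights below are hypothetical, and PCA is computed directly from the eigen-decomposition of the covariance matrix rather than with a dedicated PCA library.

```python
import numpy as np

C = np.array([[2.0, -1.0],
              [-1.0, 1.0]])   # hypothetical covariance matrix
r = np.array([1.0, 1.0])      # impact = total anomaly

# First PCA pattern: eigenvector of C with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
pca1 = eigvecs[:, -1]
if r @ pca1 < 0:                       # eigenvector sign is arbitrary
    pca1 = -pca1

# First DCA pattern: unit vector along C r.
dca1 = C @ r / np.linalg.norm(C @ r)


def explained_variance(v):
    """Variance of the data projected onto unit vector v."""
    return float(v @ C @ v)


def quad_form(v):
    """x^t C^{-1} x; equal values mean equal normal probability density."""
    return float(v @ np.linalg.solve(C, v))


var_pca, var_dca = explained_variance(pca1), explained_variance(dca1)
impact_pca, impact_dca = float(r @ pca1), float(r @ dca1)

# Rescale DCA1 to the same probability density as PCA1.
dca_scaled = dca1 * np.sqrt(quad_form(pca1) / quad_form(dca1))
```

For these inputs the PCA pattern explains more variance (about 2.62 against 2.0), while the DCA pattern has the larger impact (1.0 against about 0.32), and the rescaled DCA pattern still has a larger impact than the PCA pattern at equal density.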

Two-Dimensional Example


Figure 1 gives an example, which can be understood as follows:

* The two axes represent anomalies of annual mean rainfall at two locations, with the highest total rainfall anomaly values towards the top right corner of the diagram
* The joint variability of the rainfall anomalies at the two locations is assumed to follow a bivariate normal distribution
* The ellipse shows a single contour of probability density from this bivariate normal, with higher values inside the ellipse
* The red dot at the centre of the ellipse shows zero rainfall anomalies at both locations
* The blue parallel-line arrow shows the principal axis of the ellipse, which is also the first PCA spatial pattern vector
* In this case, the PCA pattern is scaled so that it touches the ellipse
* The diagonal straight line shows a line of constant positive total rainfall anomaly, assumed to be at some fairly extreme level
* The red dotted-line arrow shows the first DCA pattern, which points towards the point at which the diagonal line is tangent to the ellipse
* In this case, the DCA pattern is scaled so that it touches the ellipse

From this diagram, the DCA pattern can be seen to possess the following properties:

* Of all the points on the diagonal line, it is the one with the highest probability density
* Of all the points on the ellipse, it is the one with the highest total rainfall anomaly
* It has the same probability density as the PCA pattern, but represents higher total rainfall (i.e., points further towards the top right hand corner of the diagram)
* Any change of the DCA pattern will either reduce the probability density (if it moves out of the ellipse) or reduce the total rainfall anomaly (if it moves along or into the ellipse)

In this case the total rainfall anomaly of the PCA pattern is quite small, because of anticorrelations between the rainfall anomalies at the two locations. As a result, the first PCA pattern is not a good representative example of a pattern with large total rainfall anomaly, while the first DCA pattern is.

In $n$ dimensions the ellipse becomes an ellipsoid, the diagonal line becomes an $(n-1)$-dimensional plane, and the PCA and DCA patterns are vectors in $n$ dimensions.
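The geometry of the two-dimensional case can be reproduced numerically. The sketch below (assuming NumPy and a hypothetical anticorrelated covariance matrix, not the one behind Figure 1) scans the points of one density contour, finds the point with the largest total rainfall anomaly, and compares it with the DCA and PCA patterns scaled to touch the same contour.

```python
import numpy as np

C = np.array([[2.0, -1.0],
              [-1.0, 1.0]])   # hypothetical bivariate covariance
r = np.array([1.0, 1.0])      # total rainfall anomaly at the two locations

# Points on the contour x^t C^{-1} x = 1: x = chol @ u with |u| = 1,
# where C = chol @ chol.T (Cholesky factorisation).
chol = np.linalg.cholesky(C)
theta = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
ellipse = (chol @ np.vstack([np.cos(theta), np.sin(theta)])).T  # (3600, 2)

# Point on the ellipse with the largest total rainfall anomaly (tangent point).
totals = ellipse @ r
tangent_point = ellipse[np.argmax(totals)]

# DCA pattern scaled to touch the same ellipse.
dca_on_ellipse = C @ r / np.sqrt(r @ C @ r)

# PCA pattern scaled to touch the same ellipse.
eigvals, eigvecs = np.linalg.eigh(C)
pca_on_ellipse = np.sqrt(eigvals[-1]) * eigvecs[:, -1]
```

The scanned tangent point coincides with the scaled DCA pattern (up to the angular resolution of the scan), and its total rainfall anomaly exceeds that of the scaled PCA pattern, in line with the properties listed above.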

Applications

Application to Climate Variability

DCA has been applied to the CRU data set of historical rainfall variability in order to understand the most likely patterns of rainfall extremes in the US and China.

Application to Ensemble Weather Forecasts

DCA has been applied to ECMWF medium-range weather forecast ensembles in order to identify the most likely patterns of extreme temperatures in the ensemble forecast.

Application to Ensemble Climate Model Projections

DCA has been applied to ensemble climate model projections in order to identify the most likely patterns of extreme future rainfall.

Derivation of the First DCA Pattern

Consider a space-time data set $X$, containing individual spatial pattern vectors $x$, where the individual patterns are each considered as single samples from a multivariate normal distribution with mean zero and covariance matrix $C$. As a function of $x$, the log probability density is, up to an additive constant, proportional to $-x^t C^{-1} x$.

We define a linear impact function of a spatial pattern as $r^t x$, where $r$ is a vector of spatial weights. We then seek to find the spatial pattern that maximises the probability density for a given value of the linear impact function. This is equivalent to finding the spatial pattern that maximises the log probability density for a given value of the linear impact function, which is slightly easier to solve. This is a constrained maximisation problem, and can be solved using the method of Lagrange multipliers. The Lagrangian function is given by

$$L(x,\lambda) = -x^t C^{-1} x - \lambda (r^t x - 1)$$

Differentiating with respect to $x$ and setting the result to zero gives the solution

$$x \propto Cr$$

Normalising so that $x$ is a unit vector gives

$$x = Cr / (r^t C C r)^{1/2}$$

This is the first DCA pattern. Subsequent patterns can be derived which are orthogonal to the first, to form an orthonormal set and a method for matrix factorisation.
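The intermediate steps of the derivation can be written out as follows. This is a sketch of the standard Lagrange-multiplier calculation, assuming that $C$ is symmetric and invertible and that the impact constraint is $r^t x = 1$; the explicit value of $\lambda$ is included only for completeness.

```latex
\begin{align}
L(x,\lambda) &= -x^t C^{-1} x - \lambda\,(r^t x - 1) \\
\frac{\partial L}{\partial x} &= -2\,C^{-1} x - \lambda\, r = 0
  \quad\Longrightarrow\quad x = -\tfrac{\lambda}{2}\, C r \;\propto\; C r \\
r^t x = 1 &\quad\Longrightarrow\quad \lambda = -\frac{2}{r^t C r},
  \qquad x = \frac{C r}{r^t C r} \\
\lVert C r \rVert &= \left( (C r)^t (C r) \right)^{1/2} = \left( r^t C C r \right)^{1/2}
\end{align}
```

The last line uses the symmetry of $C$, which is why the unit-length pattern has $(r^t C C r)^{1/2}$ in the denominator.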
