The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem. It has applications in geophysics, seismic imaging, photonics and more recently in neural networks. The adjoint state space is chosen to simplify the physical interpretation of equation constraints (Plessix 2006). Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters in a constrained optimization form. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem and is used, for example, in the Landweber iteration method. The name adjoint state method refers to the dual form of the problem, where the adjoint matrix A^*=\overline{A}^T is used. When the initial problem consists of calculating the product s^T x and x must satisfy Ax=b, the dual problem can be realized as calculating the product r^T b, where r must satisfy A^* r = s. The vector r is called the adjoint state vector.
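
The identity underlying the dual form can be checked in a few lines. The following is a minimal NumPy sketch (all names hypothetical; real matrices, so A^* reduces to A^\top) confirming that the direct product s^T x and the adjoint product r^T b agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
s = rng.standard_normal(n)

# Direct computation: solve A x = b, then take s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Adjoint computation: solve A^T r = s, then take r^T b.
r = np.linalg.solve(A.T, s)
adjoint = r @ b

assert np.isclose(direct, adjoint)  # s^T x == r^T b
```

Both routes cost one linear solve here, but when the same s must be paired with many right-hand sides b, the adjoint route solves for r once and then needs only one inner product per b.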


General case

The original adjoint calculation method goes back to Jean Céa, who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable u \in \mathcal{U} and an optimization variable v \in \mathcal{V}, an objective functional J:\mathcal{U}\times\mathcal{V}\to\mathbb{R} is defined. The state variable u is often implicitly dependent on v through the (direct) state equation D_v(u)=0 (usually the weak form of a partial differential equation), thus the considered objective is j(v) = J(u_v,v). Usually, one would be interested in calculating \nabla j(v) using the chain rule:

: \nabla j(v) = \nabla_v J(u_v,v) + \nabla_u J(u_v,v)\,\nabla_v u_v.

Unfortunately, the term \nabla_v u_v is often very hard to differentiate analytically since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of j, the problem

: \text{minimize}\ j(v) = J(u_v,v)
: \text{subject to}\ D_v(u_v) = 0

has an associated Lagrangian functional \mathcal{L}:\mathcal{U}\times\mathcal{V}\times\mathcal{U} \to \mathbb{R} defined by

: \mathcal{L}(u,v,\lambda) = J(u,v) + \langle D_v(u),\lambda\rangle,

where \lambda\in\mathcal{U} is a Lagrange multiplier or adjoint state variable and \langle\cdot,\cdot\rangle is an inner product on \mathcal{U}. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

: \begin{cases} d_u\mathcal{L}(u,v,\lambda;\delta_u) = d_uJ(u,v;\delta_u) + \langle \delta_u,D_v^*(\lambda)\rangle = 0 & \forall \delta_u \in \mathcal{U},\\ d_v\mathcal{L}(u,v,\lambda;\delta_v) = d_vJ(u,v;\delta_v) + \langle d_vD_v(u;\delta_v),\lambda\rangle = 0 & \forall \delta_v\in\mathcal{V},\\ d_\lambda\mathcal{L}(u,v,\lambda;\delta_\lambda) = \langle D_v(u), \delta_\lambda\rangle = 0 & \forall \delta_\lambda \in \mathcal{U}, \end{cases}

where d_xF(x;\delta_x) is the Gateaux derivative of F with respect to x in the direction \delta_x. The last equation is equivalent to D_v(u) = 0, the state equation, whose solution is u_v. The first equation is the so-called adjoint state equation,

: \langle \delta_u,D_v^*(\lambda)\rangle = -d_uJ(u_v,v;\delta_u) \quad \forall \delta_u \in \mathcal{U},

because the operator involved is the adjoint operator of D_v, namely D_v^*. Solving this equation yields the adjoint state \lambda_v. The gradient of the quantity of interest j with respect to v is then

: \langle \nabla j(v),\delta_v\rangle = d_v j(v;\delta_v) = d_v \mathcal{L}(u_v,v,\lambda_v;\delta_v)

(the second equation with u=u_v and \lambda=\lambda_v), so it can be identified by successively solving the direct and adjoint state equations. The process is even simpler when the operator D_v is self-adjoint or symmetric, since the direct and adjoint state equations then differ only in their right-hand sides.
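
The recipe above (solve the direct state equation, solve the adjoint state equation, assemble the gradient) applies even when the state equation is nonlinear. The following sketch works through a hypothetical finite-dimensional example, not taken from the cited literature: the state equation is the componentwise cubic D_v(u) = u^3 + vu - b = 0 and the objective is J(u,v) = \tfrac{1}{2}\lVert u\rVert^2, so every Jacobian is diagonal. The adjoint gradient is verified against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
b = rng.standard_normal(m) + 3.0      # right-hand side of the state equation
v = rng.uniform(1.0, 2.0, m)          # optimization variable

def solve_state(v):
    """Solve D_v(u) = u**3 + v*u - b = 0 by Newton's method (f' = 3u^2 + v > 0)."""
    u = np.ones(m)
    for _ in range(50):
        u -= (u**3 + v * u - b) / (3 * u**2 + v)
    return u

def j(v):
    return 0.5 * np.sum(solve_state(v)**2)   # j(v) = J(u_v, v) with J = 0.5*||u||^2

def gradient(v):
    u = solve_state(v)
    # All Jacobians are diagonal for this componentwise state equation.
    dD_du = 3 * u**2 + v          # dD/du
    dD_dv = u                     # dD/dv
    dJ_du = u                     # dJ/du
    lam = -dJ_du / dD_du          # adjoint equation: (dD/du)^T lam = -dJ/du
    return lam * dD_dv            # grad j = dJ/dv + (dD/dv)^T lam  (dJ/dv = 0 here)

# Central finite-difference check of the adjoint gradient.
g, eps = gradient(v), 1e-6
for k in range(m):
    vp, vm = v.copy(), v.copy()
    vp[k] += eps; vm[k] -= eps
    assert abs((j(vp) - j(vm)) / (2 * eps) - g[k]) < 1e-5
```

Note that the cost of the gradient is one state solve plus one (linear) adjoint solve, regardless of the number of components of v.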


Example: Linear case

In a real finite-dimensional linear programming context, the objective function could be J(u,v) = \langle Au , v \rangle, for v\in\mathbb{R}^n, u\in\mathbb{R}^m and A \in \mathbb{R}^{n\times m}, and let the state equation be B_vu = b, with B_v \in \mathbb{R}^{m\times m} and b\in \mathbb{R}^m.

The Lagrangian function of the problem is \mathcal{L}(u,v,\lambda) = \langle Au,v\rangle + \langle B_vu - b,\lambda\rangle, where \lambda\in\mathbb{R}^m. The derivative of \mathcal{L} with respect to \lambda yields the state equation as shown before, and the state variable is u_v = B_v^{-1}b. The derivative of \mathcal{L} with respect to u is equivalent to the adjoint equation, which is, for every \delta_u \in \mathbb{R}^m,

: d_u[\langle B_v\cdot - b, \lambda\rangle](u;\delta_u) = -\langle A^\top v,\delta_u\rangle \iff \langle B_v \delta_u,\lambda\rangle = -\langle A^\top v,\delta_u\rangle \iff \langle B_v^\top \lambda + A^\top v,\delta_u\rangle = 0 \iff B_v^\top \lambda = -A^\top v.

Thus, we can write symbolically \lambda_v = -B_v^{-\top}A^\top v. The gradient would be

: \langle \nabla j(v),\delta_v\rangle = \langle Au_v,\delta_v\rangle + \langle \nabla_vB_v : \lambda_v\otimes u_v,\delta_v\rangle,

where \nabla_v B_v = \frac{\partial (B_v)_{ij}}{\partial v_k} is a third-order tensor, \lambda_v \otimes u_v = \lambda_v u_v^\top is the dyadic product between the direct and adjoint states and : denotes a double tensor contraction. It is assumed that B_v has a known analytic expression that can be differentiated easily.
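
The gradient formula translates directly into code. The sketch below assumes, purely for illustration, the parametrization B_v = B_0 + \sum_k v_k C_k, so that \nabla_v B_v is the constant stack of matrices C_k; the adjoint gradient is again checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 5
A = rng.standard_normal((n, m))                     # J(u, v) = <A u, v>
b = rng.standard_normal(m)
B0 = rng.standard_normal((m, m)) + 10 * np.eye(m)   # keeps B_v comfortably invertible
C = rng.standard_normal((n, m, m))                  # C[k] = dB_v/dv_k
v = 0.1 * rng.standard_normal(n)

def B(v):
    return B0 + np.einsum('k,kij->ij', v, C)        # B_v = B0 + sum_k v_k C_k

def j(v):
    u = np.linalg.solve(B(v), b)                    # direct state: B_v u = b
    return v @ (A @ u)                              # J(u_v, v) = <A u_v, v>

u = np.linalg.solve(B(v), b)                        # direct state u_v
lam = np.linalg.solve(B(v).T, -A.T @ v)             # adjoint state: B_v^T lam = -A^T v
# grad_k = (A u_v)_k + sum_ij (dB_v/dv_k)_ij lam_i u_j   (double contraction)
grad = A @ u + np.einsum('kij,i,j->k', C, lam, u)

# Central finite-difference check.
eps = 1e-6
for k in range(n):
    vp, vm = v.copy(), v.copy()
    vp[k] += eps; vm[k] -= eps
    assert abs((j(vp) - j(vm)) / (2 * eps) - grad[k]) < 1e-5
```

The einsum contraction is exactly the term \nabla_vB_v : \lambda_v\otimes u_v from the gradient formula; only the two linear solves depend on the dimension m, not on the number n of parameters.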


Numerical consideration for the self-adjoint case

If the operator B_v is self-adjoint, B_v = B_v^\top, the direct state equation and the adjoint state equation have the same left-hand side. With the goal of never inverting a matrix explicitly, which is numerically expensive, an LU decomposition can be used instead to solve the state equation, in O(m^3) operations for the decomposition and O(m^2) operations for each subsequent solve. That same decomposition can then be reused to solve the adjoint state equation in only O(m^2) operations, since the two matrices coincide.
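
A sketch of this reuse with SciPy's LU routines follows; the symmetric matrix here is an arbitrary stand-in, and for a symmetric positive-definite B_v a Cholesky factorization would serve the same purpose:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(3)
m, n = 6, 2
S = rng.standard_normal((m, m))
Bv = S + S.T + 10 * np.eye(m)        # self-adjoint (symmetric) state operator
A = rng.standard_normal((n, m))
b = rng.standard_normal(m)
v = rng.standard_normal(n)

lu, piv = lu_factor(Bv)              # O(m^3), done once
u = lu_solve((lu, piv), b)           # direct state,  O(m^2)
lam = lu_solve((lu, piv), -A.T @ v)  # adjoint state, O(m^2): same matrix since Bv = Bv^T
```

Even when B_v is not symmetric, a single factorization still covers both solves, since lu_solve accepts trans=1 to solve the transposed system B_v^\top x = y.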


See also

* Backpropagation
* Adjoint equation
* Shape optimization
* Method of Lagrange multipliers


References

* Plessix, R.-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International, 167(2): 495–503 (free access on the GJI website).

External links

* A well-written explanation by Errico: "What is an adjoint model?"
* Another well-written explanation with worked examples, written by Bradley
* More technical explanation: a review of the adjoint-state method for computing the gradient of a functional with geophysical applications
* MIT course
* MIT notes
