The Hamilton–Jacobi–Bellman (HJB) equation is a nonlinear partial differential equation that provides necessary and sufficient conditions for optimality of a control with respect to a loss function. Its solution is the value function of the optimal control problem which, once known, can be used to obtain the optimal control by taking the maximizer (or minimizer) of the Hamiltonian involved in the HJB equation.

The equation is a result of the theory of dynamic programming, which was pioneered in the 1950s by Richard Bellman and coworkers. The connection to the Hamilton–Jacobi equation from classical physics was first drawn by Rudolf Kálmán. In discrete-time problems, the analogous difference equation is usually referred to as the Bellman equation.

While classical variational problems, such as the brachistochrone problem, can be solved using the Hamilton–Jacobi–Bellman equation, the method can be applied to a broader spectrum of problems. Further, it can be generalized to stochastic systems, in which case the HJB equation is a second-order elliptic partial differential equation. A major drawback, however, is that the HJB equation admits classical solutions only for a sufficiently smooth value function, which is not guaranteed in most situations. Instead, the notion of a viscosity solution is required, in which conventional derivatives are replaced by (set-valued) subderivatives.

Optimal Control Problems

Consider the following problem in deterministic optimal control over the time period [0,T]:

V(x(0), 0) = \min_u \left\{ \int_0^T C[x(t),u(t)] \, dt + D[x(T)] \right\}

where C[\cdot] is the scalar cost rate function and D[\cdot] is a function that gives the bequest value at the final state, x(t) is the system state vector, x(0) is assumed given, and u(t) for 0 \leq t \leq T is the control vector that we are trying to find. Thus, V(x, t) is the value function.

The system must also be subject to

\dot{x}(t) = F[x(t),u(t)]

where F[\cdot] gives the vector determining the physical evolution of the state vector over time.
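To make the objective concrete, the sketch below evaluates the cost functional for a fixed feedback law by forward-Euler integration. Everything specific in it is an assumption chosen for illustration and does not come from the article: a scalar linear system F(x,u) = a x + b u, a quadratic cost rate C(x,u) = (q x^2 + r u^2)/2, a zero bequest value D = 0, and the hypothetical helper name `simulate_cost`.

```python
import math


def simulate_cost(x0, control, T=1.0, n_steps=1000, a=-1.0, b=1.0, q=1.0, r=1.0):
    """Forward-Euler estimate of int_0^T C(x, u) dt for the scalar system x' = a x + b u.

    Illustrative assumptions: C(x, u) = (q x^2 + r u^2) / 2 and bequest value D = 0.
    """
    dt = T / n_steps
    x, cost = x0, 0.0
    for k in range(n_steps):
        t = k * dt
        u = control(x, t)                            # feedback law u(x, t)
        cost += 0.5 * (q * x * x + r * u * u) * dt   # running cost C(x, u) dt
        x += (a * x + b * u) * dt                    # dynamics x' = F(x, u)
    return cost                                      # D = 0, so nothing is added at t = T


# Cost of doing nothing versus a hand-picked proportional feedback (not claimed optimal).
zero_cost = simulate_cost(1.0, lambda x, t: 0.0)
damped_cost = simulate_cost(1.0, lambda x, t: -0.5 * x)
```

The value function V(x(0), 0) is the smallest such cost over all admissible controls; the two policies above only give upper bounds on it.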

The Partial Differential Equation

For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is

\frac{\partial V(x,t)}{\partial t} + \min_u \left\{ \frac{\partial V(x,t)}{\partial x} \cdot F(x,u) + C(x,u) \right\} = 0

subject to the terminal condition

V(x,T) = D(x).

As before, the unknown scalar function V(x, t) in the above partial differential equation is the Bellman value function, which represents the cost incurred from starting in state x at time t and controlling the system optimally from then until time T.
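As a sanity check on the equation, the following sketch verifies numerically that a known closed-form value function satisfies it. The problem instance is an assumption made for illustration: scalar dynamics F(x,u) = u, cost rate C(x,u) = (x^2 + u^2)/2, bequest value D = 0 and horizon T = 1, for which V(x,t) = tanh(T - t) x^2 / 2. The name `hjb_residual` is hypothetical.

```python
import math

T = 1.0  # assumed horizon for this illustration


def V(x, t):
    """Closed-form value function for x' = u, C = (x^2 + u^2) / 2, D = 0."""
    return 0.5 * math.tanh(T - t) * x * x


def hjb_residual(x, t, h=1e-5):
    """Left-hand side V_t + min_u {V_x * u + C(x, u)}, using central differences."""
    V_t = (V(x, t + h) - V(x, t - h)) / (2.0 * h)
    V_x = (V(x + h, t) - V(x - h, t)) / (2.0 * h)
    u_star = -V_x  # minimizer of V_x * u + u^2 / 2 over u
    hamiltonian = V_x * u_star + 0.5 * (x * x + u_star * u_star)
    return V_t + hamiltonian


residuals = [hjb_residual(x, t) for x in (-1.0, 0.3, 2.0) for t in (0.0, 0.4, 0.9)]
terminal_value = V(1.7, T)  # should equal D(x) = 0
```

The residuals vanish up to finite-difference error, and the terminal condition holds because tanh(0) = 0.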

Deriving the Equation

Intuitively, the HJB equation can be derived as follows. If V(x(t), t) is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's principle of optimality, going from time t to t + dt, we have

V(x(t), t) = \min_u \left\{ V(x(t+dt), t+dt) + \int_t^{t+dt} C(x(s), u(s)) \, ds \right\}.

Note that the Taylor expansion of the first term on the right-hand side is

V(x(t+dt), t+dt) = V(x(t), t) + \frac{\partial V(x,t)}{\partial t} \, dt + \frac{\partial V(x,t)}{\partial x} \cdot \dot{x}(t) \, dt + \mathcal{o}(dt),

where \mathcal{o}(dt) denotes the terms in the Taylor expansion of higher order than one in little-o notation. Substituting this expansion into the previous equation, subtracting V(x(t), t) from both sides, dividing by dt, and taking the limit as dt approaches zero, we obtain the HJB equation defined above.
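Written out, the limiting argument can be sketched in three lines. This is a heuristic under stated assumptions: V is taken to be continuously differentiable, the running cost over [t, t+dt] is approximated by C(x(t),u(t)) dt + o(dt), and \dot{x}(t) is replaced by the dynamics F(x(t),u(t)).

```latex
\begin{align*}
V(x(t),t) &= \min_u \left\{ C(x(t),u(t))\,dt + V(x(t),t)
   + \frac{\partial V}{\partial t}\,dt
   + \frac{\partial V}{\partial x}\cdot F(x(t),u(t))\,dt + o(dt) \right\} \\
0 &= \min_u \left\{ C(x(t),u(t)) + \frac{\partial V}{\partial t}
   + \frac{\partial V}{\partial x}\cdot F(x(t),u(t)) + \frac{o(dt)}{dt} \right\} \\
0 &= \frac{\partial V}{\partial t}
   + \min_u \left\{ \frac{\partial V}{\partial x}\cdot F(x,u) + C(x,u) \right\}
   \qquad (dt \to 0)
\end{align*}
```

The second line uses the fact that V(x(t),t) does not depend on u, so it can be moved out of the minimization and cancelled; the same holds for the time derivative in the last line.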

Solving the Equation

The HJB equation is usually solved backwards in time, starting from t = T and ending at t = 0.

When it is solved over the whole of state space and V(x) is continuously differentiable, the HJB equation is a necessary and sufficient condition for an optimum when the terminal state is unconstrained. If we can solve for V, then we can find from it a control u that achieves the minimum cost.

In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including viscosity solutions (Pierre-Louis Lions and Michael Crandall), minimax solutions, and others.

Approximate dynamic programming was introduced by D. P. Bertsekas and J. N. Tsitsiklis, who used artificial neural networks (multilayer perceptrons) to approximate the Bellman function in general. This mitigates the impact of dimensionality: instead of storing the complete function mapping over the whole state-space domain, only the neural network parameters are stored. In particular, for continuous-time systems, an approximate dynamic programming approach that combines policy iteration with neural networks was introduced. In discrete time, an approach to solving the HJB equation that combines value iteration with neural networks was introduced.

Alternatively, it has been shown that sum-of-squares optimization can yield a polynomial solution that approximates the solution of the Hamilton–Jacobi–Bellman equation arbitrarily well with respect to the L^1 norm.
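The backward-in-time solution procedure can be illustrated with a small grid-based sketch. It is not one of the methods cited above: it is a plain discrete dynamic-programming (semi-Lagrangian) recursion, applied to an assumed test problem with dynamics F(x,u) = u, cost rate C(x,u) = (x^2 + u^2)/2, bequest value D = 0 and T = 1, whose exact value function is tanh(T - t) x^2 / 2. The function name, grid sizes and control bounds are all illustrative choices.

```python
import math


def solve_hjb_backward(T=1.0, n_t=50, x_max=2.0, n_x=81, u_max=2.0, n_u=41):
    """Backward value recursion for x' = u, C = (x^2 + u^2) / 2, D = 0 on a uniform grid."""
    dt = T / n_t
    dx = 2.0 * x_max / (n_x - 1)
    xs = [-x_max + i * dx for i in range(n_x)]
    us = [-u_max + j * 2.0 * u_max / (n_u - 1) for j in range(n_u)]

    def interp(values, x):
        # Linear interpolation on the state grid, clamping x to [-x_max, x_max].
        x = min(max(x, -x_max), x_max)
        pos = (x + x_max) / dx
        i = min(int(pos), n_x - 2)
        w = pos - i
        return (1.0 - w) * values[i] + w * values[i + 1]

    V = [0.0 for _ in xs]  # terminal condition V(x, T) = D(x) = 0
    for _ in range(n_t):   # march from t = T down to t = 0
        # One step of the principle of optimality: V(x, t) = min_u {C dt + V(x + u dt, t + dt)}
        V = [min(0.5 * (x * x + u * u) * dt + interp(V, x + u * dt) for u in us)
             for x in xs]
    return xs, V


xs, V0 = solve_hjb_backward()
i_one = min(range(len(xs)), key=lambda i: abs(xs[i] - 1.0))  # grid point nearest x = 1
v_numeric = V0[i_one]
v_exact = 0.5 * math.tanh(1.0)
```

The number of grid points grows exponentially with the state dimension, which is the difficulty that function approximators such as neural networks are meant to ease.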

Extension to Stochastic Problems

The idea of solving a control problem by applying Bellman's principle of optimality and then working out an optimizing strategy backwards in time can be generalized to stochastic control problems. Consider a problem similar to the one above:

\min_u \mathbb{E} \left\{ \int_0^T C(t,X_t,u_t) \, dt + D(X_T) \right\}

now with (X_t)_{t \in [0,T]} the stochastic process to optimize and (u_t)_{t \in [0,T]} the steering. By first using Bellman's principle and then expanding V(X_t,t) with Itô's rule, one finds the stochastic HJB equation

\min_u \left\{ \mathcal{A} V(x,t) + C(t,x,u) \right\} = 0,

where \mathcal{A} represents the stochastic differentiation operator, and subject to the terminal condition

V(x,T) = D(x).

Note that the randomness has disappeared. In this case a solution V of the latter does not necessarily solve the primal problem; it is a candidate only, and a further verifying argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market (see for example Merton's portfolio problem).
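For a concrete form of the operator, consider a scalar controlled diffusion dX_t = \mu(X_t,u_t) dt + \sigma(X_t,u_t) dw_t. The symbols \mu and \sigma are assumptions introduced here for illustration, and V is assumed smooth enough for Itô's rule to apply. Under those assumptions the expansion reads:

```latex
\begin{align*}
dV(X_t,t) &= \left( \frac{\partial V}{\partial t}
   + \mu\,\frac{\partial V}{\partial x}
   + \frac{\sigma^2}{2}\,\frac{\partial^2 V}{\partial x^2} \right) dt
   + \sigma\,\frac{\partial V}{\partial x}\,dw_t, \\
\mathcal{A}V(x,t) &= \frac{\partial V(x,t)}{\partial t}
   + \mu(x,u)\,\frac{\partial V(x,t)}{\partial x}
   + \frac{\sigma(x,u)^2}{2}\,\frac{\partial^2 V(x,t)}{\partial x^2}.
\end{align*}
```

In this form the operator includes the time derivative. Under suitable integrability conditions the dw_t term has zero expectation, which is why no random quantity remains in the resulting equation; the noise enters only through the second-derivative term.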

Application to LQG-Control

As an example, we can look at a system with linear stochastic dynamics and quadratic cost. If the system dynamics is given by

dx_t = (a x_t + b u_t) dt + \sigma dw_t,

and the cost accumulates at rate C(x_t,u_t) = r(t) u_t^2/2 + q(t) x_t^2/2, the HJB equation is given by

-\frac{\partial V(x,t)}{\partial t} = \frac{1}{2} q(t) x^2 + \frac{\partial V(x,t)}{\partial x} a x - \frac{b^2}{2 r(t)} \left( \frac{\partial V(x,t)}{\partial x} \right)^2 + \frac{\sigma^2}{2} \frac{\partial^2 V(x,t)}{\partial x^2},

with optimal action given by

u_t = -\frac{b}{r(t)} \frac{\partial V(x,t)}{\partial x}.

Assuming a quadratic form for the value function, we obtain the usual Riccati equation for the Hessian of the value function, as is usual for linear-quadratic-Gaussian control.
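The reduction to a Riccati equation can be sketched numerically. Assume, for illustration, the ansatz V(x,t) = P(t) x^2/2 + c(t) and a zero terminal cost, neither of which is stated in the text. Matching powers of x in the HJB equation then gives -P' = q + 2 a P - (b^2/r) P^2 and -c' = \sigma^2 P/2, with P(T) = c(T) = 0. The function name `solve_lqg_riccati` and the choice of integrator are hypothetical.

```python
import math


def solve_lqg_riccati(a, b, sigma, q, r, T, n_steps=2000):
    """Integrate the scalar Riccati equation backward from t = T to t = 0.

    Assumes V(x, t) = P(t) x^2 / 2 + c(t) with P(T) = c(T) = 0; q and r are
    functions of time. Returns (P(0), c(0)).
    """
    dt = T / n_steps

    def rhs(P, t):
        # -dP/dt = q(t) + 2 a P - (b^2 / r(t)) P^2
        return q(t) + 2.0 * a * P - (b * b / r(t)) * P * P

    P, c, t = 0.0, 0.0, T
    for _ in range(n_steps):
        # Classical fourth-order Runge-Kutta step in reversed time s = T - t.
        k1 = rhs(P, t)
        k2 = rhs(P + 0.5 * dt * k1, t - 0.5 * dt)
        k3 = rhs(P + 0.5 * dt * k2, t - 0.5 * dt)
        k4 = rhs(P + dt * k3, t - dt)
        P_new = P + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        c += 0.5 * sigma ** 2 * 0.5 * (P + P_new) * dt  # trapezoid rule for -c' = sigma^2 P / 2
        P, t = P_new, t - dt
    return P, c


# Test case with a = 0, b = 1, q = r = 1, where P(t) = tanh(T - t).
P0, c0 = solve_lqg_riccati(a=0.0, b=1.0, sigma=0.5,
                           q=lambda t: 1.0, r=lambda t: 1.0, T=1.0)
gain_at_0 = -1.0 / 1.0 * P0  # feedback u = -(b / r) P(t) x, evaluated at t = 0
```

In this sketch the noise level affects only the offset c(t), not P(t), so the feedback gain is the same as in the deterministic problem.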

