Dynamic programming is both a
mathematical optimization method and a computer programming method. The method was developed by
Richard Bellman
in the 1950s and has found applications in numerous fields, from
aerospace engineering to
economics.
In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a
recursive manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively. Likewise, in computer science, if a problem can be solved optimally by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems, then it is said to have ''
optimal substructure''.
If sub-problems can be nested recursively inside larger problems, so that dynamic programming methods are applicable, then there is a relation between the value of the larger problem and the values of the sub-problems.
[Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. (2001), ''Introduction to Algorithms'' (2nd ed.), MIT Press & McGraw–Hill, p. 344.] In the optimization literature this relationship is called the
Bellman equation.
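As a minimal illustrative sketch of this kind of relation, the minimum-coin change problem can be written so that the optimal value for an amount is defined through the optimal values for smaller amounts; memoization then stores each sub-problem's answer so it is computed only once. The denominations below are arbitrary example values.

<syntaxhighlight lang="python">
from functools import lru_cache

# Illustrative sketch of optimal substructure: the fewest coins needed for
# `amount` is defined through the fewest coins needed for smaller amounts.
# The denominations are arbitrary example values.
COINS = (1, 3, 4)

@lru_cache(maxsize=None)
def min_coins(amount: int) -> int:
    if amount == 0:
        return 0                        # no coins are needed for amount 0
    # Bellman-style relation: try each coin, recurse on the smaller sub-problem.
    return 1 + min(min_coins(amount - c) for c in COINS if c <= amount)

print(min_coins(6))   # -> 2 (for example, 3 + 3)
</syntaxhighlight>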
Overview
Mathematical optimization
In terms of mathematical optimization, dynamic programming usually refers to simplifying a decision by breaking it down into a sequence of decision steps over time. This is done by defining a sequence of value functions ''V''<sub>1</sub>, ''V''<sub>2</sub>, ..., ''V''<sub>''n''</sub> taking ''y'' as an argument representing the state of the system at times ''i'' from 1 to ''n''. The definition of ''V''<sub>''n''</sub>(''y'') is the value obtained in state ''y'' at the last time ''n''. The values ''V''<sub>''i''</sub> at earlier times ''i'' = ''n'' − 1, ''n'' − 2, ..., 2, 1 can be found by working backwards, using a recursive relationship called the Bellman equation. For ''i'' = 2, ..., ''n'', ''V''<sub>''i''−1</sub> at any state ''y'' is calculated from ''V''<sub>''i''</sub> by maximizing a simple function (usually the sum) of the gain from a decision at time ''i'' − 1 and the function ''V''<sub>''i''</sub> at the new state of the system if this decision is made. Since ''V''<sub>''i''</sub> has already been calculated for the needed states, the above operation yields ''V''<sub>''i''−1</sub> for those states. Finally, ''V''<sub>1</sub> at the initial state of the system is the value of the optimal solution. The optimal values of the decision variables can be recovered, one by one, by tracking back the calculations already performed.
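As a hedged sketch of this backward recursion, the code below computes ''V''<sub>''n''</sub>, ..., ''V''<sub>1</sub> for a small finite problem; the particular states, gain function, and transition rule are stand-ins invented for the example, and only the backward sweep itself mirrors the description above.

<syntaxhighlight lang="python">
# Minimal sketch of backward induction over value functions V_1, ..., V_n.
# The states, gain() and transition() below are hypothetical stand-ins.
states = range(5)              # possible states y of the system
decisions = (-1, 0, +1)        # possible decisions at each time step
n = 4                          # number of time steps

def transition(y, d):
    """Next state if decision d is taken in state y (clamped to the grid)."""
    return max(0, min(4, y + d))

def gain(i, y, d):
    """Immediate gain from decision d in state y at time i (i unused here)."""
    return y * d - abs(d)

# V[n][y]: value obtained in state y at the last time n (terminal values).
V = {n: {y: float(y) for y in states}}

# Work backwards: V_{i-1}(y) = max over d of gain(i-1, y, d) + V_i(transition(y, d)).
for i in range(n, 1, -1):
    V[i - 1] = {
        y: max(gain(i - 1, y, d) + V[i][transition(y, d)] for d in decisions)
        for y in states
    }

print(V[1])   # V_1 at each possible initial state: value of the optimal plan
</syntaxhighlight>

Recording, for each state, which decision attained the maximum recovers the optimal decisions themselves, as described above.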
Control theory
In control theory, a typical problem is to find an admissible control <math>\mathbf{u}^{\ast}(t)</math> which causes the system <math>\dot{\mathbf{x}}(t) = \mathbf{g}\left(\mathbf{x}(t), \mathbf{u}(t), t\right)</math> to follow an admissible trajectory <math>\mathbf{x}^{\ast}</math> on a continuous time interval <math>t_0 \leq t \leq t_1</math> that minimizes a cost function
:<math>J = b\left(\mathbf{x}(t_1), t_1\right) + \int_{t_0}^{t_1} f\left(\mathbf{x}(t), \mathbf{u}(t), t\right) \, \mathrm{d}t</math>
The solution to this problem is an optimal control law or policy <math>\mathbf{u}^{\ast} = h(\mathbf{x}(t), t)</math>, which produces an optimal trajectory <math>\mathbf{x}^{\ast}</math> and a cost-to-go function <math>J^{\ast}</math>. The latter obeys the fundamental equation of dynamic programming:
:<math>-J_t^{\ast} = \min_{\mathbf{u}} \left\{ f\left(\mathbf{x}(t), \mathbf{u}(t), t\right) + J_x^{\ast\mathsf{T}} \, \mathbf{g}\left(\mathbf{x}(t), \mathbf{u}(t), t\right) \right\}</math>
a partial differential equation known as the Hamilton–Jacobi–Bellman equation, in which <math>J_x^{\ast} = \frac{\partial J^{\ast}}{\partial \mathbf{x}}</math> and <math>J_t^{\ast} = \frac{\partial J^{\ast}}{\partial t}</math>. One finds the minimizing <math>\mathbf{u}</math> in terms of <math>t</math>, <math>\mathbf{x}</math>, and the unknown function <math>J_x^{\ast}</math>, and then substitutes the result into the Hamilton–Jacobi–Bellman equation to get the partial differential equation to be solved with boundary condition <math>J\left(t_1\right) = b\left(\mathbf{x}(t_1), t_1\right)</math>. In practice, this generally requires numerical techniques for some discrete approximation to the exact optimization relationship.
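As a hedged worked example (the symbols <math>a</math>, <math>q</math>, <math>r</math> and the quadratic forms are illustrative choices, not taken from the discussion above), consider the scalar linear-quadratic problem with dynamics <math>\dot{x} = a x + u</math>, running cost <math>f = q x^2 + r u^2</math>, and zero terminal cost. Minimizing the bracketed expression over <math>u</math> gives <math>u^{\ast} = -J_x^{\ast}/(2r)</math>, and substituting this back with the guess <math>J^{\ast}(x, t) = p(t)\,x^2</math> reduces the Hamilton–Jacobi–Bellman equation to the scalar Riccati equation
:<math>-\dot{p} = q + 2 a p - \frac{p^2}{r}, \qquad p(t_1) = 0,</math>
which can be integrated backwards in time from <math>t_1</math>.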
Alternatively, the continuous process can be approximated by a discrete system, which leads to the following recurrence relation, analogous to the Hamilton–Jacobi–Bellman equation:
:<math>J_k^{\ast}\left(\mathbf{x}_{n-k}\right) = \min_{\mathbf{u}_{n-k}} \left\{ \hat{f}\left(\mathbf{x}_{n-k}, \mathbf{u}_{n-k}\right) + J_{k-1}^{\ast}\left(\hat{\mathbf{g}}\left(\mathbf{x}_{n-k}, \mathbf{u}_{n-k}\right)\right) \right\}</math>
at the <math>k</math>-th stage of <math>n</math> equally spaced discrete time intervals, and where <math>\hat{f}</math> and <math>\hat{\mathbf{g}}</math> denote discrete approximations to <math>f</math> and <math>\mathbf{g}</math>. This functional equation is known as the Bellman equation, which can be solved for an exact solution of the discrete approximation of the optimization equation.
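As a rough numerical sketch of this recurrence (the stage cost, transition, terminal cost, grids, and horizon below are all hypothetical example choices), the backward sweep can be carried out on discretized state and control grids, with the cost-to-go at the successor state looked up at the nearest grid point.

<syntaxhighlight lang="python">
import numpy as np

# Hedged sketch of the discrete Bellman recursion: backward sweep over stages,
# minimizing stage cost plus cost-to-go at the successor state.
def f_hat(x, u):          # stage cost (example: quadratic penalty)
    return x**2 + 0.1 * u**2

def g_hat(x, u):          # state transition (example: damped linear dynamics)
    return 0.9 * x + u

def b(x):                 # terminal cost
    return x**2

x_grid = np.linspace(-2.0, 2.0, 201)   # discretized state space
u_grid = np.linspace(-1.0, 1.0, 21)    # discretized control space
n = 50                                 # number of stages

J = b(x_grid)                          # J*_0: cost-to-go at the final stage
policy = np.zeros((n, x_grid.size))    # minimizing control per stage and state

for k in range(1, n + 1):              # backward sweep: k = 1, ..., n
    J_prev = J.copy()                  # J*_{k-1} from the previous iteration
    for i, x in enumerate(x_grid):
        # cost of each candidate control: stage cost + cost-to-go at successor,
        # with the successor state snapped to the nearest grid point
        x_next = g_hat(x, u_grid)
        idx = np.abs(x_grid[None, :] - x_next[:, None]).argmin(axis=1)
        costs = f_hat(x, u_grid) + J_prev[idx]
        best = costs.argmin()
        J[i] = costs[best]
        policy[n - k, i] = u_grid[best]

# J now approximates J*_n(x) on the grid; policy[t] gives the control at stage t.
</syntaxhighlight>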
Example from economics: Ramsey's problem of optimal saving
In economics, the objective is generally to maximize (rather than minimize) some dynamic social welfare function. In Ramsey's problem, this function relates amounts of consumption to levels of utility. Loosely speaking, the planner faces the trade-off between contemporaneous consumption and future consumption (via investment in capital stock that is used in production), known as intertemporal choice. Future consumption is discounted at a constant rate <math>\beta \in (0,1)</math>. A discrete approximation to the transition equation of capital is given by
:<math>k_{t+1} = \hat{g}\left(k_t, c_t\right) = f(k_t) - c_t</math>
where <math>c</math> is consumption, <math>k</math> is capital, and <math>f(k)</math> is a production function satisfying the Inada conditions. An initial capital stock <math>k_0 > 0</math> is assumed.
Let <math>c_t</math> be consumption in period <math>t</math>, and assume consumption yields utility <math>u(c_t) = \ln(c_t)</math> as long as the consumer lives. Assume the consumer is impatient, so that he discounts future utility by a factor <math>b</math> each period, where <math>0 < b < 1</math>.
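As a hedged numerical sketch of this setup (the production function <math>f(k) = k^{0.3}</math>, the horizon, and the grid are example choices made only for illustration), a finite-horizon version of the saving problem can be solved by backward induction over a grid of capital stocks, using the transition <math>k_{t+1} = f(k_t) - c_t</math>, the utility <math>\ln(c_t)</math>, and the discount factor <math>b</math>.

<syntaxhighlight lang="python">
import numpy as np

# Hedged sketch: finite-horizon optimal saving by backward induction.
# Production function, horizon, and grids are illustrative example choices.
T = 10                 # number of periods
b = 0.95               # discount factor, 0 < b < 1
alpha = 0.3            # exponent of the example production function

def f(k):
    return k ** alpha  # example production function f(k) = k**alpha

k_grid = np.linspace(0.05, 2.0, 400)   # discretized capital stocks (ascending)
V = np.zeros(k_grid.size)              # value after the final period
policy = np.zeros((T, k_grid.size))    # optimal consumption per period/state

for t in reversed(range(T)):           # work backwards from period T-1
    V_next = V.copy()
    for i, k in enumerate(k_grid):
        # feasible next-period capital k' < f(k), so consumption c = f(k) - k' > 0;
        # the grid is ascending, so the feasible k' form a prefix of the grid
        k_next = k_grid[k_grid < f(k)]
        c = f(k) - k_next
        values = np.log(c) + b * V_next[:k_next.size]
        j = values.argmax()
        V[i] = values[j]
        policy[t, i] = c[j]

# V now approximates the value of optimal saving from each initial capital k_0.
</syntaxhighlight>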