Dynamic programming is both a
mathematical optimization method and a computer programming method. The method was developed by
Richard Bellman in the 1950s and has found applications in numerous fields, from
aerospace engineering to
economics
Economics () is the social science that studies the production, distribution, and consumption of goods and services.
Economics focuses on the behaviour and interactions of economic agents and how economies work. Microeconomics analyzes ...
.
In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a
recursive
Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively. Likewise, in computer science, if a problem can be solved optimally by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems, then it is said to have ''
optimal substructure''.
If sub-problems can be nested recursively inside larger problems, so that dynamic programming methods are applicable, then there is a relation between the value of the larger problem and the values of the sub-problems.
[Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. (2001), Introduction to Algorithms (2nd ed.), MIT Press & McGraw–Hill, . pp. 344.] In the optimization literature this relationship is called the
Bellman equation
A Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the "value" of a decision problem at a certain point in time ...
.
Overview
Mathematical optimization
In terms of mathematical optimization, dynamic programming usually refers to simplifying a decision by breaking it down into a sequence of decision steps over time. This is done by defining a sequence of value functions ''V''
1, ''V''
2, ..., ''V''
''n'' taking ''y'' as an argument representing the
state
State may refer to:
Arts, entertainment, and media Literature
* ''State Magazine'', a monthly magazine published by the U.S. Department of State
* ''The State'' (newspaper), a daily newspaper in Columbia, South Carolina, United States
* ''Our S ...
of the system at times ''i'' from 1 to ''n''. The definition of ''V''
''n''(''y'') is the value obtained in state ''y'' at the last time ''n''. The values ''V''
''i'' at earlier times ''i'' = ''n'' −1, ''n'' − 2, ..., 2, 1 can be found by working backwards, using a
recursive
Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
relationship called the
Bellman equation
A Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the "value" of a decision problem at a certain point in time ...
. For ''i'' = 2, ..., ''n'', ''V''
''i''−1 at any state ''y'' is calculated from ''V''
''i'' by maximizing a simple function (usually the sum) of the gain from a decision at time ''i'' − 1 and the function ''V''
''i'' at the new state of the system if this decision is made. Since ''V''
''i'' has already been calculated for the needed states, the above operation yields ''V''
''i''−1 for those states. Finally, ''V''
1 at the initial state of the system is the value of the optimal solution. The optimal values of the decision variables can be recovered, one by one, by tracking back the calculations already performed.
Control theory
In
control theory
Control theory is a field of mathematics that deals with the control of dynamical systems in engineered processes and machines. The objective is to develop a model or algorithm governing the application of system inputs to drive the system to a ...
, a typical problem is to find an admissible control
which causes the system
to follow an admissible trajectory
on a continuous time interval
that minimizes a
cost function
:
The solution to this problem is an optimal control law or policy
, which produces an optimal trajectory
and a
cost-to-go function . The latter obeys the fundamental equation of dynamic programming:
:
a
partial differential equation known as the
Hamilton–Jacobi–Bellman equation, in which
and
. One finds that minimizing
in terms of
,
, and the unknown function
and then substitutes the result into the Hamilton–Jacobi–Bellman equation to get the partial differential equation to be solved with boundary condition
. In practice, this generally requires
numerical techniques for some discrete approximation to the exact optimization relationship.
Alternatively, the continuous process can be approximated by a discrete system, which leads to a following recurrence relation analog to the Hamilton–Jacobi–Bellman equation:
:
at the
-th stage of
equally spaced discrete time intervals, and where
and
denote discrete approximations to
and
. This functional equation is known as the
Bellman equation
A Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the "value" of a decision problem at a certain point in time ...
, which can be solved for an exact solution of the discrete approximation of the optimization equation.
Example from economics: Ramsey's problem of optimal saving
In economics, the objective is generally to maximize (rather than minimize) some dynamic
social welfare function
In welfare economics, a social welfare function is a function that ranks social states (alternative complete descriptions of the society) as less desirable, more desirable, or indifferent for every possible pair of social states. Inputs of the f ...
. In Ramsey's problem, this function relates amounts of consumption to levels of
utility
As a topic of economics, utility is used to model worth or value. Its usage has evolved significantly over time. The term was introduced initially as a measure of pleasure or happiness as part of the theory of utilitarianism by moral philosoph ...
. Loosely speaking, the planner faces the trade-off between contemporaneous consumption and future consumption (via investment in
capital stock
A corporation's share capital, commonly referred to as capital stock in the United States, is the portion of a corporation's equity that has been derived by the issue of shares in the corporation to a shareholder, usually for cash. "Share capi ...
that is used in production), known as
intertemporal choice Intertemporal choice is the process by which people make decisions about what and how much to do at various points in time, when choices at one time influence the possibilities available at other points in time. These choices are influenced by the r ...
. Future consumption is discounted at a constant rate
. A discrete approximation to the transition equation of capital is given by
:
where
is consumption,
is capital, and
is a
production function
In economics, a production function gives the technological relation between quantities of physical inputs and quantities of output of goods. The production function is one of the key concepts of mainstream neoclassical theories, used to define ...
satisfying the
Inada conditions. An initial capital stock
is assumed.
Let
be consumption in period , and assume consumption yields
utility
As a topic of economics, utility is used to model worth or value. Its usage has evolved significantly over time. The term was introduced initially as a measure of pleasure or happiness as part of the theory of utilitarianism by moral philosoph ...
as long as the consumer lives. Assume the consumer is impatient, so that he
discounts future utility by a factor each period, where