Optimal control theory is a branch of mathematical optimization that deals with finding a control for a dynamical system over a period of time such that an objective function is optimized. It has numerous applications in science, engineering and operations research. For example, the dynamical system might be a spacecraft with controls corresponding to rocket thrusters, and the objective might be to reach the Moon with minimum fuel expenditure. Or the dynamical system could be a nation's economy, with the objective to minimize unemployment; the controls in this case could be fiscal and monetary policy. A dynamical system may also be introduced to embed operations research problems within the framework of optimal control theory.

Optimal control is an extension of the calculus of variations, and is a mathematical optimization method for deriving control policies. The method is largely due to the work of Lev Pontryagin and Richard Bellman in the 1950s, after contributions to the calculus of variations by Edward J. McShane. Optimal control can be seen as a control strategy in control theory.


General method

Optimal control deals with the problem of finding a control law for a given system such that a certain optimality criterion is achieved. A control problem includes a cost functional that is a function of state and control variables. An optimal control is a set of differential equations describing the paths of the control variables that minimize the cost function. The optimal control can be derived using Pontryagin's maximum principle (a necessary condition also known as Pontryagin's minimum principle or simply Pontryagin's principle), or by solving the Hamilton–Jacobi–Bellman equation (a sufficient condition).

We begin with a simple example. Consider a car traveling in a straight line on a hilly road. The question is, how should the driver press the accelerator pedal in order to ''minimize'' the total traveling time? In this example, the term ''control law'' refers specifically to the way in which the driver presses the accelerator and shifts the gears. The ''system'' consists of both the car and the road, and the ''optimality criterion'' is the minimization of the total traveling time. Control problems usually include ancillary constraints. For example, the amount of available fuel might be limited, the accelerator pedal cannot be pushed through the floor of the car, and there may be speed limits. A proper cost function will be a mathematical expression giving the traveling time as a function of the speed, geometrical considerations, and initial conditions of the system. Constraints are often interchangeable with the cost function.

Another related optimal control problem may be to find the way to drive the car so as to minimize its fuel consumption, given that it must complete a given course in a time not exceeding some amount. Yet another related control problem may be to minimize the total monetary cost of completing the trip, given assumed monetary prices for time and fuel.

A more abstract framework goes as follows. Minimize the continuous-time cost functional

$$J[\mathbf{x}(\cdot), \mathbf{u}(\cdot), t_0, t_f] := E[\mathbf{x}(t_0), t_0, \mathbf{x}(t_f), t_f] + \int_{t_0}^{t_f} F[\mathbf{x}(t), \mathbf{u}(t), t] \,\mathrm{d}t$$

subject to the first-order dynamic constraints (the state equation)

$$\dot{\mathbf{x}}(t) = \mathbf{f}[\mathbf{x}(t), \mathbf{u}(t), t],$$

the algebraic ''path constraints''

$$\mathbf{h}[\mathbf{x}(t), \mathbf{u}(t), t] \leq \mathbf{0},$$

and the endpoint conditions

$$\mathbf{e}[\mathbf{x}(t_0), t_0, \mathbf{x}(t_f), t_f] = 0,$$

where $\mathbf{x}(t)$ is the ''state'', $\mathbf{u}(t)$ is the ''control'', $t$ is the independent variable (generally speaking, time), $t_0$ is the initial time, and $t_f$ is the terminal time. The terms $E$ and $F$ are called the ''endpoint cost'' and the ''running cost'', respectively. In the calculus of variations, $E$ and $F$ are referred to as the Mayer term and the ''Lagrangian'', respectively.

The path constraints are in general ''inequality'' constraints and thus may not be active (i.e., equal to zero) at the optimal solution. The optimal control problem as stated above may also have multiple solutions (i.e., the solution may not be unique). Thus, it is most often the case that any solution $[\mathbf{x}^*(t), \mathbf{u}^*(t), t_0^*, t_f^*]$ to the optimal control problem is ''locally minimizing''.
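To make the abstract formulation concrete, the following is a minimal Python sketch that evaluates the cost functional $J$ for a given control by integrating the state equation and the running cost together with a classical fourth-order Runge–Kutta scheme. The function name `bolza_cost`, the scalar test problem ($\dot{x} = u$, $F = x^2 + u^2$, $E = x(t_f)^2$) and the step count are illustrative assumptions rather than anything prescribed above, and path and endpoint constraints are not handled.

```python
def bolza_cost(f, E, F, x0, u, t0, tf, steps=100):
    """Evaluate J = E(x(t0), t0, x(tf), tf) + integral of F(x, u, t) dt.

    f and F take (x, u, t) with x a list of floats; u is a function of time.
    The state and the accumulated running cost are integrated together
    with the classical fourth-order Runge-Kutta method.
    """
    h = (tf - t0) / steps

    def rhs(z, t):
        # z holds the state components followed by the accumulated running cost
        x = z[:-1]
        ut = u(t)
        return f(x, ut, t) + [F(x, ut, t)]

    def shifted(z, k, a):
        # Return z + a * k component-wise
        return [zi + a * ki for zi, ki in zip(z, k)]

    z, t = list(x0) + [0.0], t0
    for _ in range(steps):
        k1 = rhs(z, t)
        k2 = rhs(shifted(z, k1, h / 2), t + h / 2)
        k3 = rhs(shifted(z, k2, h / 2), t + h / 2)
        k4 = rhs(shifted(z, k3, h), t + h)
        z = [zi + h / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t += h
    xf, running = z[:-1], z[-1]
    return E(x0, t0, xf, tf) + running


# Illustrative problem (an assumption): x' = u with u(t) = 1 on [0, 1].
J = bolza_cost(
    f=lambda x, u, t: [u],
    E=lambda x0, t0, xf, tf: xf[0] ** 2,
    F=lambda x, u, t: x[0] ** 2 + u ** 2,
    x0=[0.0],
    u=lambda t: 1.0,
    t0=0.0,
    tf=1.0,
)
print(J)
```

For this particular choice the state is $x(t) = t$, so the exact value is $1 + \int_0^1 (t^2 + 1)\,\mathrm{d}t = 7/3$, which the sketch should reproduce up to rounding. An optimizer would wrap such an evaluation in a search over the control.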


Linear quadratic control

A special case of the general nonlinear optimal control problem given in the previous section is the ''linear quadratic'' (LQ) optimal control problem. The LQ problem is stated as follows. Minimize the ''quadratic'' continuous-time cost functional

$$J = \tfrac{1}{2} \mathbf{x}^{\mathsf{T}}(t_f)\mathbf{S}_f\mathbf{x}(t_f) + \tfrac{1}{2} \int_{t_0}^{t_f} \left[\mathbf{x}^{\mathsf{T}}(t)\mathbf{Q}(t)\mathbf{x}(t) + \mathbf{u}^{\mathsf{T}}(t)\mathbf{R}(t)\mathbf{u}(t)\right] \mathrm{d}t$$

subject to the ''linear'' first-order dynamic constraints

$$\dot{\mathbf{x}}(t) = \mathbf{A}(t)\mathbf{x}(t) + \mathbf{B}(t)\mathbf{u}(t),$$

and the initial condition

$$\mathbf{x}(t_0) = \mathbf{x}_0.$$

A particular form of the LQ problem that arises in many control system problems is that of the ''linear quadratic regulator'' (LQR) where all of the matrices (i.e., $\mathbf{A}$, $\mathbf{B}$, $\mathbf{Q}$, and $\mathbf{R}$) are ''constant'', the initial time is arbitrarily set to zero, and the terminal time is taken in the limit $t_f \rightarrow \infty$ (this last assumption is what is known as ''infinite horizon''). The LQR problem is stated as follows. Minimize the infinite horizon quadratic continuous-time cost functional

$$J = \tfrac{1}{2} \int_{0}^{\infty} \left[\mathbf{x}^{\mathsf{T}}(t)\mathbf{Q}\mathbf{x}(t) + \mathbf{u}^{\mathsf{T}}(t)\mathbf{R}\mathbf{u}(t)\right] \mathrm{d}t$$

subject to the ''linear time-invariant'' first-order dynamic constraints

$$\dot{\mathbf{x}}(t) = \mathbf{A}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t),$$

and the initial condition

$$\mathbf{x}(t_0) = \mathbf{x}_0.$$

In the finite-horizon case the matrices are restricted in that $\mathbf{Q}$ and $\mathbf{R}$ are positive semi-definite and positive definite, respectively. In the infinite-horizon case, however, the matrices $\mathbf{Q}$ and $\mathbf{R}$ are not only positive semi-definite and positive definite, respectively, but are also ''constant''. These additional restrictions on $\mathbf{Q}$ and $\mathbf{R}$ in the infinite-horizon case are enforced to ensure that the cost functional remains positive. Furthermore, in order to ensure that the cost function is ''bounded'', the additional restriction is imposed that the pair $(\mathbf{A}, \mathbf{B})$ is ''controllable''. Note that the LQ or LQR cost functional can be thought of physically as attempting to minimize the ''control energy'' (measured as a quadratic form).

The infinite horizon problem (i.e., LQR) may seem overly restrictive and essentially useless because it assumes that the operator is driving the system to zero-state and hence driving the output of the system to zero. This is indeed correct. However, the problem of driving the output to a desired nonzero level can be solved ''after'' the zero output one is. In fact, it can be proved that this secondary LQR problem can be solved in a very straightforward manner. It has been shown in classical optimal control theory that the LQ (or LQR) optimal control has the feedback form

$$\mathbf{u}(t) = -\mathbf{K}(t)\mathbf{x}(t),$$

where $\mathbf{K}(t)$ is a properly dimensioned matrix, given as

$$\mathbf{K}(t) = \mathbf{R}^{-1}\mathbf{B}^{\mathsf{T}}\mathbf{S}(t),$$

and $\mathbf{S}(t)$ is the solution of the differential Riccati equation. The differential Riccati equation is given as

$$\dot{\mathbf{S}}(t) = -\mathbf{S}(t)\mathbf{A} - \mathbf{A}^{\mathsf{T}}\mathbf{S}(t) + \mathbf{S}(t)\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathsf{T}}\mathbf{S}(t) - \mathbf{Q}.$$

For the finite horizon LQ problem, the Riccati equation is integrated backward in time using the terminal boundary condition

$$\mathbf{S}(t_f) = \mathbf{S}_f.$$

For the infinite horizon LQR problem, the differential Riccati equation is replaced with the ''algebraic'' Riccati equation (ARE) given as

$$\mathbf{0} = -\mathbf{S}\mathbf{A} - \mathbf{A}^{\mathsf{T}}\mathbf{S} + \mathbf{S}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathsf{T}}\mathbf{S} - \mathbf{Q}.$$

Because the ARE arises from the infinite horizon problem, the matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{Q}$, and $\mathbf{R}$ are all ''constant''. There are in general multiple solutions to the algebraic Riccati equation, and the ''positive definite'' (or positive semi-definite) solution is the one that is used to compute the feedback gain. The LQ (LQR) problem was elegantly solved by Rudolf E. Kálmán.
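The relationship between the differential and algebraic Riccati equations can be checked on a scalar example. The sketch below assumes a one-dimensional plant $\dot{x} = ax + bu$ with weights $q$ and $r$, so that the Riccati equation reduces to $\dot{S} = -2aS + (b^2/r)S^2 - q$. The function names and the numerical values are illustrative assumptions, not part of the text above. The code integrates the differential equation backward from $S(t_f) = 0$ and compares the result with the positive root of the scalar ARE.

```python
import math


def are_scalar(a, b, q, r):
    """Positive root of 0 = -2*a*S + (b*b/r)*S*S - q (scalar ARE)."""
    return r * (a + math.sqrt(a * a + q * b * b / r)) / (b * b)


def riccati_backward(a, b, q, r, s_f, horizon, steps=20000):
    """Integrate dS/dt = -2*a*S + (b*b/r)*S*S - q backward from S(t_f) = s_f.

    Returns S at time t_f - horizon, using RK4 in the reversed time
    variable tau = t_f - t, for which dS/dtau = -dS/dt.
    """
    def ds_dtau(s):
        return 2 * a * s - (b * b / r) * s * s + q

    h = horizon / steps
    s = s_f
    for _ in range(steps):
        k1 = ds_dtau(s)
        k2 = ds_dtau(s + h / 2 * k1)
        k3 = ds_dtau(s + h / 2 * k2)
        k4 = ds_dtau(s + h * k3)
        s += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return s


a, b, q, r = 1.0, 1.0, 1.0, 1.0   # assumed values; open loop x' = x + u is unstable
s_inf = are_scalar(a, b, q, r)     # positive ARE solution
k_inf = b * s_inf / r              # scalar version of K = R^-1 B^T S
s_0 = riccati_backward(a, b, q, r, s_f=0.0, horizon=10.0)
print(s_inf, k_inf, a - b * k_inf, s_0)
```

With these values the positive ARE root is $1 + \sqrt{2}$, the closed-loop coefficient $a - bK$ is negative, and the backward-integrated $S$ over a horizon of 10 time units should agree closely with the ARE root. For matrix-valued problems a dedicated Riccati solver would normally be used instead of this hand-written scalar formula.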


Numerical methods for optimal control

Optimal control problems are generally nonlinear and therefore generally do not have analytic solutions (unlike, e.g., the linear-quadratic optimal control problem). As a result, it is necessary to employ numerical methods to solve optimal control problems. In the early years of optimal control (1950s to 1980s) the favored approach for solving optimal control problems was that of ''indirect methods''. In an indirect method, the calculus of variations is employed to obtain the first-order optimality conditions. These conditions result in a two-point (or, in the case of a complex problem, a multi-point) boundary-value problem. This boundary-value problem has a special structure because it arises from taking the derivative of a Hamiltonian. Thus, the resulting dynamical system is a Hamiltonian system of the form

$$\dot{\mathbf{x}} = \frac{\partial H}{\partial \boldsymbol{\lambda}}, \qquad \dot{\boldsymbol{\lambda}} = -\frac{\partial H}{\partial \mathbf{x}},$$

where

$$H = F + \boldsymbol{\lambda}^{\mathsf{T}}\mathbf{f} - \boldsymbol{\mu}^{\mathsf{T}}\mathbf{h}$$

is the ''augmented Hamiltonian''. In an indirect method, the boundary-value problem is solved (using the appropriate boundary or ''transversality'' conditions). The beauty of using an indirect method is that the state and adjoint (i.e., $\boldsymbol{\lambda}$) are solved for and the resulting solution is readily verified to be an extremal trajectory. The disadvantage of indirect methods is that the boundary-value problem is often extremely difficult to solve (particularly for problems that span large time intervals or problems with interior point constraints). A well-known software program that implements indirect methods is BNDSCO.

The approach that has risen to prominence in numerical optimal control since the 1980s is that of so-called ''direct methods''. In a direct method, the state or the control, or both, are approximated using an appropriate function approximation (e.g., polynomial approximation or piecewise constant parameterization). Simultaneously, the cost functional is approximated as a ''cost function''. Then, the coefficients of the function approximations are treated as optimization variables and the problem is "transcribed" to a nonlinear optimization problem of the form: Minimize

$$F(\mathbf{z})$$

subject to the algebraic constraints

$$\mathbf{g}(\mathbf{z}) = \mathbf{0}, \qquad \mathbf{h}(\mathbf{z}) \leq \mathbf{0}.$$

Depending upon the type of direct method employed, the size of the nonlinear optimization problem can be quite small (e.g., as in a direct shooting or quasilinearization method), moderate (e.g., pseudospectral optimal control) or quite large (e.g., a direct collocation method). In the latter case (i.e., a collocation method), the nonlinear optimization problem (NLP) may have thousands to tens of thousands of variables and constraints. Given the size of many NLPs arising from a direct method, it may appear somewhat counter-intuitive that solving the nonlinear optimization problem is easier than solving the boundary-value problem. It is, however, the case that the NLP is easier to solve than the boundary-value problem. The reason for the relative ease of computation, particularly of a direct collocation method, is that the NLP is ''sparse'' and many well-known software programs exist (e.g., SNOPT) to solve large sparse NLPs. As a result, the range of problems that can be solved via direct methods (particularly direct ''collocation methods'', which are very popular) is significantly larger than the range of problems that can be solved via indirect methods.

Direct methods have become so popular that many elaborate software programs employing them have been written. Such programs include ''DIRCOL'', SOCS, OTIS, GESOP/ASTOS, DITAN, and PyGMO/PyKEP. In recent years, due to the advent of the MATLAB programming language, optimal control software in MATLAB has become more common. Examples of academically developed MATLAB software tools implementing direct methods include ''RIOTS'', ''DIDO'', ''DIRECT'', FALCON.m, and ''GPOPS'', while an example of an industry developed MATLAB tool is ''PROPT''. These software tools have significantly increased the opportunity for people to explore complex optimal control problems both for academic research and industrial problems. Finally, general-purpose MATLAB optimization environments such as TOMLAB have made coding complex optimal control problems significantly easier than was previously possible in languages such as C and FORTRAN.
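The transcription step of a direct method can be illustrated on a deliberately small problem. The sketch below assumes a double integrator $\ddot{x} = u$ that must move from rest at $x = 0$ to rest at $x = 1$ on $[0, 1]$ while minimizing $\int_0^1 u^2\,\mathrm{d}t$, with the control taken as piecewise constant on $n$ equal segments. The problem, the function names and $n = 50$ are illustrative assumptions. Because the dynamics are linear and the cost quadratic, the transcribed problem here is an equality-constrained quadratic program with a closed-form minimum-norm solution; a general nonlinear problem would instead be handed to an NLP solver of the kind mentioned above.

```python
def solve_direct(n):
    """Direct transcription of: minimize the integral of u(t)^2 over [0, 1]
    subject to x'' = u, x(0) = x'(0) = 0, x(1) = 1, x'(1) = 0,
    with u piecewise constant on n equal segments.
    """
    h = 1.0 / n
    t_mid = [(k + 0.5) * h for k in range(n)]
    # The terminal state is linear in the control values u_k:
    #   x(1)  = sum_k h * (1 - t_mid[k]) * u_k
    #   x'(1) = sum_k h * u_k
    row_x = [h * (1.0 - t) for t in t_mid]
    row_v = [h] * n
    # Minimum-norm solution u = A^T (A A^T)^-1 b with b = (1, 0)
    g11 = sum(c * c for c in row_x)
    g12 = sum(c * d for c, d in zip(row_x, row_v))
    g22 = sum(d * d for d in row_v)
    det = g11 * g22 - g12 * g12
    mu1, mu2 = g22 / det, -g12 / det
    u = [c * mu1 + d * mu2 for c, d in zip(row_x, row_v)]
    return t_mid, u


def simulate(u):
    """Propagate the double integrator exactly under piecewise-constant u."""
    h = 1.0 / len(u)
    x = v = 0.0
    for uk in u:
        x += v * h + 0.5 * uk * h * h
        v += uk * h
    return x, v


t_mid, u = solve_direct(50)
x_end, v_end = simulate(u)
# Compare with the continuous-time optimum u(t) = 6 - 12 t at segment midpoints
max_gap = max(abs(uk - (6.0 - 12.0 * t)) for t, uk in zip(t_mid, u))
print(x_end, v_end, max_gap)
```

The discretized controls satisfy the endpoint conditions and lie close to the continuous-time optimum $u(t) = 6 - 12t$ for this problem, with the gap shrinking as $n$ grows.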


Discrete-time optimal control

The examples thus far have shown continuous time systems and control solutions. In fact, as optimal control solutions are now often implemented digitally, contemporary control theory is now primarily concerned with discrete time systems and solutions. The Theory of Consistent Approximations provides conditions under which solutions to a series of increasingly accurate discretized optimal control problems converge to the solution of the original, continuous-time problem. Not all discretization methods have this property, even seemingly obvious ones. For instance, using a variable step-size routine to integrate the problem's dynamic equations may generate a gradient which does not converge to zero (or point in the right direction) as the solution is approached. The direct method ''RIOTS'' is based on the Theory of Consistent Approximations.
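The idea of a sequence of discretized problems converging to the continuous-time one can be seen in a toy case. The sketch below is not the Theory of Consistent Approximations itself; it only assumes a scalar linear-quadratic regulator ($\dot{x} = ax + bu$, running cost $\tfrac{1}{2}(qx^2 + ru^2)$), discretizes it with forward Euler at step $h$, and iterates the discrete-time Riccati recursion to its fixed point. The function name, the parameter values and the step sizes are illustrative assumptions.

```python
import math


def discrete_riccati_fixed_point(a, b, q, r, h, tol=1e-13, max_iter=1_000_000):
    """Steady-state cost weight of the forward-Euler discretization of
    x' = a*x + b*u with running cost (q*x^2 + r*u^2) / 2.
    """
    A, B = 1.0 + a * h, b * h     # x[k+1] = A*x[k] + B*u[k]
    Q, R = q * h, r * h           # stage cost weights scaled by the step
    P = 0.0
    for _ in range(max_iter):
        P_next = Q + A * A * P - (A * P * B) ** 2 / (R + B * B * P)
        if abs(P_next - P) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")


a, b, q, r = 1.0, 1.0, 1.0, 1.0
# Positive root of the continuous-time scalar algebraic Riccati equation
s_continuous = r * (a + math.sqrt(a * a + q * b * b / r)) / (b * b)
errors = []
for h in (0.1, 0.01, 0.001):
    P = discrete_riccati_fixed_point(a, b, q, r, h)
    errors.append(abs(P - s_continuous))
print(errors)
```

In this example the gap between the discrete and continuous solutions shrinks roughly in proportion to $h$. As the text notes, such convergence is a property of the particular discretization and should not be taken for granted in general.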


Examples

A common solution strategy in many optimal control problems is to solve for the costate (sometimes called the shadow price) $\lambda(t)$. The costate summarizes in one number the marginal value of expanding or contracting the state variable next turn. The marginal value is not only the gains accruing to it next turn but also those associated with the remaining duration of the program. It is nice when $\lambda(t)$ can be solved analytically, but usually the most one can do is describe it sufficiently well that the intuition can grasp the character of the solution and an equation solver can solve numerically for the values. Having obtained $\lambda(t)$, the turn-$t$ optimal value for the control can usually be solved as a differential equation conditional on knowledge of $\lambda(t)$. Again it is infrequent, especially in continuous-time problems, that one obtains the value of the control or the state explicitly. Usually, the strategy is to solve for thresholds and regions that characterize the optimal control and use a numerical solver to isolate the actual choice values in time.


Finite time

Consider the problem of a mine owner who must decide at what rate to extract ore from their mine. They own rights to the ore from date $0$ to date $T$. At date $0$ there is $x_0$ ore in the ground, and the time-dependent amount of ore $x(t)$ left in the ground declines at the rate $u(t)$ at which the mine owner extracts it. The mine owner extracts ore at cost $u(t)^2/x(t)$ (the cost of extraction increasing with the square of the extraction speed and the inverse of the amount of ore left) and sells ore at a constant price $p$. Any ore left in the ground at time $T$ cannot be sold and has no value (there is no "scrap value"). The owner chooses the rate of extraction varying with time $u(t)$ to maximize profits over the period of ownership with no time discounting.
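As a hedged numerical check of this example, the sketch below uses a candidate solution worked out here (not quoted from the text above) by applying Pontryagin's principle to the Hamiltonian $H = pu - u^2/x - \lambda u$ with terminal condition $\lambda(T) = 0$. This gives the extraction rule $u(t) = 2p\,x(t)/(4 + p(T - t))$ and the costate $\lambda(t) = p - 4p/(4 + p(T - t))$, with total profit $\lambda(0)\,x_0$. The parameter values, the forward-Euler simulation and the constant-fraction comparison policies are illustrative assumptions.

```python
def profit(policy, p, T, x0, steps=20000):
    """Simulate x' = -u with forward Euler and accumulate p*u - u^2/x."""
    h = T / steps
    x, total = x0, 0.0
    for i in range(steps):
        t = i * h
        u = policy(t, x)
        total += (p * u - u * u / x) * h
        x -= u * h
    return total


p, T, x0 = 1.0, 10.0, 1.0   # assumed price, horizon and initial stock


def candidate(t, x):
    # Extract a fraction of the remaining ore; the fraction rises to p/2 at T
    return x * 2.0 * p / (4.0 + p * (T - t))


costate_0 = p - 4.0 * p / (4.0 + p * T)   # lambda(0), the initial shadow price
best = profit(candidate, p, T, x0)
# Comparison policies: extract a constant fraction c of the remaining ore
others = [profit(lambda t, x, c=c: c * x, p, T, x0)
          for c in (0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.5)]
print(best, costate_0 * x0, max(others))
```

The simulated profit of the candidate rule should match $\lambda(0)\,x_0$ up to discretization error and exceed that of each constant-fraction policy tried. This is consistent with the interpretation of the costate as the shadow price of ore in the ground, although a finite set of comparisons is of course not a proof of optimality.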


See also

* Active inference
* Bellman equation
* Bellman pseudospectral method
* Brachistochrone
* DIDO
* DNSS point
* Dynamic programming
* Gauss pseudospectral method
* Generalized filtering
* GPOPS-II
* CasADi
* JModelica.org (Modelica-based open source platform for dynamic optimization)
* Kalman filter
* Linear-quadratic regulator
* Model Predictive Control
* Overtaking criterion
* PID controller
* PROPT (Optimal Control Software for MATLAB)
* Pseudospectral optimal control
* Pursuit-evasion games
* Sliding mode control
* SNOPT
* Stochastic control
* Trajectory optimization


External links


* Computational Optimal Control
* Dr. Benoît CHACHUAT – Nonlinear Programming, Calculus of Variations and Optimal Control
* GEKKO – Python package for optimal control
* GESOP – Graphical Environment for Simulation and OPtimization
* GPOPS-II – General-Purpose MATLAB Optimal Control Software
* CasADi – Free and open source symbolic framework for optimal control
* PROPT – MATLAB Optimal Control Software
* OpenOCL – Open Optimal Control Library
* Elmer G. Wiens: Optimal Control – Applications of Optimal Control Theory Using the Pontryagin Maximum Principle with interactive models
* Pontryagin's Principle Illustrated with Examples
* On Optimal Control by Yu-Chi Ho
* Pseudospectral optimal control: Part 1
* Pseudospectral optimal control: Part 2