In the theory of stochastic processes, filtering describes the problem of determining the state of a system from an incomplete and potentially noisy set of observations. For example, in GPS navigation, filtering helps estimate a car's true position (the state) from noisy satellite signals (the observations). While originally motivated by problems in engineering, filtering has found applications in many fields, from signal processing to finance.

The problem of optimal non-linear filtering (even for the non-stationary case) was solved by Ruslan L. Stratonovich (1959, 1960); see also the work of Harold J. Kushner and of Moshe Zakai, who introduced a simplified dynamics for the unnormalized conditional law of the filter, known as the Zakai equation. The solution, however, is infinite-dimensional in the general case (Mireille Chaleyat-Maurel and Dominique Michel, Des resultats de non existence de filtre de dimension finie, Stochastics, 13(1+2):83-102, 1984). Certain approximations and special cases are well understood: for example, the linear filters are optimal for Gaussian random variables, and are known as the Wiener filter and the Kalman-Bucy filter. More generally, as the solution is infinite-dimensional, it requires finite-dimensional approximations to be implemented in a computer with finite memory. A finite-dimensional approximate nonlinear filter may be based more on heuristics, such as the extended Kalman filter or the assumed density filters, or be more methodologically oriented, such as the projection filters, some sub-families of which are shown to coincide with the assumed density filters (Damiano Brigo, Bernard Hanzon and François Le Gland, Approximate Nonlinear Filtering by Projection on Exponential Manifolds of Densities, Bernoulli, Vol. 5, N. 3 (1999), pp. 495-534).

Particle filters are another option for attacking the infinite-dimensional filtering problem and are based on sequential Monte Carlo methods.

In general, if the separation principle applies, then filtering also arises as part of the solution of an optimal control problem. For example, the Kalman filter is the estimation part of the optimal control solution to the linear-quadratic-Gaussian control problem.
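To make the sequential Monte Carlo idea concrete, the following is a minimal sketch of a bootstrap particle filter. It is not taken from the text: the discrete-time scalar model, the function names `simulate` and `bootstrap_particle_filter`, and all parameter values are illustrative assumptions.

```python
import math
import random


def simulate(n_steps, a, q_std, r_std, rng):
    """Simulate an assumed toy model: x_k = a*x_{k-1} + q_std*v_k, y_k = x_k + r_std*w_k."""
    states, observations = [], []
    x = rng.gauss(0.0, 1.0)  # assumed N(0, 1) initial state
    for _ in range(n_steps):
        x = a * x + q_std * rng.gauss(0.0, 1.0)
        states.append(x)
        observations.append(x + r_std * rng.gauss(0.0, 1.0))
    return states, observations


def bootstrap_particle_filter(observations, n_particles, a, q_std, r_std, rng):
    """Approximate E[x_k | y_1..y_k] with a weighted cloud of particles."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate each particle through the state dynamics.
        particles = [a * p + q_std * rng.gauss(0.0, 1.0) for p in particles]
        # Weight by the Gaussian observation likelihood (log domain for stability).
        log_w = [-0.5 * ((y - p) / r_std) ** 2 for p in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        # The weighted mean approximates the conditional expectation of the state.
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Multinomial resampling concentrates particles where the weights are large.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


rng = random.Random(0)
states, observations = simulate(200, a=0.9, q_std=0.5, r_std=1.0, rng=rng)
estimates = bootstrap_particle_filter(observations, 500, 0.9, 0.5, 1.0, rng)

filter_mse = sum((e - x) ** 2 for e, x in zip(estimates, states)) / len(states)
raw_mse = sum((y - x) ** 2 for y, x in zip(observations, states)) / len(states)
print(f"filter MSE: {filter_mse:.3f}, raw observation MSE: {raw_mse:.3f}")
```

In this sketch the filtered estimate is compared against simply using the raw observation as the state estimate. Multinomial resampling is chosen only for brevity; other resampling schemes exist.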

The mathematical formalism

Consider a probability space (Ω, Σ, P) and suppose that the (random) state ''Y''_''t'' in ''n''-dimensional Euclidean space R^''n'' of a system of interest at time ''t'' is a random variable ''Y''_''t'' : Ω → R^''n'' given by the solution to an Itō stochastic differential equation of the form

\mathrm{d} Y_t = b(t, Y_t) \, \mathrm{d} t + \sigma (t, Y_t) \, \mathrm{d} B_t,

where ''B'' denotes standard ''p''-dimensional Brownian motion, ''b'' : [0, +∞) × R^''n'' → R^''n'' is the drift field, and ''σ'' : [0, +∞) × R^''n'' → R^''n''×''p'' is the diffusion field. It is assumed that observations ''H''_''t'' in R^''m'' (note that ''m'' and ''n'' may, in general, be unequal) are taken for each time ''t'' according to

H_t = c(t, Y_t) + \gamma (t, Y_t) \cdot \mbox{noise}.

Adopting the Itō interpretation of the stochastic differential and setting

Z_t = \int_0^t H_s \, \mathrm{d} s,

this gives the following stochastic integral representation for the observations ''Z''_''t'':

\mathrm{d} Z_t = c(t, Y_t) \, \mathrm{d} t + \gamma (t, Y_t) \, \mathrm{d} W_t,

where ''W'' denotes standard ''r''-dimensional Brownian motion, independent of ''B'' and the initial condition ''Y''₀, and ''c'' : [0, +∞) × R^''n'' → R^''m'' and ''γ'' : [0, +∞) × R^''n'' → R^''m''×''r'' satisfy

\big| c (t, x) \big| + \big| \gamma (t, x) \big| \leq C \big( 1 + | x | \big)

for all ''t'' and ''x'' and some constant ''C''.

The filtering problem is the following: given observations ''Z''_''s'' for 0 ≤ ''s'' ≤ ''t'', what is the best estimate ''Ŷ''_''t'' of the true state ''Y''_''t'' of the system based on those observations?

By "based on those observations" it is meant that ''Ŷ''_''t'' is measurable with respect to the ''σ''-algebra ''G''_''t'' generated by the observations ''Z''_''s'', 0 ≤ ''s'' ≤ ''t''. Denote by ''K'' = ''K''(''Z'', ''t'') the collection of all R^''n''-valued random variables ''Y'' that are square-integrable and ''G''_''t''-measurable:

K = K(Z, t) = L^2 (\Omega, G_t, \mathbf{P}; \mathbf{R}^n).

By "best estimate", it is meant that ''Ŷ''_''t'' minimizes the mean-square distance between ''Y''_''t'' and all candidates in ''K'':

\mathbf{E} \left[ \big| Y_t - \hat{Y}_t \big|^2 \right] = \inf_{Y \in K} \mathbf{E} \left[ \big| Y_t - Y \big|^2 \right]. \qquad \mbox{(M)}
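The signal and observation equations above can be simulated on a time grid with the Euler-Maruyama scheme. The following is a minimal sketch for the scalar case (''n'' = ''m'' = ''p'' = ''r'' = 1). The function name `simulate_signal_and_observation` and the example coefficients are assumptions chosen for illustration, not part of the formalism.

```python
import math
import random


def simulate_signal_and_observation(b, sigma, c, gamma, y0, t_end, n_steps, rng):
    """Euler-Maruyama discretization of dY = b dt + sigma dB and dZ = c dt + gamma dW."""
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    y, z = y0, 0.0  # Z_0 = 0 because Z_t is the time integral of H from 0 to t
    ys, zs = [y], [z]
    for k in range(n_steps):
        t = k * dt
        d_b = sqrt_dt * rng.gauss(0.0, 1.0)  # increment of the signal noise B
        d_w = sqrt_dt * rng.gauss(0.0, 1.0)  # independent increment of the observation noise W
        # Both increments use the state at the start of the step (Ito convention).
        z += c(t, y) * dt + gamma(t, y) * d_w
        y += b(t, y) * dt + sigma(t, y) * d_b
        ys.append(y)
        zs.append(z)
    return ys, zs


# Hypothetical coefficients: mean-reverting signal observed directly with unit noise.
ys, zs = simulate_signal_and_observation(
    b=lambda t, y: -y,
    sigma=lambda t, y: 0.5,
    c=lambda t, y: y,
    gamma=lambda t, y: 1.0,
    y0=1.0,
    t_end=1.0,
    n_steps=1000,
    rng=random.Random(0),
)
print(f"Y_1 = {ys[-1]:.3f}, Z_1 = {zs[-1]:.3f}")
```

Only the path `zs` would be available to a filter. The path `ys` plays the role of the hidden state that the filter tries to estimate.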

Basic result: orthogonal projection

The space ''K''(''Z'', ''t'') of candidates is a Hilbert space, and the general theory of Hilbert spaces implies that the solution ''Ŷ''_''t'' of the minimization problem (M) is given by

\hat{Y}_t = P_{K(Z,t)} \big( Y_t \big),

where ''P''_{''K''(''Z'',''t'')} denotes the orthogonal projection of ''L''²(Ω, Σ, P; R^''n'') onto the linear subspace ''K''(''Z'', ''t'') = ''L''²(Ω, ''G''_''t'', P; R^''n''). Furthermore, it is a general fact about conditional expectations that if ''F'' is any sub-''σ''-algebra of Σ then the orthogonal projection

P_K : L^2 (\Omega, \Sigma, \mathbf{P}; \mathbf{R}^n) \to L^2 (\Omega, F, \mathbf{P}; \mathbf{R}^n)

is exactly the conditional expectation operator E[ · | ''F'' ], i.e.,

P_K (X) = \mathbf{E} \big[ X \big| F \big].

Hence,

\hat{Y}_t = P_{K(Z,t)} \big( Y_t \big) = \mathbf{E} \big[ Y_t \big| G_t \big].

This elementary result is the basis for the general Fujisaki-Kallianpur-Kunita equation of filtering theory.
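The projection property can be checked numerically in a static toy setting. The sketch below is an illustration under assumptions not made in the text: a single scalar state ''Y'' ~ N(0, 1) observed once as ''Z'' = ''Y'' + noise with independent N(0, 1) noise. In that case the conditional expectation is E[''Y'' | ''Z''] = ''Z''/2. The code compares its mean-square error with other estimators of the form gain × ''Z'' and checks that the error ''Y'' − ''Z''/2 is empirically orthogonal to ''Z''.

```python
import random

rng = random.Random(0)
n = 50_000

# Draw (state, observation) pairs from the assumed Gaussian toy model.
samples = []
for _ in range(n):
    y = rng.gauss(0.0, 1.0)
    z = y + rng.gauss(0.0, 1.0)
    samples.append((y, z))


def mean_square_error(gain):
    """Empirical mean-square error of the estimator gain * Z."""
    return sum((y - gain * z) ** 2 for y, z in samples) / n


# gain = 0.5 corresponds to the conditional expectation E[Y | Z] = Z / 2.
mse_by_gain = {gain: mean_square_error(gain) for gain in (0.0, 0.3, 0.5, 0.7, 1.0)}

# Orthogonality: the estimation error should be uncorrelated with the observation.
orthogonality = sum((y - 0.5 * z) * z for y, z in samples) / n

for gain, mse in mse_by_gain.items():
    print(f"gain {gain:.1f}: MSE {mse:.4f}")
print(f"empirical E[(Y - Z/2) Z] = {orthogonality:.4f}")
```

The candidates tried here are only linear functions of ''Z''. The projection result is stronger, since it asserts optimality among all square-integrable functions of the observations.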

More advanced result: nonlinear filtering SPDE

The complete knowledge of the filter at a time ''t'' would be given by the probability law of the signal ''Y''_''t'' conditional on the sigma-field ''G''_''t'' generated by observations ''Z'' up to time ''t''. If this probability law admits a density, informally

p_t(y) \, \mathrm{d} y = \mathbf{P}(Y_t \in \mathrm{d} y \mid G_t),

then under some regularity assumptions the density ''p''_''t''(''y'') satisfies a non-linear stochastic partial differential equation (SPDE) driven by d''Z''_''t'' and called the Kushner-Stratonovich equation (Bain, A., and Crisan, D. (2009). Fundamentals of Stochastic Filtering. Springer-Verlag, New York, https://doi.org/10.1007/978-0-387-76896-0), while an unnormalized version ''q''_''t''(''y'') of the density ''p''_''t''(''y'') satisfies a linear SPDE called the Zakai equation. These equations can be formulated for the above system, but to simplify the exposition one can assume that the unobserved signal ''Y'' and the partially observed noisy signal ''Z'' satisfy the equations

\mathrm{d} Y_t = b(t, Y_t) \, \mathrm{d} t + \sigma (t, Y_t) \, \mathrm{d} B_t,

\mathrm{d} Z_t = c(t, Y_t) \, \mathrm{d} t + \mathrm{d} W_t.

In other words, the system is simplified by assuming that the observation noise ''W'' is not state dependent. One might keep a deterministic time-dependent ''γ'' in front of d''W'', but we assume this has been taken out by re-scaling. For this particular system, the Kushner-Stratonovich SPDE for the density ''p''_''t'' reads

\mathrm{d} p_t = \mathcal{L}^*_t p_t \, \mathrm{d} t + p_t \big[ c(t,\cdot) - E_{p_t}(c(t,\cdot)) \big]^T \big[ \mathrm{d} Z_t - E_{p_t}(c(t,\cdot)) \, \mathrm{d} t \big]

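where ''T'' denotes transposition, ''E''_''p'' denotes the expectation with respect to the density ''p'',

E_p[f] = \int f(y) \, p(y) \, \mathrm{d} y,

and the forward diffusion operator \mathcal{L}^*_t is

\mathcal{L}^*_t f(t,y) = - \sum_i \frac{\partial}{\partial y_i} \big[ b_i(t,y) f(t,y) \big] + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial y_i \partial y_j} \big[ a_{ij}(t,y) f(t,y) \big]

where ''a'' = ''σ'' ''σ''^''T''. If we choose the unnormalized density ''q''_''t''(''y''), the Zakai SPDE for the same system reads

\mathrm{d} q_t = \mathcal{L}^*_t q_t \, \mathrm{d} t + q_t \big[ c(t,\cdot) \big]^T \mathrm{d} Z_t .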
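Since ''q''_''t'' is an unnormalized version of ''p''_''t'', the two densities are related by normalization. The following restatement assumes that ''q''_''t'' is integrable with positive total mass; the integration variable ''x'' is introduced here for illustration only.

```latex
% Recovering the conditional density and conditional expectations from the Zakai density
p_t(y) = \frac{q_t(y)}{\int q_t(x) \, \mathrm{d} x},
\qquad
E_{p_t}[f] = \frac{\int f(y) \, q_t(y) \, \mathrm{d} y}{\int q_t(x) \, \mathrm{d} x}.
```

In practice this means one can propagate the linear equation for ''q''_''t'' and normalize only when a conditional statistic is needed.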

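These SPDEs for ''p'' and ''q'' are written in Itō calculus form. It is possible to write them in Stratonovich calculus form, which turns out to be helpful when deriving filtering approximations based on differential geometry, as in the projection filters. For example, the Kushner-Stratonovich equation written in Stratonovich calculus reads

\mathrm{d} p_t = \mathcal{L}^*_t p_t \, \mathrm{d} t - \frac{1}{2} \, p_t \big[ | c(t,\cdot) |^2 - E_{p_t}( | c(t,\cdot) |^2 ) \big] \, \mathrm{d} t + p_t \big[ c(t,\cdot) - E_{p_t}(c(t,\cdot)) \big]^T \circ \mathrm{d} Z_t .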

From either of the densities ''p'' and ''q'' one can calculate all statistics of the signal ''Y''_''t'' conditional on the sigma-field generated by observations ''Z'' up to time ''t'', so that the densities give complete knowledge of the filter. Under the particular linear-constant assumptions with respect to ''Y'', where the system coefficients ''b'' and ''c'' are linear functions of ''Y'' and where ''σ'' and ''γ'' do not depend on ''Y'', with the initial condition for the signal ''Y'' being Gaussian or deterministic, the density ''p''_''t''(''y'') is Gaussian and can be characterized by its mean and variance-covariance matrix, whose evolution is described by the Kalman-Bucy filter, which is finite-dimensional. More generally, the evolution of the filter density occurs in an infinite-dimensional function space, and it has to be approximated via a finite-dimensional approximation, as hinted above.
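For the linear-Gaussian case, the finite-dimensional evolution of mean and variance can be sketched in a few lines. The example below assumes a scalar system d''Y'' = ''aY'' d''t'' + ''σ'' d''B'', d''Z'' = ''cY'' d''t'' + d''W'' with constant coefficients, and applies a plain Euler discretization to the Kalman-Bucy equations d''m'' = ''am'' d''t'' + ''Pc'' (d''Z'' − ''cm'' d''t'') and d''P''/d''t'' = 2''aP'' + ''σ''² − ''c''²''P''². The function name `kalman_bucy_scalar` and the parameter values are illustrative assumptions.

```python
import math
import random


def kalman_bucy_scalar(a, sigma, c, m0, p0, t_end, n_steps, rng):
    """Euler discretization of the scalar Kalman-Bucy filter on a simulated path."""
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    y = m0 + math.sqrt(p0) * rng.gauss(0.0, 1.0)  # hidden state drawn from the Gaussian prior
    m, p = m0, p0  # conditional mean and conditional variance
    for _ in range(n_steps):
        # Observation increment generated from the current hidden state.
        dz = c * y * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        # Hidden state evolves by Euler-Maruyama.
        y += a * y * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        # Mean update: drift plus gain times innovation.
        m += a * m * dt + p * c * (dz - c * m * dt)
        # Variance update: deterministic Riccati equation.
        p += (2.0 * a * p + sigma ** 2 - (c * p) ** 2) * dt
    return y, m, p


a, sigma, c = -1.0, 1.0, 2.0  # hypothetical coefficients
y_final, m_final, p_final = kalman_bucy_scalar(
    a, sigma, c, m0=0.0, p0=1.0, t_end=5.0, n_steps=5000, rng=random.Random(0)
)

# Stationary solution of the Riccati equation, for comparison.
p_inf = (a + math.sqrt(a * a + (c * sigma) ** 2)) / c ** 2
print(f"state {y_final:.3f}, estimate {m_final:.3f}, variance {p_final:.4f} (limit {p_inf:.4f})")
```

Only two numbers, the mean and the variance, are carried forward in time, in contrast to the general case where a whole density must be propagated. The variance equation does not depend on the observations, so in this sketch it settles to the stationary Riccati solution regardless of the simulated path.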
