Deep Backward Stochastic Differential Equation Method

Deep backward stochastic differential equation method is a numerical method that combines deep learning with backward stochastic differential equations (BSDEs). This method is particularly useful for solving high-dimensional problems in financial derivatives pricing and risk management. By leveraging the powerful function approximation capabilities of deep neural networks, deep BSDE addresses the computational challenges faced by traditional numerical methods in high-dimensional settings.


History


Backward stochastic differential equations

BSDEs were first introduced by Pardoux and Peng in 1990 and have since become essential tools in stochastic control and financial mathematics. In the 1990s, Étienne Pardoux and Shige Peng established the existence and uniqueness theory for BSDE solutions, applying BSDEs to financial mathematics and control theory. For instance, BSDEs have been widely used in option pricing, risk measurement, and dynamic hedging.


Deep learning

Deep learning is a machine learning method based on multilayer neural networks. Its core concept can be traced back to the neural computing models of the 1940s. In the 1980s, the proposal of the backpropagation algorithm made the training of multilayer neural networks possible. In 2006, the deep belief networks proposed by Geoffrey Hinton and others rekindled interest in deep learning. Since then, deep learning has made groundbreaking advancements in image processing, speech recognition, natural language processing, and other fields.


Limitations of traditional numerical methods

Traditional numerical methods for solving stochastic differential equations (Kloeden, P.E.; Platen, E. (1992). Numerical Solution of Stochastic Differential Equations. Springer, Berlin, Heidelberg. DOI: https://doi.org/10.1007/978-3-662-12616-5) include the Euler–Maruyama method, the Milstein method, the Runge–Kutta method (SDE), and methods based on different representations of iterated stochastic integrals (Kuznetsov, D.F. (2023). Strong approximation of iterated Itô and Stratonovich stochastic integrals: Method of generalized multiple Fourier series. Application to numerical integration of Itô SDEs and semilinear SPDEs. Differ. Uravn. Protsesy Upr., no. 1. DOI: https://doi.org/10.21638/11701/spbu35.2023.110; Rybakov, K.A. (2023). Spectral representations of iterated stochastic integrals and their application for modeling nonlinear stochastic dynamics. Mathematics, vol. 11, 4047. DOI: https://doi.org/10.3390/math11194047). But as financial problems become more complex, traditional numerical methods for BSDEs (such as the Monte Carlo method and the finite difference method) have shown limitations such as high computational complexity and the curse of dimensionality.

1. In high-dimensional scenarios, the Monte Carlo method requires numerous simulation paths to ensure accuracy, resulting in lengthy computation times. In particular, for nonlinear BSDEs the convergence rate is slow, making it challenging to handle complex financial derivative pricing problems.

2. The finite difference method experiences exponential growth in the number of computation grids with increasing dimensions, leading to significant computational and storage demands. It is generally suitable for simple boundary conditions and low-dimensional BSDEs, but is less effective in complex situations.


Deep BSDE method

The combination of deep learning with BSDEs, known as deep BSDE, was proposed by Han, Jentzen, and E in 2018 as a solution to the high-dimensional challenges faced by traditional numerical methods. The deep BSDE approach leverages the powerful nonlinear fitting capabilities of deep learning, approximating the solution of a BSDE by constructing neural networks. The specific idea is to represent the solution of a BSDE as the output of a neural network and to train the network to approximate the solution.


Model


Mathematical method

Backward stochastic differential equations (BSDEs) represent a powerful mathematical tool extensively applied in fields such as stochastic control, financial mathematics, and beyond. Unlike traditional stochastic differential equations (SDEs), which are solved forward in time, BSDEs are solved backward, starting from a future time and moving backwards to the present. This unique characteristic makes BSDEs particularly suitable for problems involving terminal conditions and uncertainties.

A backward stochastic differential equation (BSDE) can be formulated as:

Y_t = \xi + \int_t^T f(s, Y_s, Z_s) \, ds - \int_t^T Z_s \, dW_s, \quad t \in [0, T]

In this equation:
* \xi is the terminal condition specified at time T .
* f: [0,T] \times \mathbb{R} \times \mathbb{R} \to \mathbb{R} is called the generator of the BSDE.
* (Y_t, Z_t)_{t \in [0,T]} is the solution, consisting of stochastic processes (Y_t)_{t \in [0,T]} and (Z_t)_{t \in [0,T]} which are adapted to the filtration (\mathcal{F}_t)_{t \in [0,T]} .
* W_s is a standard Brownian motion.

The goal is to find adapted processes Y_t and Z_t that satisfy this equation. Traditional numerical methods struggle with BSDEs due to the curse of dimensionality, which makes computations in high-dimensional spaces extremely challenging.
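A useful sanity check is the special case f \equiv 0 : taking conditional expectations in the equation above (the stochastic integral is a martingale) gives

Y_t = \mathbb{E}\left[\, \xi \mid \mathcal{F}_t \,\right],

so Y_t is simply the conditional expectation of the terminal condition, with Z_t the integrand of its martingale representation; a nonzero generator f deforms this backward expectation nonlinearly.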


Methodology overview



1. Semilinear parabolic PDEs

We consider a general class of semilinear parabolic PDEs:

\frac{\partial u}{\partial t}(t,x) + \frac{1}{2} \operatorname{Tr}\left(\sigma\sigma^{T}(t,x)\, \operatorname{Hess}_x u(t,x)\right) + \nabla u(t,x) \cdot \mu(t,x) + f\left(t, x, u(t,x), \sigma^{T}(t,x)\nabla u(t,x)\right) = 0

In this equation:
* u(T,x) = g(x) is the terminal condition specified at time T .
* t and x represent the time and d -dimensional space variable, respectively.
* \sigma is a known matrix-valued function, \sigma^{T} denotes its transpose, and \operatorname{Hess}_x u denotes the Hessian of the function u with respect to x .
* \mu is a known vector-valued function, and f is a known nonlinear function.
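For concreteness, one standard instance of this template (a sketch, with constant interest rate r and volatility \bar{\sigma} as assumed parameters): choosing \mu(t,x) = r x , \sigma(t,x) = \bar{\sigma}\,\mathrm{diag}(x_1, \dots, x_d) , and f(t,x,y,z) = -r y recovers the d -dimensional Black–Scholes equation

\frac{\partial u}{\partial t} + \frac{\bar{\sigma}^2}{2} \sum_{i=1}^{d} x_i^2 \frac{\partial^2 u}{\partial x_i^2} + r \sum_{i=1}^{d} x_i \frac{\partial u}{\partial x_i} - r u = 0, \qquad u(T,x) = g(x),

whose solution u(0,x) is the price of a European claim with payoff g on d underlying assets.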


2. Stochastic process representation

Let \{W_t\}_{0 \le t \le T} be a d -dimensional Brownian motion and \{X_t\}_{0 \le t \le T} be a d -dimensional stochastic process which satisfies

X_t = \xi + \int_0^t \mu(s, X_s) \, ds + \int_0^t \sigma(s, X_s) \, dW_s

where \xi here denotes the initial value X_0 .
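A minimal NumPy sketch of simulating this forward process with the Euler–Maruyama scheme; the names mu_fn and sigma_fn, the parameter values, and the simplification to a diagonal (elementwise) diffusion are illustrative assumptions:

    import numpy as np

    def simulate_forward(x0, mu_fn, sigma_fn, T=1.0, N=50, n_paths=1024, seed=0):
        # Euler-Maruyama scheme for dX_t = mu(t, X_t) dt + sigma(t, X_t) dW_t,
        # assuming a diagonal diffusion so sigma_fn acts elementwise.
        rng = np.random.default_rng(seed)
        d = x0.shape[0]
        dt = T / N
        dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, N, d))
        X = np.empty((n_paths, N + 1, d))
        X[:, 0] = x0
        for n in range(N):
            t = n * dt
            X[:, n + 1] = X[:, n] + mu_fn(t, X[:, n]) * dt + sigma_fn(t, X[:, n]) * dW[:, n]
        return X, dW

    # Example: 100-dimensional geometric Brownian motion (assumed coefficients)
    X, dW = simulate_forward(np.full(100, 1.0),
                             mu_fn=lambda t, x: 0.05 * x,
                             sigma_fn=lambda t, x: 0.2 * x)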


3. Backward stochastic differential equation (BSDE)

Then the solution of the PDE satisfies the following BSDE:

u(t, X_t) - u(0, X_0) = - \int_0^t f\left(s, X_s, u(s, X_s), \sigma^{T}(s, X_s)\nabla u(s, X_s)\right) \, ds + \int_0^t \nabla u(s, X_s) \cdot \sigma(s, X_s) \, dW_s
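This identity is obtained by applying Itô's formula to u(t, X_t) and then using the PDE from step 1 to replace the drift terms:

d\, u(t, X_t) = \left( \frac{\partial u}{\partial t} + \nabla u \cdot \mu + \frac{1}{2}\operatorname{Tr}\left(\sigma\sigma^{T} \operatorname{Hess}_x u\right) \right) dt + \nabla u \cdot \sigma \, dW_t = -f\left(t, X_t, u(t,X_t), \sigma^{T}(t,X_t)\nabla u(t,X_t)\right) dt + \nabla u \cdot \sigma \, dW_t .

Integrating from 0 to t yields the stated equation; comparing with the BSDE formulation above identifies Y_t = u(t, X_t) and Z_t = \sigma^{T}(t, X_t)\nabla u(t, X_t) .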


4. Temporal discretization

Discretize the time interval [0, T] into steps 0 = t_0 < t_1 < \cdots < t_N = T :

X_{t_{n+1}} - X_{t_n} \approx \mu(t_n, X_{t_n}) \Delta t_n + \sigma(t_n, X_{t_n}) \Delta W_n

u(t_{n+1}, X_{t_{n+1}}) - u(t_n, X_{t_n}) \approx - f\left(t_n, X_{t_n}, u(t_n, X_{t_n}), \sigma^{T}(t_n, X_{t_n}) \nabla u(t_n, X_{t_n})\right) \Delta t_n + \left[\nabla u(t_n, X_{t_n})\, \sigma(t_n, X_{t_n})\right] \Delta W_n

where \Delta t_n = t_{n+1} - t_n and \Delta W_n = W_{t_{n+1}} - W_{t_n} .
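In code, one step of these coupled updates can be sketched as follows (plain Python/NumPy; f_fn and the gradient approximation z_net are hypothetical placeholders for the generator and for the quantity \sigma^{T}\nabla u that the neural network will later supply):

    # One discretized step of the pair (X, Y), given the increment dW_n.
    # Y_n approximates u(t_n, X_{t_n}); z_net(n, x) approximates sigma^T grad u.
    def step_forward(X_n, Y_n, n, dt, dW_n, mu_fn, sigma_fn, f_fn, z_net):
        Z_n = z_net(n, X_n)
        Y_next = Y_n - f_fn(n * dt, X_n, Y_n, Z_n) * dt + (Z_n * dW_n).sum(-1)
        X_next = X_n + mu_fn(n * dt, X_n) * dt + sigma_fn(n * dt, X_n) * dW_n
        return X_next, Y_next

Note that (Z_n * dW_n).sum(-1) computes \nabla u \cdot \sigma \, \Delta W_n , since Z_n stands for \sigma^{T}\nabla u .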


5. Neural network approximation

Use a multilayer feedforward neural network to approximate:

\sigma^{T}(t_n, X_{t_n}) \nabla u(t_n, X_{t_n}) \approx (\sigma^{T} \nabla u)(t_n, X_{t_n}; \theta_n)

for n = 1, \ldots, N , where \theta_n are the parameters of the neural network approximating x \mapsto \sigma^{T}(t, x) \nabla u(t, x) at t = t_n .
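A minimal PyTorch sketch of one such subnetwork; the layer sizes and ReLU activation are illustrative assumptions (the original method additionally uses batch normalization, omitted here for brevity):

    import torch.nn as nn

    class ZSubNet(nn.Module):
        # Feedforward net approximating x -> sigma^T(t_n, x) grad u(t_n, x).
        def __init__(self, d, hidden=110):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, d),   # output has dimension d, like Z_n
            )

        def forward(self, x):
            return self.net(x)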


6. Training the neural network

Stack all sub-networks in the approximation step to form a deep neural network. Train the network using the simulated paths \{X_{t_n}\}_{0 \le n \le N} and \{W_{t_n}\}_{0 \le n \le N} as input data, minimizing the loss function

l(\theta) = \mathbb{E}\left| g(X_{t_N}) - \hat{u}\left(\{X_{t_n}\}_{0 \le n \le N}, \{W_{t_n}\}_{0 \le n \le N}; \theta\right) \right|^2

where \hat{u} is the network's approximation of u(t, X_t) .
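Putting the pieces together, a compact and simplified PyTorch sketch of the resulting loss; here u0 and z0 are trainable tensors standing in for u(0, X_0) and Z_0 , subnets holds one ZSubNet per interior time step, and mu_fn, sigma_fn, f_fn, g_fn are assumed model functions:

    def deep_bsde_loss(x0, dW, subnets, u0, z0, dt, mu_fn, sigma_fn, f_fn, g_fn):
        # Monte Carlo estimate of E|g(X_{t_N}) - Y_{t_N}|^2 over a batch of paths.
        n_paths, N, d = dW.shape
        X = x0.expand(n_paths, d)
        Y = u0.expand(n_paths)
        Z = z0.expand(n_paths, d)
        for n in range(N):
            t = n * dt
            Y = Y - f_fn(t, X, Y, Z) * dt + (Z * dW[:, n]).sum(-1)
            X = X + mu_fn(t, X) * dt + sigma_fn(t, X) * dW[:, n]
            if n + 1 < N:
                Z = subnets[n](X)   # gradient approximation at t_{n+1}
        return ((g_fn(X) - Y) ** 2).mean()

A training step would then evaluate this loss on a fresh batch of Brownian increments, call backward() on it, and update u0, z0, and the subnetwork weights with a stochastic optimizer such as Adam (described below).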


Neural network architecture

Deep learning encompasses a class of machine learning techniques that have transformed numerous fields by enabling the modeling and interpretation of intricate data structures. These methods are distinguished by their hierarchical architecture comprising multiple layers of interconnected nodes, or neurons. This architecture allows deep neural networks to autonomously learn abstract representations of data, making them particularly effective in tasks such as image recognition, natural language processing, and financial modeling. The core of the method lies in designing an appropriate neural network structure (such as fully connected networks or recurrent neural networks) and selecting effective optimization algorithms.

The choice of deep BSDE network architecture, the number of layers, and the number of neurons per layer are crucial hyperparameters that significantly impact the performance of the deep BSDE method. The deep BSDE method constructs neural networks to approximate the solutions for Y and Z , and utilizes stochastic gradient descent and other optimization algorithms for training.

In the network architecture for the deep BSDE method, \nabla u(t_n, X_{t_n}) denotes the variable approximated directly by subnetworks, and u(t_n, X_{t_n}) denotes the variable computed iteratively in the network. There are three types of connections in this network:

i) X_{t_n} \rightarrow h_1^n \rightarrow h_2^n \rightarrow \ldots \rightarrow h_H^n \rightarrow \nabla u(t_n, X_{t_n}) is the multilayer feedforward neural network approximating the spatial gradients at time t = t_n . The weights \theta_n of this subnetwork are the parameters being optimized.

ii) (u(t_n, X_{t_n}), \nabla u(t_n, X_{t_n}), W_{t_{n+1}} - W_{t_n}) \rightarrow u(t_{n+1}, X_{t_{n+1}}) is the forward iteration providing the final output of the network as an approximation of u(t_N, X_{t_N}) , characterized by the temporal discretization above. There are no parameters optimized in this type of connection.

iii) (X_{t_n}, W_{t_{n+1}} - W_{t_n}) \rightarrow X_{t_{n+1}} is the shortcut connecting blocks at different times, likewise characterized by the temporal discretization above. There are also no parameters optimized in this type of connection.


Algorithms


Adam optimizer

This function implements the Adam algorithm for minimizing the target function \mathcal{L}(\theta) .

Function: ADAM(\alpha, \beta_1, \beta_2, \epsilon, \mathcal{L}(\theta), \theta_0) is
    m_0 := 0 // Initialize the first moment vector
    v_0 := 0 // Initialize the second moment vector
    t := 0 // Initialize timestep
    // Step 1: Initialize parameters
    \theta_t := \theta_0
    // Step 2: Optimization loop
    while \theta_t has not converged do
        t := t + 1
        g_t := \nabla_\theta \mathcal{L}_t(\theta_{t-1}) // Compute gradient of \mathcal{L} at timestep t
        m_t := \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t // Update biased first moment estimate
        v_t := \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2 // Update biased second raw moment estimate
        \widehat{m}_t := m_t / (1 - \beta_1^t) // Compute bias-corrected first moment estimate
        \widehat{v}_t := v_t / (1 - \beta_2^t) // Compute bias-corrected second moment estimate
        \theta_t := \theta_{t-1} - \alpha \cdot \widehat{m}_t / (\sqrt{\widehat{v}_t} + \epsilon) // Update parameters
    return \theta_t

* With the ADAM algorithm described above (a NumPy sketch follows below), we now present the pseudocode corresponding to a multilayer feedforward neural network:
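For concreteness, a self-contained NumPy rendering of the loop above; the fixed iteration count in place of a convergence test, and the gradient oracle grad_fn, are simplifying assumptions:

    import numpy as np

    def adam(grad_fn, theta0, alpha=1e-3, beta1=0.9, beta2=0.999,
             eps=1e-8, max_steps=10_000):
        theta = np.asarray(theta0, dtype=float).copy()
        m = np.zeros_like(theta)    # first moment vector
        v = np.zeros_like(theta)    # second moment vector
        for t in range(1, max_steps + 1):
            g = grad_fn(theta)                   # gradient at current iterate
            m = beta1 * m + (1 - beta1) * g      # biased first moment estimate
            v = beta2 * v + (1 - beta2) * g**2   # biased second raw moment estimate
            m_hat = m / (1 - beta1**t)           # bias-corrected first moment
            v_hat = v / (1 - beta2**t)           # bias-corrected second moment
            theta -= alpha * m_hat / (np.sqrt(v_hat) + eps)
        return theta

    # Example: minimize ||theta||^2, whose gradient is 2 * theta.
    theta_star = adam(lambda th: 2.0 * th, np.array([1.0, -2.0]))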


Backpropagation algorithm

This function implements the backpropagation algorithm for training a multi-layer feedforward neural network.

Function: BackPropagation(training set D = \{(\mathbf{x}_k, \mathbf{y}_k)\}_{k=1}^{m}) is
    // Step 1: Randomly initialize all connection weights and thresholds
    // Step 2: Optimization loop
    repeat until termination condition is met:
        for each (\mathbf{x}_k, \mathbf{y}_k) \in D:
            \hat{y}_j^k := f(\beta_j - \theta_j) // Compute output of output neuron j
            // Compute gradients
            for each output neuron j:
                g_j := \hat{y}_j^k (1 - \hat{y}_j^k)(y_j^k - \hat{y}_j^k) // Gradient of output neuron
            for each hidden neuron h:
                e_h := b_h (1 - b_h) \sum_{j=1}^{l} w_{hj} g_j // Gradient of hidden neuron
            // Update weights
            for each weight w_{hj}:
                \Delta w_{hj} := \eta g_j b_h // Update rule for hidden-to-output weight
            for each weight v_{ih}:
                \Delta v_{ih} := \eta e_h x_i // Update rule for input-to-hidden weight
            // Update thresholds
            for each threshold \theta_j:
                \Delta \theta_j := -\eta g_j // Update rule for output threshold
            for each threshold \gamma_h:
                \Delta \gamma_h := -\eta e_h // Update rule for hidden threshold
    // Step 3: Construct the trained multi-layer feedforward neural network
    return trained neural network

* Combining the ADAM algorithm and a multilayer feedforward neural network (a NumPy sketch of the per-example updates follows below), we provide the following pseudocode for solving the optimal investment portfolio:
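A NumPy sketch of the same per-example updates for a single hidden layer with sigmoid activations; the variable names mirror the pseudocode, and the array shapes are assumptions:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, y, V, gamma, W, theta, eta=0.1):
        # V (d x q), gamma (q,): input-to-hidden weights and thresholds.
        # W (q x l), theta (l,): hidden-to-output weights and thresholds.
        b = sigmoid(x @ V - gamma)                 # hidden activations b_h
        y_hat = sigmoid(b @ W - theta)             # network outputs
        g = y_hat * (1 - y_hat) * (y - y_hat)      # output gradient terms g_j
        e = b * (1 - b) * (W @ g)                  # hidden gradient terms e_h
        W += eta * np.outer(b, g)                  # Delta w_hj = eta g_j b_h
        theta -= eta * g                           # Delta theta_j = -eta g_j
        V += eta * np.outer(x, e)                  # Delta v_ih = eta e_h x_i
        gamma -= eta * e                           # Delta gamma_h = -eta e_h
        return V, gamma, W, theta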


Numerical solution for optimal investment portfolio

This function calculates the optimal investment portfolio using the specified parameters and stochastic processes.

function OptimalInvestment(\{W_{t_{i+1}} - W_{t_i}\}, x, \theta = (X_{t_0}, H_{t_0}, \theta_1, \dots, \theta_{N-1})) is
    // Step 1: Initialization
    for k := 0 to maxstep do
        M_{t_0}^k := 0, X_{t_0}^k := X_{t_0}^{k} // Parameter initialization
        for i := 0 to N-1 do
            H_{t_i}^k := \mathcal{N}(M_{t_i}^k; \theta_i^k) // Update feedforward neural network unit
            M_{t_{i+1}}^k := M_{t_i}^k + \big((1 - \phi)(\mu_{t_i} - M_{t_i}^k)\big)(t_{i+1} - t_i) + \sigma_{t_i}(W_{t_{i+1}} - W_{t_i})
            X_{t_{i+1}}^k := X_{t_i}^k + \big[H_{t_i}^k (\phi (M_{t_i}^k - \mu_{t_i}) + \mu_{t_i})\big](t_{i+1} - t_i) + H_{t_i}^k (W_{t_{i+1}} - W_{t_i})
        // Step 2: Compute loss function
        \mathcal{L}(\theta^k) := \frac{1}{M} \sum_{j=1}^{M} \left| X_{t_N}^{k,j} - g(M_{t_N}^{k,j}) \right|^2
        // Step 3: Update parameters using ADAM optimization
        \theta^{k+1} := \operatorname{ADAM}(\theta^k, \nabla \mathcal{L})
        X_{t_0}^{k+1} := \operatorname{ADAM}(X_{t_0}^k, \nabla \mathcal{L})
    // Step 4: Return terminal state
    return (M_{t_N}, X_{t_N})
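A schematic PyTorch rendering of the inner sweep of this loop. This is a sketch under stated assumptions: the mean-reversion coefficient phi, the constant stand-ins mu_bar and sigma_m for \mu_{t_i} and \sigma_{t_i} , the payoff g_fn, and the per-step networks nets are all placeholders for the source's full specification:

    import torch

    def portfolio_loss(nets, X0, dW, t, phi, mu_bar, sigma_m, g_fn):
        # One forward sweep of the discretized (M, X) dynamics and its loss.
        # nets[i] maps the factor M at t_i to the control H at t_i.
        n_paths, N = dW.shape
        M = torch.zeros(n_paths)
        X = X0 * torch.ones(n_paths)
        for i in range(N):
            dt = t[i + 1] - t[i]
            H = nets[i](M.unsqueeze(-1)).squeeze(-1)   # control from subnet i
            X = X + H * (phi * (M - mu_bar) + mu_bar) * dt + H * dW[:, i]
            M = M + (1 - phi) * (mu_bar - M) * dt + sigma_m * dW[:, i]
        return ((X - g_fn(M)) ** 2).mean()             # Monte Carlo loss

Each outer iteration of the pseudocode then corresponds to evaluating this loss on a batch of Brownian increments and applying an Adam update to the network parameters and the trainable initial state.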


Application

Deep BSDE is widely used in the fields of financial derivatives pricing, risk management, and asset allocation. It is particularly suitable for:

* High-dimensional option pricing: Pricing complex derivatives like basket options and Asian options, which involve multiple underlying assets. Traditional methods such as finite difference methods and Monte Carlo simulations struggle with these problems because of the curse of dimensionality: their computational cost increases exponentially with the number of dimensions. Deep BSDE methods instead leverage the function approximation capabilities of deep neural networks to approximate solutions of high-dimensional PDEs efficiently, providing accurate pricing where traditional numerical methods fall short.

* Risk measurement: Calculating risk measures such as conditional value-at-risk (CVaR) and expected shortfall (ES). These measures capture tail risk and give a more comprehensive picture of potential losses than simpler metrics like value-at-risk (VaR), and they are crucial for financial institutions assessing their portfolios. Deep BSDE methods keep the computation of these risk metrics feasible even in high-dimensional settings, improving the accuracy and robustness of risk assessments.

* Dynamic asset allocation: Determining optimal strategies for asset allocation over time in a stochastic environment. By modeling the stochastic behavior of asset returns and incorporating it into allocation decisions, deep BSDE methods allow investors to dynamically adjust their portfolios as market conditions and asset price dynamics change, maximizing expected returns while managing risk and leading to more resilient and effective asset management.


Advantages and disadvantages


Advantages

1. High-dimensional capability: Compared to traditional numerical methods, deep BSDE performs exceptionally well in high-dimensional problems.
2. Flexibility: The incorporation of deep neural networks allows this method to adapt to various types of BSDEs and financial models.
3. Parallel computing: Deep learning frameworks support GPU acceleration, significantly improving computational efficiency.


Disadvantages

1. Training time: Training deep neural networks typically requires substantial data and computational resources.
2. Parameter sensitivity: The choice of neural network architecture and hyperparameters greatly impacts the results, often requiring experience and trial-and-error.


See also

* Bellman equation
* Dynamic programming
* Applications of artificial intelligence
* List of artificial intelligence projects
* Backward stochastic differential equation
* Stochastic process
* Stochastic volatility
* Stochastic partial differential equations
* Diffusion process
* Stochastic difference equation




Further reading

* Evans, Lawrence C. (2013). An Introduction to Stochastic Differential Equations. American Mathematical Society.
* Higham, Desmond; Kloeden, Peter (2021). An Introduction to the Numerical Simulation of Stochastic Differential Equations. SIAM.