In machine learning, automatic basis function construction (also known as basis discovery) is a technique that finds a set of general-purpose (task-independent) basis functions mapping a complex state space into a smaller, manageable form (a lower-dimensional embedding) while accurately capturing the value function. Unlike methods relying on expert-designed basis functions, this approach works without prior knowledge of the specific problem domain, making it effective in situations where creating tailored basis functions is difficult or impractical.


Motivation

In reinforcement learning (RL), many real-world problems modeled as Markov decision processes (MDPs) involve large or continuous state spaces (the sets of all possible situations an agent might encounter), which are too complex to handle directly and must be approximated for efficient computation. Linear function approximators (LFAs), valued for their simplicity and low computational demands, are a common solution (Keller, Philipp; Mannor, Shie; Precup, Doina (2006). Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA). Using LFAs effectively requires addressing two challenges: optimizing the weights (adjusting the importance of features) and constructing the basis functions (creating simplified representations). While specially designed basis functions can excel on specific tasks, they are limited to those particular domains. Automatic basis function construction offers a more flexible approach, enabling broader use across diverse problems without relying on task-specific expertise.


Problem definition

A Markov decision process with a finite state space and a fixed policy is defined by a 5-tuple \langle S,A,P,r,\gamma\rangle, which includes the finite state space S, the finite action space A, the transition model P, the reward function r, and a discount factor \gamma\in[0,1). The value function of the fixed policy satisfies the Bellman equation V=r+\gamma PV. In linear function approximation, the value function is represented as \hat{V}=\Phi\theta, where \Phi is a |S|\times n matrix in which every row contains the feature vector of the corresponding state, \theta is a weight vector with n parameters, and usually n\ll|S|. Basis construction looks for ways to automatically construct a better basis matrix \Phi that can represent the value function well. A good construction method should have the following characteristics:
* Small error bounds between the estimated and true value function
* Formation of an orthogonal basis in the value function space
* Fast convergence to the stationary value function
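The setup above can be sketched in a few lines of NumPy. The 5-state chain MDP and the polynomial features below are illustrative assumptions, not taken from the literature cited here:

```python
import numpy as np

n_states, gamma = 5, 0.9

# Transition matrix P of the fixed policy: a random walk on a chain,
# with the walk reflected at the boundaries.
P = np.zeros((n_states, n_states))
for s in range(n_states):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n_states - 1)] += 0.5

r = np.zeros(n_states)
r[-1] = 1.0  # reward only in the last state

# Exact value function from the Bellman equation: V = (I - gamma*P)^{-1} r.
V = np.linalg.solve(np.eye(n_states) - gamma * P, r)

# Basis matrix Phi (|S| x n): each row is a feature vector; here n = 3
# polynomial features stand in for n << |S| in realistic problems.
x = np.linspace(0.0, 1.0, n_states)
Phi = np.stack([np.ones(n_states), x, x ** 2], axis=1)

# Weight vector theta via least-squares projection of V onto span(Phi).
theta, *_ = np.linalg.lstsq(Phi, V, rcond=None)
V_hat = Phi @ theta
print("approximation error:", np.linalg.norm(V - V_hat))
```

A good automatically constructed basis would keep this residual small with far fewer columns than states.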


Popular methods


Proto-value basis

In this approach, Mahadevan analyzes the connectivity graph between states to determine a set of basis functions. The normalized graph Laplacian is defined as:
: L=I-D^{-1/2}WD^{-1/2}
Here W is an adjacency matrix representing the states of the fixed-policy MDP, which form an undirected graph (N,E), and D is a diagonal matrix holding the nodes' degrees. In a discrete state space, the adjacency matrix W can be constructed by simply checking whether two states are connected, and D can be calculated by summing up every row of W. In a continuous state space, the random-walk Laplacian of W can be used. This spectral framework can be used for value function approximation (VFA): the eigenvectors of the Laplacian serve as basis functions (proto-value functions). Given the fixed policy, the edge weights are determined by the corresponding states' transition probabilities. To get a smooth value approximation, diffusion wavelets are used (Mahadevan, Sridhar; Maggioni, Mauro (2005). Value function approximation with diffusion wavelets and Laplacian eigenfunctions. Proceedings of Advances in Neural Information Processing Systems).
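A short sketch of the construction, assuming a 10-state chain graph as the connectivity structure (an illustrative choice):

```python
import numpy as np

n = 10
W = np.zeros((n, n))
for s in range(n - 1):            # chain graph: state s adjacent to s + 1
    W[s, s + 1] = W[s + 1, s] = 1.0

deg = W.sum(axis=1)               # node degrees (row sums of W)
D_inv_sqrt = np.diag(deg ** -0.5)

# Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}.
L = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

# Eigenvectors with the smallest eigenvalues are the smoothest functions
# on the graph; the first k serve as proto-value basis functions.
eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns ascending eigenvalues
k = 4
Phi = eigvecs[:, :k]                   # |S| x k basis matrix
print("smallest eigenvalues:", np.round(eigvals[:k], 3))
```

For a connected graph the smallest eigenvalue is 0 and its eigenvector is (degree-weighted) constant, matching the intuition that the smoothest basis function is flat.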


Krylov basis

Krylov basis construction uses the actual transition matrix instead of the random-walk Laplacian. This method assumes that the transition model ''P'' and the reward ''r'' are available. The vectors in the Neumann series are denoted y_i=P^ir for all i\in\{0,\ldots,m-1\} (Ipsen, Ilse C. F.; Meyer, Carl D. (1998). The idea behind Krylov methods. American Mathematical Monthly, 105(10):889–899), where m is the degree of the minimal polynomial of (I-\gamma P). Suppose B=\frac{1}{\alpha_0}\sum_{i=0}^{m-1}\alpha_{i+1}A^i satisfies BA=I, where A=I-\gamma P and the \alpha_i are the coefficients of the minimal polynomial; then the value function can be written as:
: v=Br=\frac{1}{\alpha_0}\sum_{i=0}^{m-1}\alpha_{i+1}(I-\gamma P)^ir=\sum_{i=0}^{m-1}\beta_iy_i.
:Algorithm Augmented Krylov Method (Petrik, M. (2007). An analysis of Laplacian methods for value function approximation in MDPs. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2574–2579)
:z_1,z_2,\ldots,z_k are the top k real eigenvectors of P
:z_{k+1}:=r
:''for'' i:=1:(l+k) ''do''
::''if'' i>k+1 ''then''
:::z_i:=Pz_{i-1};
::''end if''
::''for'' j:=1:(i-1) ''do''
:::z_i:=z_i-\langle z_i,z_j\rangle z_j;
::''end for''
::''if'' \parallel z_i\parallel\approx 0 ''then''
:::''break'';
::''end if''
:''end for''
:* k: number of eigenvectors in the basis
:* l: number of Krylov vectors
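The pseudocode above can be sketched as follows; the orthogonalization is written out as classical Gram-Schmidt, the helper name `augmented_krylov_basis` is our own, and the chain MDP is an illustrative assumption:

```python
import numpy as np

def augmented_krylov_basis(P, r, k, l):
    """Orthonormal basis: top-k real eigenvectors of P, then l Krylov vectors of r."""
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)                  # largest real parts first
    Z = [eigvecs[:, order[i]].real for i in range(k)]  # z_1 .. z_k: top eigenvectors
    Z.append(r.astype(float))                          # z_{k+1} := r
    basis = []
    for i in range(k + l):
        # past the seeded vectors, continue the Krylov sequence: z_i := P z_{i-1}
        z = Z[i].copy() if i < len(Z) else P @ basis[-1]
        for q in basis:                                # Gram-Schmidt step
            z = z - (q @ z) * q
        norm = np.linalg.norm(z)
        if norm < 1e-10:                               # span is exhausted
            break
        basis.append(z / norm)
    return np.stack(basis, axis=1)

# Toy fixed-policy MDP: random walk on a 6-state chain, reward in the last state.
n = 6
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5
r = np.zeros(n); r[-1] = 1.0

B = augmented_krylov_basis(P, r, k=2, l=3)
print("basis shape:", B.shape)
```

The returned columns are orthonormal, so fitting \theta against them is numerically well conditioned.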


Bellman error basis

The Bellman error (giving Bellman error basis functions, or BEBFs) is defined as: \varepsilon=r+\gamma P\hat{V}-\hat{V}=r+\gamma P\Phi\theta-\Phi\theta. Loosely speaking, the Bellman error points towards the optimal value function (Parr, R.; Painter-Wakefield, C.; Li, L.-H.; Littman, M. (2007). Analyzing feature generation for value-function approximation. ICML'07). The sequence of BEBFs forms an orthogonal basis of the value function space; thus, with a sufficient number of BEBFs, any value function can be represented exactly.
:Algorithm BEBF
:stage i=1: \phi_1=r;
:stage i\in[2,N]:
::compute the weight vector \theta_i according to the current basis functions \Phi_i;
::compute the new Bellman error by \varepsilon=r+\gamma P\Phi_i\theta_i-\Phi_i\theta_i;
::add the Bellman error to form a new basis function: \Phi_{i+1}=[\Phi_i:\varepsilon];
:* N represents the number of iterations until convergence.
:* ":" means juxtaposing matrices or vectors.
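A sketch of the BEBF loop; the weight fit below minimizes the Bellman residual by least squares (one common choice, since the algorithm does not prescribe the fitting method), and the chain MDP is an illustrative assumption:

```python
import numpy as np

def bebf_basis(P, r, gamma, n_iters):
    """Grow a basis by repeatedly appending the Bellman error."""
    Phi = r.reshape(-1, 1).astype(float)                 # stage 1: phi_1 = r
    for _ in range(n_iters):
        # theta minimizing ||r + gamma*P*Phi*theta - Phi*theta||
        A = Phi - gamma * (P @ Phi)
        theta, *_ = np.linalg.lstsq(A, r, rcond=None)
        eps = r + gamma * (P @ (Phi @ theta)) - Phi @ theta   # Bellman error
        if np.linalg.norm(eps) < 1e-10:                  # value function captured
            break
        Phi = np.column_stack([Phi, eps])                # Phi_{i+1} = [Phi_i : eps]
    return Phi

# Toy chain MDP (illustrative assumption).
n, gamma = 5, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5
r = np.zeros(n); r[-1] = 1.0

Phi = bebf_basis(P, r, gamma, n_iters=4)
V = np.linalg.solve(np.eye(n) - gamma * P, r)            # exact value function
theta, *_ = np.linalg.lstsq(Phi, V, rcond=None)
print("residual:", np.linalg.norm(Phi @ theta - V))
```

With as many BEBFs as states, the basis spans the value function and the residual drops to numerical noise, illustrating the exact-representation claim above.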


Bellman average reward bases

Bellman Average Reward Bases (BARBs) (Mahadevan, S.; Liu, B. (2010). Basis construction from power series expansions of value functions. NIPS'10) are similar to Krylov bases, but the reward function is dilated by the average-adjusted transition matrix P-P^*. Here the limiting matrix P^* can be calculated by many methods (Stewart, William J. (1997). Numerical methods for computing stationary distributions of finite irreducible Markov chains. In Advances in Computational Probability. Kluwer Academic Publishers). BARBs converge faster than BEBFs and Krylov bases when \gamma is close to 1.
:Algorithm BARBs
:stage i=1: \phi_1=P^*r;
:stage i\in[2,N]:
::compute the weight vector \theta_i according to the current basis functions \Phi_i;
::compute the new basis function: \phi_{i+1}=r-P^*r+P\Phi_i\theta_i-\Phi_i\theta_i, and add it to form the new basis matrix: \Phi_{i+1}=[\Phi_i:\phi_{i+1}];
:* N represents the number of iterations until convergence.
:* ":" means juxtaposing matrices or vectors.
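A sketch of the BARB loop. P^* is computed here from the stationary distribution of an assumed irreducible chain (for which every row of P^* equals the stationary distribution \pi), and the least-squares weight fit is a simple illustrative choice:

```python
import numpy as np

def stationary_matrix(P):
    """P*: every row is the stationary distribution of an irreducible chain."""
    n = P.shape[0]
    # solve pi P = pi together with sum(pi) = 1 as one linear system
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.tile(pi, (n, 1))

def barb_basis(P, r, n_iters):
    P_star = stationary_matrix(P)
    Phi = (P_star @ r).reshape(-1, 1)                    # stage 1: phi_1 = P* r
    for _ in range(n_iters):
        theta, *_ = np.linalg.lstsq(Phi, r, rcond=None)  # illustrative weight fit
        v = Phi @ theta
        phi_new = r - P_star @ r + P @ v - v             # average-adjusted Bellman step
        Phi = np.column_stack([Phi, phi_new])            # Phi_{i+1} = [Phi_i : phi_new]
    return Phi

# Toy chain MDP (illustrative assumption).
n = 5
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5
r = np.zeros(n); r[-1] = 1.0

Phi = barb_basis(P, r, n_iters=3)
print("basis shape:", Phi.shape)
```

Subtracting P^*r removes the average-reward component, which is what keeps the construction stable as \gamma approaches 1.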


Discussion and analysis

There are two principal types of basis construction methods. The first type is reward-sensitive, like Krylov bases and BEBFs; these dilate the reward function geometrically through the transition matrix. However, when the discount factor \gamma approaches 1, Krylov bases and BEBFs converge slowly, because the error of Krylov-based methods is restricted by a Chebyshev polynomial bound. To solve this problem, methods such as BARBs have been proposed. BARBs are an incremental variant of Drazin bases and converge faster than Krylov bases and BEBFs when \gamma becomes large. The second type is the reward-insensitive proto-value basis derived from the graph Laplacian. This method uses graph information, but the construction of the adjacency matrix makes it hard to analyze.


See also

* Dynamic programming
* Bellman equation
* Optimal control


References

{{Reflist}}


External links



* UMASS ALL lab