State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note with the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Rich Sutton, was only mentioned as a footnote.
This name reflects the fact that the main function for updating the Q-value depends on the current state of the agent "S_1", the action the agent chooses "A_1", the reward "R" the agent gets for choosing this action, the state "S_2" that the agent enters after taking that action, and finally the next action "A_2" the agent chooses in its new state. The acronym for the quintuple (s_t, a_t, r_t, s_{t+1}, a_{t+1}) is SARSA. Some authors use a slightly different convention and write the quintuple (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}), depending on which time step the reward is formally assigned. The rest of the article uses the former convention.
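As an illustration of how the five elements of the quintuple enter a single Q-value update, the following is a minimal Python sketch of the commonly used tabular form of the rule. The function name sarsa_update, the dictionary-based Q-table, and the values chosen for the learning rate alpha and the discount factor gamma are assumptions made for this example and do not come from the original technical note.

```python
from collections import defaultdict


def sarsa_update(q, s_t, a_t, r_t, s_next, a_next, alpha=0.5, gamma=0.9):
    """Apply one tabular SARSA update to q in place and return the new value.

    alpha (learning rate) and gamma (discount factor) are illustrative
    defaults, not values prescribed by the article.
    """
    # The target uses the action actually chosen in the next state,
    # which is why the update needs the full quintuple.
    td_target = r_t + gamma * q[(s_next, a_next)]
    td_error = td_target - q[(s_t, a_t)]
    q[(s_t, a_t)] += alpha * td_error
    return q[(s_t, a_t)]


# Hypothetical Q-table: unseen state-action pairs default to 0.0.
q = defaultdict(float)
q[("S2", "A2")] = 1.0

# One transition (S1, A1, R=1.0, S2, A2), using the r_t convention.
new_value = sarsa_update(q, "S1", "A1", r_t=1.0, s_next="S2", a_next="A2")
print(new_value)  # approximately 0.95
```

With these illustrative numbers, the estimate for ("S1", "A1") moves halfway from 0 towards the target 1.0 + 0.9 × 1.0 = 1.9. Under the alternative convention the same reward would simply be labelled r_{t+1}; the arithmetic is unchanged.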
Algorithm
: