Reasoning language models are artificial intelligence systems that combine natural language processing with structured reasoning capabilities. These models are usually constructed by prompting, supervised finetuning (SFT), and reinforcement learning (RL) initialized with pretrained language models.

Prompting

A language model is a generative model of a training dataset of texts. Prompting means constructing a text prompt such that, conditional on that prompt, the language model generates a solution to the task. Prompting can be applied to a pretrained model ("base model") or to a base model that has undergone SFT, RL, or both.

Chain of thought

Chain of Thought prompting (CoT) prompts the model to answer a question by first generating a "chain of thought", i.e. steps of reasoning that mimic a train of thought. It was published in 2022 by the Brain team of Google, using the PaLM-540B model. In CoT prompting, the prompt consists of the input followed by "Let's think step by step", and the model responds with a chain of reasoning steps, ending with an answer:

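\text{Input} \rightarrow \underbrace{\text{Step}_1 \rightarrow \text{Step}_2 \rightarrow \cdots \rightarrow \text{Step}_n}_{\text{Reasoning chain}} \rightarrow \text{Answer}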
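The input-to-chain-to-answer flow can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular library: the helper names (`build_cot_prompt`, `split_chain_and_answer`, `answer_with_cot`) are hypothetical, `fake_generate` is a canned stand-in for a real model call, and the response layout (one reasoning step per line, with a final line of the form "Answer: ...") is an assumption made purely for the example.

```python
import re
from typing import Callable, List, Tuple

COT_TRIGGER = "Let's think step by step"


def build_cot_prompt(question: str) -> str:
    """Append the CoT trigger phrase to the input question."""
    return f"{question.strip()} {COT_TRIGGER}"


def split_chain_and_answer(response: str) -> Tuple[List[str], str]:
    """Split a response into (reasoning steps, final answer).

    Assumption for this sketch: one step per line, and the last
    line looks like 'Answer: ...'. Real model output may differ.
    """
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    if not lines:
        return [], ""
    match = re.match(r"answer:\s*(.*)", lines[-1], flags=re.IGNORECASE)
    if match:
        return lines[:-1], match.group(1)
    # No explicit answer line: treat everything as reasoning.
    return lines, ""


def answer_with_cot(
    question: str, generate: Callable[[str], str]
) -> Tuple[List[str], str]:
    """Run Input -> reasoning chain -> Answer using any text generator."""
    return split_chain_and_answer(generate(build_cot_prompt(question)))


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a language model; returns canned text."""
    return (
        "There are 3 boxes with 4 apples each.\n"
        "3 times 4 is 12.\n"
        "Answer: 12"
    )


steps, answer = answer_with_cot(
    "How many apples are in 3 boxes of 4?", fake_generate
)
print(steps)
print(answer)
```

In practice `fake_generate` would be replaced by a call to an actual language model, and the answer extraction would need to tolerate whatever format that model produces.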

Similarly, Tree of Thought prompting generalizes CoT by prompting the model to generate one or more "possible next steps", and then running the model on each of the possible next steps by