Gerald J. "Gerry" Tesauro is an American computer scientist and a researcher at IBM, known for his development of TD-Gammon, a backgammon program that taught itself to play at a world-championship level through self-play and temporal difference learning, an early success in reinforcement learning and neural networks. He subsequently researched autonomic computing and multi-agent systems for e-commerce, and contributed to the game strategy algorithms for IBM Watson.

Career

Education

Tesauro earned a B.S. in physics from the University of Maryland, College Park. He then pursued graduate studies in plasma physics at Princeton University, supported by a Hertz Foundation Fellowship starting in 1980. He completed his Ph.D. in theoretical physics in 1986 under the supervision of Nobel laureate Philip W. Anderson.

Backgammon

After completing his Ph.D., he undertook postdoctoral research at the Center for Complex Systems Research, University of Illinois at Urbana-Champaign. During this period, he began applying neural networks to games, co-authoring a NeurIPS paper in 1987 with Terrence Sejnowski on a neural network that learned to play backgammon. By the late 1980s, Tesauro had joined IBM's Thomas J. Watson Research Center (IBM Research) as a research scientist, where he would spend several decades, eventually rising to the position of Principal Research Staff Member in AI Science.

During the late 1980s, he developed ''Neurogammon'', a backgammon program trained on expert human games using supervised learning. Neurogammon won the backgammon tournament at the 1st Computer Olympiad in 1989, demonstrating the potential of neural networks in game AI.

He developed ''TD-Gammon'' between 1990 and 1998, using reinforcement learning (RL), specifically temporal-difference (TD) learning. TD-Gammon learned through self-play, using a neural network to evaluate board positions and improving its strategy over millions of games. The program achieved world-championship-level play, capable of challenging top human players. It is often regarded as an early success of neural networks, machine learning, and RL, and is often cited as a precursor in publications on later game-playing systems, such as AlphaZero.

During this period, Tesauro also contributed to computer chess research at IBM, exploring machine learning methods for training evaluation functions, although the main Deep Blue project was led by others. Specifically, some linear evaluation function weights were trained by discretized comparison training. The weights primarily evaluated king safety. From 2010, he also contributed to computer Go by working on a program called Fuego.
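The temporal-difference idea behind TD-Gammon can be illustrated with a much smaller problem. The following is a minimal sketch, not Tesauro's implementation: it replaces the neural network with a lookup table and backgammon with a one-dimensional random walk, and the function name `td0_random_walk` and all parameter values are assumptions chosen for illustration. Each step nudges the value of the current state towards the value of the state that follows it, which is the bootstrapping that TD learning relies on.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, seed=0):
    """Estimate state values for a 1-D random walk with tabular TD(0).

    States 0..num_states-1 are non-terminal. Stepping off the right end
    ends the episode with outcome 1; stepping off the left end ends it
    with outcome 0. The value of a state is the probability of exiting
    on the right.
    """
    rng = random.Random(seed)
    values = [0.5] * num_states  # initial guesses for every state
    for _ in range(episodes):
        state = num_states // 2  # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                target, done = 0.0, True  # exited left
            elif next_state >= num_states:
                target, done = 1.0, True  # exited right
            else:
                # Bootstrap: use the current estimate of the next state.
                target, done = values[next_state], False
            # TD(0) update: move the estimate towards the target.
            values[state] += alpha * (target - values[state])
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    for i, v in enumerate(td0_random_walk()):
        print(f"state {i}: estimated value {v:.3f}")
```

For this walk the exact value of state i is (i + 1) / (num_states + 1), so the estimates can be checked against a known answer. TD-Gammon differed in that the value function was a neural network and the training games came from self-play.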

E-commerce

In the late 1990s, Tesauro shifted his focus towards multi-agent systems and their application in e-commerce, such as autonomous "pricebots", which are software agents designed to learn optimal pricing and bidding strategies in electronic marketplaces. Methods included Q-learning for dynamic pricing strategies (e.g., cooperation or undercutting) in competitive environments. This work was an early application of multi-agent reinforcement learning to economic modeling and automated trading. He also explored applying neural networks to computer virus detection.
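The use of Q-learning for pricing described above can be sketched with a toy market. This is an illustrative example under stated assumptions, not the model used in Tesauro's pricebot research: the price grid `PRICES`, the single-buyer demand rule in `profit`, and the fixed undercutting rival in `rival_response` are all hypothetical. The learning seller observes the rival's current price (the state), picks its own price (the action), and applies the standard Q-learning update.

```python
import random

PRICES = [1, 2, 3, 4, 5]  # hypothetical discrete price grid


def profit(own, rival):
    """One unit of demand, zero cost: the cheaper seller wins, ties split."""
    if own < rival:
        return float(own)
    if own == rival:
        return own / 2.0
    return 0.0


def rival_response(own):
    """Assumed myopic rival: undercut by one step, or reset to the top price."""
    idx = PRICES.index(own)
    return PRICES[idx - 1] if idx > 0 else PRICES[-1]


def train_pricebot(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn q[state][action], where state is the index of the rival's price."""
    rng = random.Random(seed)
    n = len(PRICES)
    q = [[0.0] * n for _ in range(n)]
    state = n - 1  # the rival starts at the top price
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: q[state][a])
        own = PRICES[action]
        reward = profit(own, PRICES[state])
        next_state = PRICES.index(rival_response(own))
        # Q-learning update with a bootstrapped estimate of future profit.
        best_next = max(q[next_state])
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        state = next_state
    return q


def greedy_price(q, rival_price):
    """Price with the highest learned value against the given rival price."""
    state = PRICES.index(rival_price)
    return PRICES[max(range(len(PRICES)), key=lambda a: q[state][a])]


if __name__ == "__main__":
    table = train_pricebot()
    for p in PRICES:
        print(f"rival at {p}: learned price {greedy_price(table, p)}")
```

Because the discount factor `gamma` makes the seller weigh future profit, the learned policy need not be the myopic one of always undercutting. In a genuinely multi-agent setting both sellers would learn at once, which this single-learner sketch does not attempt to model.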

Autonomic computing

From the early 2000s, Tesauro became a key contributor to IBM's autonomic computing initiative, which aimed to create self-managing IT systems. He applied reinforcement learning to automate tasks like resource allocation, performance tuning, and power management in data centers and distributed systems. Examples include multiple cooperating RL agents that learned to optimize server resources (CPU, memory, power) to meet performance goals or minimize energy consumption. Tesauro is listed as an inventor on numerous U.S. patents, largely focused on autonomic computing and AI applications for systems management, filed primarily between 2004 and 2007. These patents typically cover methods for reward-based learning of system policies, utility-based dynamic resource allocation, and autonomic model transfer in computing systems.

IBM Watson

Around 2009, Tesauro joined the IBM Research team, led by David Ferrucci, that developed IBM Watson, the question-answering system famous for defeating human champions Ken Jennings and Brad Rutter on the quiz show ''Jeopardy!'' in 2011. Tesauro focused on Watson's game strategy components, including algorithms for buzzer timing, clue selection, and wagering decisions (especially for ''Daily Doubles'' and ''Final Jeopardy!''). He and colleagues developed a Game State Evaluator and used simulation-based optimization, employing techniques from Bayesian inference, game theory, dynamic programming, and reinforcement learning to refine Watson's strategic play. These strategic algorithms contributed significantly to Watson's success, enabling it to manage risk effectively and make near-optimal wagering decisions. During this time, Tesauro also continued research in core AI algorithms, co-authoring a paper on Monte Carlo Simulation Balancing with David Silver (later of DeepMind) at ICML 2009. After Watson, Tesauro continued research at IBM in areas such as deep reinforcement learning, hierarchical RL, multi-agent systems, and continual learning.

Honors and awards

* Hertz Foundation Fellow (Class of 1980)
* Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), elected 2013, "For significant contributions to neural computation, game-playing (Backgammon, Chess and Jeopardy!), autonomic computing, and economic agents."
* Fellow of the Association for Computing Machinery (ACM), elected 2018, "for contributions to reinforcement learning, neural networks, and intelligent autonomous agents."

External links

* Gerald Tesauro page on Chess Programming Wiki.
* Gerald Tesauro bibliography at DBLP.