A unified analysis of value-function-based reinforcement-learning algorithms

Cited by: 122
Authors
Szepesvári, C
Littman, ML
Affiliations
[1] Mindmaker Ltd, H-1121 Budapest, Hungary
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
DOI
10.1162/089976699300016070
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
Pages: 2017-2060
Page count: 44
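
The abstract refers to asynchronous value-function-based algorithms such as Q-learning, in which one state-action estimate is updated per interaction step. The sketch below is a minimal illustration of that kind of update, not code from the paper; the toy two-state MDP, the step(s, a) interface, and the decaying step size are assumptions made only for this example.

import random

def q_learning(n_states, n_actions, step, gamma=0.9, episodes=500, epsilon=0.1):
    """Tabular asynchronous Q-learning: one (state, action) entry updated per step.

    `step(s, a)` is assumed to return (reward, next_state, done).
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            r, s_next, done = step(s, a)
            counts[s][a] += 1
            alpha = 1.0 / counts[s][a]             # decaying learning rate
            target = r + (0.0 if done else gamma * max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])  # asynchronous update of one entry
    return Q

# Hypothetical two-state chain MDP used only to exercise the sketch.
def toy_step(s, a):
    if s == 0:
        return (0.0, 1, False) if a == 1 else (0.0, 0, False)
    # state 1: action 1 reaches a rewarding terminal outcome
    return (1.0, 0, True) if a == 1 else (0.0, 1, False)

if __name__ == "__main__":
    Q = q_learning(n_states=2, n_actions=2, step=toy_step)
    print([[round(v, 2) for v in row] for row in Q])

The step-size schedule alpha = 1/visits(s, a) is one conventional choice satisfying the usual stochastic-approximation conditions under which convergence results of this kind are stated; it is not prescribed by the paper.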