A unified analysis of value-function-based reinforcement-learning algorithms

Cited by: 122
Authors
Szepesvári, C
Littman, ML
Affiliations
[1] Mindmaker Ltd, H-1121 Budapest, Hungary
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
DOI
10.1162/089976699300016070
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
Pages: 2017-2060
Page count: 44
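
The abstract refers to asynchronous value-function-based algorithms such as Q-learning, in which one state-action estimate is updated per interaction step. The sketch below is a minimal illustration of that kind of update, not code from the paper; the toy two-state MDP, the step(s, a) interface, and the decaying step size are assumptions made only for this example.

import random

def q_learning(n_states, n_actions, step, gamma=0.9, episodes=500, epsilon=0.1):
    """Tabular asynchronous Q-learning: one (state, action) entry updated per step.

    `step(s, a)` is assumed to return (reward, next_state, done).
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            r, s_next, done = step(s, a)
            counts[s][a] += 1
            alpha = 1.0 / counts[s][a]             # decaying learning rate
            target = r + (0.0 if done else gamma * max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])  # asynchronous update of one entry
    return Q

# Hypothetical two-state chain MDP used only to exercise the sketch.
def toy_step(s, a):
    if s == 0:
        return (0.0, 1, False) if a == 1 else (0.0, 0, False)
    # state 1: action 1 reaches a rewarding terminal outcome
    return (1.0, 0, True) if a == 1 else (0.0, 1, False)

if __name__ == "__main__":
    Q = q_learning(n_states=2, n_actions=2, step=toy_step)
    print([[round(v, 2) for v in row] for row in Q])

The step-size schedule alpha = 1/visits(s, a) is one conventional choice satisfying the usual stochastic-approximation conditions under which convergence results of this kind are stated; it is not prescribed by the paper.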