MAXIMAL REWARDS AND EPSILON-OPTIMAL POLICIES IN CONTINUOUS TIME MARKOV DECISION CHAINS

Cited by: 12
Authors
LEMBERSKY, MR [1]
Affiliation
[1] OREGON STATE UNIV, DEPT STATISTICS, CORVALLIS, OR 97331 USA
DOI
10.1214/aos/1176342621
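Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code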
020208; 070103; 0714;
Pages: 159-169
Number of pages: 11
Related papers
23 items in total
[1]  
[Anonymous], SOVIET MATH DOKLADY
[2]  
Bellman R. E., 2010, Dynamic Programming
[3]   DISCRETE DYNAMIC-PROGRAMMING [J].
BLACKWELL, D .
ANNALS OF MATHEMATICAL STATISTICS, 1962, 33 (02) :719-&
[4]   ON THE ITERATIVE METHOD OF DYNAMIC-PROGRAMMING ON A FINITE SPACE DISCRETE-TIME MARKOV PROCESS [J].
BROWN, BW .
ANNALS OF MATHEMATICAL STATISTICS, 1965, 36 (04) :1279-1285
[5]  
Doob Joseph L., 1953, Wiley Publications in Statistics
[6]  
Howard R., 1960, Dynamic programming and Markov processes
[7]  
KARLIN S, 1966, FIRST COURSE STOCHAS
[8]  
Kemeny JG., 1961, THEORY PROBAB APPL, V6, P101, DOI 10.1137/1106012
[9]  
LANERY E, 1967, REV FR INFORM RECH O, V1, P3
[10]  
LANERY E, 1968, COMPLEMENTS ETUDE AS