5 entries in total
[1]
Antos A., Szepesvári C., Munos R. (2008) Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71, 89-129
[2]
Bradtke S., Barto A. (1996) Linear least-squares algorithms for temporal difference learning. Machine Learning 22, 33-57
[3]
Lagoudakis M., Parr R. (2003) Least-squares policy iteration. Journal of Machine Learning Research 4, 1107-1149
[4]
Schweitzer P., Seidmann A. (1985) Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications 110, 568-582
[5]
Sutton R. (1988) Learning to predict by the methods of temporal differences. Machine Learning 3, 9-44