10 entries in total
[4]
A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[6]
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning[J]. Ronald J. Williams. Machine Learning. 1992 (3)
[7]
Prioritized experience replay. Schaul T, Quan J, Antonoglou I, Silver D. Proceedings of the 4th International Conference on Learning Representations. 2016
[8]
End-to-end training of deep visuomotor policies. Levine S, Finn C, Darrell T, et al. Journal of Machine Learning Research. 2016
[9]
Reinforcement learning with unsupervised auxiliary tasks. Jaderberg M, Mnih V, Czarnecki W, et al. https://arxiv.org/abs/1611.05397
[10]
Deep reinforcement learning for dialogue generation. Li J, Monroe W, Ritter A, et al. https://arxiv.org/abs/1707.06347