[1] Bellemare, Marc G., Naddaf, Yavar, Veness, Joel, and Bowling, Michael. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2012.
[2] Bellemare, Marc G., Ostrovski, Georg, Guez, Arthur, Thomas, Philip S., and Munos, Remi. Increasing the action gap: New operators for reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
[4] Chavez, Kevin, Ong, Hao Yi, and Hong, Augustus. Distributed deep q-learning. Technical report, Stanford University, June 2015.
[5] Degris, Thomas, Pilarski, Patrick M., and Sutton, Richard S. Model-free reinforcement learning with continuous action in practice. In American Control Conference (ACC), 2012, pp. 2177-2182. IEEE, 2012.
[7] Koutnik, Jan, Schmidhuber, Jürgen, and Gomez, Faustino. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, pp. 541-548. ACM, 2014.
[8] Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702, 2015.
[9] Li, Yuxi and Schuurmans, Dale. Mapreduce for parallel reinforcement learning. In Recent Advances in Reinforcement Learning - 9th European Workshop, EWRL 2011, Athens, Greece, September 9-11, 2011, Revised Selected Papers, pp. 309-320, 2011.
[10] Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Graves, Alex, Antonoglou, Ioannis, Wierstra, Daan, and Riedmiller, Martin. Playing atari with deep reinforcement learning. In NIPS Deep Learning Workshop, 2013.
[11] Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A., Veness, Joel, Bellemare, Marc G., Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K., Ostrovski, Georg, Petersen, Stig, Beattie, Charles, Sadik, Amir, Antonoglou, Ioannis, King, Helen, Kumaran, Dharshan, Wierstra, Daan, Legg, Shane, and Hassabis, Demis. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, February 2015. URL http://dx.doi.org/10.1038/nature14236.
[12] Nair, Arun, Srinivasan, Praveen, Blackwell, Sam, Alcicek, Cagdas, Fearon, Rory, Maria, Alessandro De, Panneershelvam, Vedavyas, Suleyman, Mustafa, Beattie, Charles, Petersen, Stig, Legg, Shane, Mnih, Volodymyr, Kavukcuoglu, Koray, and Silver, David. Massively parallel methods for deep reinforcement learning. In ICML Deep Learning Workshop, 2015.
[13] Peng, Jing and Williams, Ronald J. Incremental multi-step q-learning. Machine Learning, 22(1-3):283-290, 1996.
[14] Recht, Benjamin, Re, Christopher, Wright, Stephen, and Niu, Feng. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pp. 693-701, 2011.
[15] Riedmiller, Martin. Neural fitted q iteration - first experiences with a data efficient neural reinforcement learning method. In Machine Learning: ECML 2005, pp. 317-328. Springer Berlin Heidelberg, 2005.
[17] Schaul, Tom, Quan, John, Antonoglou, Ioannis, and Silver, David. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
[18] Schulman, John, Levine, Sergey, Moritz, Philipp, Jordan, Michael I., and Abbeel, Pieter. Trust region policy optimization. In International Conference on Machine Learning (ICML), 2015a.
[19] Schulman, John, Moritz, Philipp, Levine, Sergey, Jordan, Michael, and Abbeel, Pieter. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015b.
[21] Tieleman, Tijmen and Hinton, Geoffrey. Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4, 2012.
[25] van Seijen, H., Rupam Mahmood, A., Pilarski, P. M., Machado, M. C., and Sutton, R. S. True online temporal-difference learning. ArXiv e-prints, December 2015.
[26] Wang, Z., de Freitas, N., and Lanctot, M. Dueling network architectures for deep reinforcement learning. ArXiv e-prints, November 2015.
[28] Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229-256, 1992.
[29] Williams, Ronald J. and Peng, Jing. Function optimization using connectionist reinforcement learning algorithms. Connection Science, 3(3):241-268, 1991.
[30] Wymann, B., Espié, E., Guionneau, C., Dimitrakakis, C., Coulom, R., and Sumner, A. TORCS: The open racing car simulator, v3.5, 2013.