Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
Bakker, B. and Schmidhuber, J. (2004). Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In Proc. of the 8th Conf. on Intelligent Autonomous Systems, pages 438-445.
Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., and Munos, R. (2016). Unifying count-based exploration and intrinsic motivation. In Advances in Neural Information Processing Systems, pages 1471-1479.
Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009). Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41-48. ACM.
Caruana, R. (1998). Multitask learning. In Learning to Learn, pages 95-133. Springer.
Chebotar, Y., Kalakrishnan, M., Yahya, A., Li, A., Schaal, S., and Levine, S. (2016). Path integral guided policy search. arXiv preprint arXiv:1610.00529.
Devin, C., Gupta, A., Darrell, T., Abbeel, P., and Levine, S. (2016). Learning modular neural network policies for multi-task and multi-robot transfer. arXiv preprint arXiv:1609.07088.
Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48(1): 71-99.
Foster, D. and Dayan, P. (2002). Structure in the space of value functions. Machine Learning, 49(2): 325-346.
Graves, A., Bellemare, M. G., Menick, J., Munos, R., and Kavukcuoglu, K. (2017). Automated curriculum learning for neural networks. arXiv preprint arXiv:1704.03003.
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626): 471-476.
Gu, S., Lillicrap, T., Sutskever, I., and Levine, S. (2016). Continuous deep Q-learning with model-based acceleration. arXiv preprint arXiv:1603.00748.
Held, D., Geng, X., Florensa, C., and Abbeel, P. (2017). Automatic goal generation for reinforcement learning agents. arXiv preprint arXiv:1705.06366.
Houthooft, R., Chen, X., Duan, Y., Schulman, J., De Turck, F., and Abbeel, P. (2016). VIME: Variational information maximizing exploration. In Advances in Neural Information Processing Systems, pages 1109-1117.
Kober, J., Wilhelm, A., Oztop, E., and Peters, J. (2012). Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots, 33(4): 361-379.
Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2015). End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3-4): 293-321.
Metz, L., Ibarz, J., Jaitly, N., and Davidson, J. (2017). Discrete sequential prediction of continuous actions for deep RL. arXiv preprint arXiv:1705.05035.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533.
Ng, A., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E., and Liang, E. (2006). Autonomous inverted helicopter flight via reinforcement learning. In Experimental Robotics IX, pages 363-372.
Ng, A. Y., Harada, D., and Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pages 278-287.
Osband, I., Blundell, C., Pritzel, A., and Van Roy, B. (2016). Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems, pages 4026-4034.
Ostrovski, G., Bellemare, M. G., Oord, A. v. d., and Munos, R. (2017). Count-based exploration with neural density models. arXiv preprint arXiv:1703.01310.
Peters, J. and Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4): 682-697.
Popov, I., Heess, N., Lillicrap, T., Hafner, R., Barth-Maron, G., Vecerik, M., Lampe, T., Tassa, Y., Erez, T., and Riedmiller, M. (2017). Data-efficient deep reinforcement learning for dexterous manipulation. arXiv preprint arXiv:1704.03073.
Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015a). Universal value function approximators. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 1312-1320.
Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2015b). Prioritized experience replay. arXiv preprint arXiv:1511.05952.
Schmidhuber, J. (2004). Optimal ordered problem solver. Machine Learning, 54(3): 211-254.
Schmidhuber, J. (2013). PowerPlay: Training an increasingly general problem solver by continually searching for the simplest still unsolvable problem. Frontiers in Psychology, 4.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587): 484-489.
Srivastava, R. K., Steunebrink, B. R., and Schmidhuber, J. (2013). First experiments with PowerPlay. Neural Networks, 41: 130-136.
Sukhbaatar, S., Kostrikov, I., Szlam, A., and Fergus, R. (2017). Intrinsic motivation and automatic curricula via asymmetric self-play. arXiv preprint arXiv:1703.05407.
Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., and Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, pages 761-768. International Foundation for Autonomous Agents and Multiagent Systems.
Tang, H., Houthooft, R., Foote, D., Stooke, A., Chen, X., Duan, Y., Schulman, J., De Turck, F., and Abbeel, P. (2016). #Exploration: A study of count-based exploration for deep reinforcement learning. arXiv preprint arXiv:1611.04717.
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017). Domain randomization for transferring deep neural networks from simulation to the real world. arXiv preprint arXiv:1703.06907.
Todorov, E., Erez, T., and Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 5026-5033. IEEE.
Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K. (2017). Feudal networks for hierarchical reinforcement learning. arXiv preprint arXiv:1703.01161.