2017

Combining policy gradient and Q-learning

Author keywords

[No Author keywords available]

Indexed keywords

REINFORCEMENT LEARNING;

EID: 85088228567 · PISSN: None · EISSN: None · Source Type: Conference Proceeding
DOI: None · Document Type: Conference Paper
Times cited: 90

References (46)
  • 1. Shun-Ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276, 1998.
  • 4. Leemon C. Baird III. Advantage updating. Technical Report WL-TR-93-1146, Wright-Patterson Air Force Base, Ohio: Wright Laboratory, 1993.
  • 12. Nicolas Heess, David Silver, and Yee Whye Teh. Actor-critic reinforcement learning with energy-based policies. In JMLR: Workshop and Conference Proceedings 24, pp. 43-57, 2012.
  • 19. Long-Ji Lin. Reinforcement learning for robots using neural networks. Technical report, DTIC Document, 1993.
  • 26. Jing Peng and Ronald J. Williams. Incremental multi-step Q-learning. Machine Learning, 22(1-3):283-290, 1996.
  • 27. Jan Peters, Katharina Mülling, and Yasemin Altun. Relative entropy policy search. In AAAI, Atlanta, 2010.
  • 28. Martin Riedmiller. Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In Machine Learning: ECML 2005, pp. 317-328. Springer Berlin Heidelberg, 2005.
  • 30. Brian Sallans and Geoffrey E. Hinton. Reinforcement learning with factored states and actions. Journal of Machine Learning Research, 5(Aug):1063-1088, 2004.
  • 36. Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44, 1988.
  • 38. Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68, 1995.
  • 45. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, 1992.
  • 46. Ronald J. Williams and Jing Peng. Function optimization using connectionist reinforcement learning algorithms. Connection Science, 3(3):241-268, 1991.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.