SCOPUS Information Search Platform

6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings

Volume , Issue , 2018, Pages

Variance reduction for policy gradient with action-dependent factorized baselines

(8) Wu, Cathy a; Rajeswaran, Aravind b; Duan, Yan a,c; Kumar, Vikash b; Bayen, Alexandre M a,d; Kakade, Sham b; Mordatch, Igor c; Abbeel, Pieter a,c

a UNIVERSITY OF CALIFORNIA (United States)

b UNIVERSITY OF WASHINGTON (United States)

c OpenAI LLC (United States)

d UNIVERSITY OF CALIFORNIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

DEEP LEARNING; GRADIENT METHODS; MACHINE LEARNING; MULTI AGENT SYSTEMS; STOCHASTIC SYSTEMS;

COMPUTATIONALLY EFFICIENT; GRADIENT ESTIMATES; HAND MANIPULATION; HIGH-DIMENSIONAL; NUMERICAL RESULTS; POLICY GRADIENT METHODS; STOCHASTIC POLICY; VARIANCE REDUCTIONS;

REINFORCEMENT LEARNING;

EID: 85083951478 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (138)

References (29)

1
- 84887843747
- Motion editing with independent component analysis
- Yong Cao, Ari Shapiro, Petros Faloutsos, and Frédéric Pighin. Motion editing with independent component analysis. Visual Computer, 2, 2007.
- (2007) Visual Computer , vol.2
- Cao, Y.¹ Shapiro, A.² Faloutsos, P.³ Pighin, F.⁴

2
- 84999018287
- Benchmarking deep reinforcement learning for continuous control
- Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
- (2016) Proceedings of the 33rd International Conference on Machine Learning (ICML)
- Duan, Y.¹ Chen, X.² Houthooft, R.³ Schulman, J.⁴ Abbeel, P.⁵

3
- 85046125163
- Counterfactual multi-agent policy gradients
- Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. arXiv preprint arXiv:1705.08926, 2017.
- (2017) Counterfactual Multi-Agent Policy Gradients
- Foerster, J.¹ Farquhar, G.² Afouras, T.³ Nardelli, N.⁴ Whiteson, S.⁵

4
- 84897694817
- Variance reduction techniques for gradient estimates in reinforcement learning
- Evan Greensmith, Peter L Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5(Nov):1471–1530, 2004.
- (2004) Journal of Machine Learning Research , vol.5 , pp. 1471-1530
- Greensmith, E.¹ Bartlett, P.L.² Baxter, J.³

5
- 85041942380
- Q-prop: Sample-efficient policy gradient with an off-policy critic
- Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E Turner, and Sergey Levine. Q-prop: Sample-efficient policy gradient with an off-policy critic. In International Conference on Learning Representations (ICLR2017), 2017.
- (2017) International Conference on Learning Representations (ICLR2017)
- Gu, S.¹ Lillicrap, T.² Ghahramani, Z.³ Turner, R.E.⁴ Levine, S.⁵

6
- 84898930479
- A natural policy gradient
- Sham M Kakade. A natural policy gradient. In Advances in neural information processing systems, pp. 1531–1538, 2002.
- (2002) Advances in Neural Information Processing Systems , pp. 1531-1538
- Kakade, S.M.¹

7
- 84898938510
- Actor-critic algorithms
- Vijay R Konda and John N Tsitsiklis. Actor-critic algorithms. In Advances in neural information processing systems, pp. 1008–1014, 2000.
- (2000) Advances in Neural Information Processing Systems , pp. 1008-1014
- Konda, V.R.¹ Tsitsiklis, J.N.²

8
- 84979924150
- End-to-end training of deep visuo-motor policies
- Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuo-motor policies. Journal of Machine Learning Research, 17(39):1–40, 2016.
- (2016) Journal of Machine Learning Research , vol.17 , Issue.39 , pp. 1-40
- Levine, S.¹ Finn, C.² Darrell, T.³ Abbeel, P.⁴

9
- 85083953657
- Continuous control with deep reinforcement learning
- Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In International Conference on Learning Representations (ICLR2016), 2016.
- (2016) International Conference on Learning Representations (ICLR2016)
- Lillicrap, T.P.¹ Hunt, J.J.² Pritzel, A.³ Heess, N.⁴ Erez, T.⁵ Tassa, Y.⁶ Silver, D.⁷ Wierstra, D.⁸

10
- 85041351193
- Multi-agent actor-critic for mixed cooperative-competitive environments
- Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275, 2017.
- (2017) Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
- Lowe, R.¹ Wu, Y.² Tamar, A.³ Harb, J.⁴ Abbeel, P.⁵ Mordatch, I.⁶

11
- 84924051598
- Human-level control through deep reinforcement learning
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰

12
- 84971448181
- Asynchronous methods for deep reinforcement learning
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016.
- (2016) International Conference on Machine Learning , pp. 1928-1937
- Mnih, V.¹ Badia, A.P.² Mirza, M.³ Graves, A.⁴ Lillicrap, T.⁵ Harley, T.⁶ Silver, D.⁷ Kavukcuoglu, K.⁸

13
- 85041963017
- Guided policy search as approximate mirror descent
- William Montgomery and Sergey Levine. Guided policy search as approximate mirror descent. In NIPS, 2016.
- (2016) NIPS
- Montgomery, W.¹ Levine, S.²

14
- 84965182099
- Interactive control of diverse complex characters with neural networks
- Igor Mordatch, Kendall Lowrey, Galen Andrew, Zoran Popovic, and Emanuel Todorov. Interactive Control of Diverse Complex Characters with Neural Networks. In NIPS, 2015.
- (2015) NIPS
- Mordatch, I.¹ Lowrey, K.² Andrew, G.³ Popovic, Z.⁴ Todorov, E.⁵

15
- 40649106649
- Natural actor-critic
- Jan Peters and Stefan Schaal. Natural actor-critic. Neurocomputing, 71(7):1180–1190, 2008.
- (2008) Neurocomputing , vol.71 , Issue.7 , pp. 1180-1190
- Peters, J.¹ Schaal, S.²

16
- 77953218689
- Random features for large-scale kernel machines
- Ali Rahimi and Benjamin Recht. Random Features for Large-Scale Kernel Machines. In NIPS, 2007.
- (2007) NIPS
- Rahimi, A.¹ Recht, B.²

17
- 85049877180
- Learning complex dexterous manipulation with deep reinforcement learning and demonstrations
- Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, John Schulman, Emanuel Todorov, and Sergey Levine. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. CoRR, abs/1709.10087, 2017a.
- (2017) CoRR
- Rajeswaran, A.¹ Kumar, V.² Gupta, A.³ Schulman, J.⁴ Todorov, E.⁵ Levine, S.⁶

18
- 85044996392
- Towards generalization and simplicity in continuous control
- Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, and Sham Kakade. Towards Generalization and Simplicity in Continuous Control. In NIPS, 2017b.
- (2017) NIPS
- Rajeswaran, A.¹ Lowrey, K.² Todorov, E.³ Kakade, S.⁴

19
- 84969963490
- Trust region policy optimization
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 1889–1897, 2015.
- (2015) Proceedings of the 32nd International Conference on Machine Learning (ICML-15) , pp. 1889-1897
- Schulman, J.¹ Levine, S.² Abbeel, P.³ Jordan, M.⁴ Moritz, P.⁵

20
- 85083954383
- High-dimensional continuous control using generalized advantage estimation
- John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. In International Conference on Learning Representations (ICLR2016), 2016.
- (2016) International Conference on Learning Representations (ICLR2016)
- Schulman, J.¹ Moritz, P.² Levine, S.³ Jordan, M.⁴ Abbeel, P.⁵

21
- 84963949906
- Mastering the game of go with deep neural networks and tree search
- David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
- (2016) Nature , vol.529 , Issue.7587 , pp. 484-489
- Silver, D.¹ Huang, A.² Maddison, C.J.³ Guez, A.⁴ Sifre, L.⁵ Van Den Driessche, G.⁶ Schrittwieser, J.⁷ Antonoglou, I.⁸ Panneershelvam, V.⁹ Lanctot, M.¹⁰

22
- 0004102479
- Reinforcement learning: An introduction
- Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998.
- (1998) Reinforcement Learning: An Introduction , vol.1
- Sutton, R.S.¹ Barto, A.G.²

23
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pp. 1057–1063, 2000.
- (2000) Advances in Neural Information Processing Systems , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.A.² Singh, S.P.³ Mansour, Y.⁴

24
- 11144267104
- Analysis of the synergies underlying complex hand manipulation
- Emanuel Todorov and Zoubin Ghahramani. Analysis of the synergies underlying complex hand manipulation. In Engineering in Medicine and Biology Society, 2004. IEMBS’04. 26th Annual International Conference of the IEEE, volume 2, pp. 4637–4640. IEEE, 2004.
- (2004) Engineering in Medicine and Biology Society, 2004. IEMBS’04. 26th Annual International Conference of the IEEE , vol.2 , pp. 4637-4640
- Todorov, E.¹ Ghahramani, Z.²

25
- 28044474086
- From task parameters to motor synergies: A hierarchical framework for approximately optimal control of redundant manipulators
- Emanuel Todorov, Weiwei Li, and Xiuchuan Pan. From task parameters to motor synergies: A hierarchical framework for approximately optimal control of redundant manipulators. Journal of Field Robotics, 22(11):691–710, 2005.
- (2005) Journal of Field Robotics , vol.22 , Issue.11 , pp. 691-710
- Todorov, E.¹ Li, W.² Pan, X.³

26
- 84872292044
- MuJoCo: A physics engine for model-based control
- Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In International Conference on Intelligent Robots and Systems, 2012.
- (2012) International Conference on Intelligent Robots and Systems
- Todorov, E.¹ Erez, T.² Tassa, Y.³

27
- 34249833101
- Q-learning
- Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

28
- 21444437925
- The optimal reward baseline for gradient-based reinforcement learning
- Lex Weaver and Nigel Tao. The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, pp. 538–545. Morgan Kaufmann Publishers Inc., 2001.
- (2001) Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence , pp. 538-545
- Weaver, L.¹ Tao, N.²

29
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 229-256
- Williams, R.J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.