Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, pp. 1329–1338, 2016.
Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, and David Duvenaud. Backpropagation through the void: Optimizing control variates for black-box gradient estimation. International Conference on Learning Representations (ICLR), 2018.
Shixiang Gu, Tim Lillicrap, Richard E Turner, Zoubin Ghahramani, Bernhard Schölkopf, and Sergey Levine. Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning. In Advances in Neural Information Processing Systems, pp. 3849–3858, 2017a.
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E Turner, and Sergey Levine. Q-prop: Sample-efficient policy gradient with an off-policy critic. International Conference on Learning Representations (ICLR), 2017b.
Tang Jie and Pieter Abbeel. On a connection between importance sampling and the likelihood ratio policy gradient. In Advances in Neural Information Processing Systems, pp. 1000–1008, 2010.
Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, and Qiang Liu. Action-dependent control variates for policy optimization via Stein identity. International Conference on Learning Representations (ICLR), 2018.
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016.
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pp. 1889–1897, 2015a.
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015b.
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In International Conference on Machine Learning, 2014.
Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pp. 1057–1063, 2000.
Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Reinforcement Learning, pp. 5–32. Springer, 1992.
Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, and Pieter Abbeel. Variance reduction for policy gradient with action-dependent factorized baselines. International Conference on Learning Representations (ICLR), 2018.