Volume , Issue , 2016, Pages

High-dimensional continuous control using generalized advantage estimation

Author keywords

[No Author keywords available]

Indexed keywords

BIPED LOCOMOTION; GRADIENT METHODS; THREE DIMENSIONAL COMPUTER GRAPHICS;

EID: 85083954383     PISSN: None     EISSN: None     Source Type: Conference Proceeding
DOI: None     Document Type: Conference Paper
Times cited: 1174

References (25)
  • 2
    • Baxter, Jonathan and Bartlett, Peter L. Reinforcement learning in POMDPs via direct gradient ascent. In ICML, pp. 41–48, 2000.
  • 5
    • Greensmith, Evan, Bartlett, Peter L., and Baxter, Jonathan. Variance reduction techniques for gradient estimates in reinforcement learning. The Journal of Machine Learning Research, 5:1471–1530, 2004.
  • 6
    • Hafner, Roland and Riedmiller, Martin. Reinforcement learning in feedback control. Machine Learning, 84(1-2):137–169, 2011.
  • 9
    • Kakade, Sham. A natural policy gradient. In NIPS, volume 14, pp. 1531–1538, 2001a.
  • 10
    • Kakade, Sham. Optimizing average reward using discounted rewards. In Computational Learning Theory, pp. 605–615. Springer, 2001b.
  • 11
    • Kimura, Hajime and Kobayashi, Shigenobu. An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function. In ICML, pp. 278–286, 1998.
  • 14
    • Marbach, Peter and Tsitsiklis, John N. Approximate gradient methods in policy-space optimization of Markov reward processes. Discrete Event Dynamic Systems, 13(1-2):111–148, 2003.
  • 15
    • Minsky, Marvin. Steps toward artificial intelligence. Proceedings of the IRE, 49(1):8–30, 1961.
  • 16
    • Ng, Andrew Y., Harada, Daishi, and Russell, Stuart. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pp. 278–287, 1999.
  • 17
    • Peters, Jan and Schaal, Stefan. Natural actor-critic. Neurocomputing, 71(7):1180–1190, 2008.
  • 20
    • Sutton, Richard S., McAllester, David A., Singh, Satinder P., and Mansour, Yishay. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pp. 1057–1063. Citeseer, 1999.
  • 23
    • Wawrzyński, Paweł. Real-time reinforcement learning by sequential actor–critics and experience replay. Neural Networks, 22(10):1484–1497, 2009.
  • 24
    • Williams, Ronald J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.