Volume , Issue , 2016, Pages

High-dimensional continuous control using generalized advantage estimation

Author keywords

[No Author keywords available]

Indexed keywords

BIPED LOCOMOTION; GRADIENT METHODS; THREE DIMENSIONAL COMPUTER GRAPHICS;

EID: 85083954383     PISSN: None     EISSN: None     Source Type: Conference Proceeding
DOI: None     Document Type: Conference Paper
Times cited: 1174

References (25)
  • 2
    • Baxter, Jonathan and Bartlett, Peter L. Reinforcement learning in POMDPs via direct gradient ascent. In ICML, pp. 41–48, 2000.
  • 5
    • Greensmith, Evan, Bartlett, Peter L., and Baxter, Jonathan. Variance reduction techniques for gradient estimates in reinforcement learning. The Journal of Machine Learning Research, 5:1471–1530, 2004.
  • 6
    • Hafner, Roland and Riedmiller, Martin. Reinforcement learning in feedback control. Machine Learning, 84(1-2):137–169, 2011.
  • 9
    • Kakade, Sham. A natural policy gradient. In NIPS, volume 14, pp. 1531–1538, 2001a.
  • 10
    • Kakade, Sham. Optimizing average reward using discounted rewards. In Computational Learning Theory, pp. 605–615. Springer, 2001b.
  • 11
    • Kimura, Hajime and Kobayashi, Shigenobu. An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function. In ICML, pp. 278–286, 1998.
  • 14
    • Marbach, Peter and Tsitsiklis, John N. Approximate gradient methods in policy-space optimization of Markov reward processes. Discrete Event Dynamic Systems, 13(1-2):111–148, 2003.
  • 15
    • Minsky, Marvin. Steps toward artificial intelligence. Proceedings of the IRE, 49(1):8–30, 1961.
  • 16
    • Ng, Andrew Y., Harada, Daishi, and Russell, Stuart. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pp. 278–287, 1999.
  • 17
    • Peters, Jan and Schaal, Stefan. Natural actor-critic. Neurocomputing, 71(7):1180–1190, 2008.
  • 20
    • Sutton, Richard S., McAllester, David A., Singh, Satinder P., and Mansour, Yishay. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pp. 1057–1063. Citeseer, 1999.
  • 23
    • Wawrzyński, Paweł. Real-time reinforcement learning by sequential actor–critics and experience replay. Neural Networks, 22(10):1484–1497, 2009.
  • 24
    • Williams, Ronald J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.