Volume , Issue , 2018, Pages

Variance reduction for policy gradient with action-dependent factorized baselines

Author keywords

[No Author keywords available]

Indexed keywords

DEEP LEARNING; GRADIENT METHODS; MACHINE LEARNING; MULTI AGENT SYSTEMS; STOCHASTIC SYSTEMS;

EID: 85083951478     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 138

References (29)
  • 4. Evan Greensmith, Peter L. Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5(Nov):1471–1530, 2004.
  • 13. William Montgomery and Sergey Levine. Guided policy search as approximate mirror descent. In NIPS, 2016.
  • 14. Igor Mordatch, Kendall Lowrey, Galen Andrew, Zoran Popovic, and Emanuel Todorov. Interactive control of diverse complex characters with neural networks. In NIPS, 2015.
  • 15. Jan Peters and Stefan Schaal. Natural actor-critic. Neurocomputing, 71(7):1180–1190, 2008.
  • 16. Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In NIPS, 2007.
  • 17. Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, John Schulman, Emanuel Todorov, and Sergey Levine. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. CoRR, abs/1709.10087, 2017a.
  • 18. Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, and Sham Kakade. Towards generalization and simplicity in continuous control. In NIPS, 2017b.
  • 25. Emanuel Todorov, Weiwei Li, and Xiuchuan Pan. From task parameters to motor synergies: A hierarchical framework for approximately optimal control of redundant manipulators. Journal of Field Robotics, 22(11):691–710, 2005.
  • 29. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4):229–256, 1992.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.