Volume 4, 2012, Pages 2717-2725

A unifying perspective of parametric policy search methods for Markov Decision Processes

Author keywords

[No Author keywords available]

Indexed keywords

EXPECTATION-MAXIMISATION; MARKOV DECISION PROCESSES; NATURAL GRADIENT; NEWTON'S METHODS; OPTIMISATION METHOD; OPTIMISATIONS; PARAMETER SPACES; ROBUSTNESS PROPERTIES

EID: 84877731836     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 26

References (31)
  • 1. S. Amari. Natural Gradient Works Efficiently in Learning. Neural Computation, 10:251-276, 1998.
  • 3. J. Bagnell and J. Schneider. Covariant Policy Search. IJCAI, 18:1019-1024, 2003.
  • 10. P. Dayan and G. E. Hinton. Using Expectation-Maximization for Reinforcement Learning. Neural Computation, 9:271-278, 1997.
  • 13. T. Furmston and D. Barber. Efficient Inference for Markov Control Problems. UAI, 29:221-229, 2011.
  • 14. P. W. Glynn. Likelihood Ratio Gradient Estimation for Stochastic Systems. Communications of the ACM, 33:75-84, 1990.
  • 15. E. Greensmith, P. Bartlett, and J. Baxter. Variance Reduction Techniques for Gradient Based Estimates in Reinforcement Learning. Journal of Machine Learning Research, 5:1471-1530, 2004.
  • 16. S. Kakade. A Natural Policy Gradient. NIPS, 14:1531-1538, 2002.
  • 18. J. Kober and J. Peters. Policy Search for Motor Primitives in Robotics. Machine Learning, 84(1-2):171-203, 2011.
  • 21. P. Marbach and J. Tsitsiklis. Simulation-Based Optimisation of Markov Reward Processes. IEEE Transactions on Automatic Control, 46(2):191-209, 2001.
  • 22. N. Meuleau, L. Peshkin, K. Kim, and L. Kaelbling. Learning Finite-State Controllers for Partially Observable Environments. UAI, 15:427-436, 1999.
  • 24. J. Peters and S. Schaal. Natural Actor-Critic. Neurocomputing, 71(7-9):1180-1190, 2008.
  • 26. S. Richter, D. Aberdeen, and J. Yu. Natural Actor-Critic for Road Traffic Optimisation. NIPS, 19:1169-1176, 2007.
  • 27. R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS, 13:1057-1063, 2000.
  • 29. N. Vlassis, M. Toussaint, G. Kontes, and S. Piperidis. Learning Model-Free Robot Control by a Monte Carlo EM Algorithm. Autonomous Robots, 27(2):123-130, 2009.
  • 30. L. Weaver and N. Tao. The Optimal Reward Baseline for Gradient Based Reinforcement Learning. UAI, 17:538-545, 2001.
  • 31. R. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 8:229-256, 1992.
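
Nearly all of the gradient-based methods indexed in this record build on the likelihood-ratio (score-function) gradient estimator of Glynn [14] and Williams [31]. As a point of reference for the entries above, here is a minimal sketch of that plain REINFORCE estimate, assuming an invented two-state MDP and a tabular softmax policy; the environment details are illustrative assumptions and do not come from the paper itself.

    import numpy as np

    # Toy two-state, two-action MDP, invented purely for illustration;
    # its dynamics and rewards are assumptions, not taken from the paper.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.3, 0.7], [0.6, 0.4]]])   # P[s, a, s']
    R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]

    def policy(theta, s):
        # Tabular softmax policy pi(a | s; theta).
        prefs = theta[s] - theta[s].max()
        e = np.exp(prefs)
        return e / e.sum()

    def rollout(theta, rng, horizon=20):
        # Sample one finite-horizon trajectory; return the visited
        # (state, action) pairs and the undiscounted total reward.
        s, pairs, total = 0, [], 0.0
        for _ in range(horizon):
            a = rng.choice(2, p=policy(theta, s))
            pairs.append((s, a))
            total += R[s, a]
            s = rng.choice(2, p=P[s, a])
        return pairs, total

    def reinforce_gradient(theta, rng, n_episodes=500):
        # Likelihood-ratio (score-function) estimate:
        #   grad J = E[ (sum_t grad log pi(a_t | s_t)) * total reward ].
        grad = np.zeros_like(theta)
        for _ in range(n_episodes):
            pairs, total = rollout(theta, rng)
            for s, a in pairs:
                score = -policy(theta, s)   # grad of log softmax ...
                score[a] += 1.0             # ... is e_a - pi(. | s)
                grad[s] += score * total
        return grad / n_episodes

    rng = np.random.default_rng(0)
    theta = np.zeros((2, 2))                # policy parameters
    for _ in range(50):                     # plain steepest-ascent steps
        theta += 0.01 * reinforce_gradient(theta, rng)

Several of the cited methods can be read as refinements of this raw estimate: Kakade's natural policy gradient [16] preconditions it with the inverse Fisher information of the policy, the baselines of Weaver and Tao [30] and Greensmith et al. [15] reduce its variance without introducing bias, and the EM approach of Dayan and Hinton [10] replaces the gradient step with a bound-maximisation step.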


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.