Volume , Issue , 2009, Pages 223-231

New inference strategies for solving Markov Decision Processes using reversible jump MCMC

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; MONTE CARLO METHODS

EID: 78751705157     PISSN: None     EISSN: None     Source Type: Conference Proceeding
DOI: None     Document Type: Conference Paper
Times cited: 21

References (20)
  • 1
    • H. Attias. Planning by probabilistic inference. In UAI, 2003.
  • 3
    • P. Dayan and G. Hinton. Using EM for reinforcement learning. Neural Computation, 9:271-278, 1997.
  • 5
    • A. Doucet, S. Godsill, and C. Robert. Marginal maximum a posteriori estimation using Markov chain Monte Carlo. Statistics and Computing, 12(1):77-84, 2002. DOI: 10.1023/A:1013172322619.
  • 6
    • P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711-732, 1995.
  • 7
    • M. Hoffman, A. Doucet, N. de Freitas, and A. Jasra. Bayesian policy learning with trans-dimensional MCMC. In NIPS, 2007.
  • 9
    • M. Hoffman, N. de Freitas, A. Doucet, and J. Peters. An expectation maximization algorithm for continuous Markov Decision Processes with arbitrary reward. In AI-STATS, 2009.
  • 10
    • J. Kober and J. Peters. Policy search for motor primitives in robotics. In NIPS, 2008.
    • P. Müller. Simulation based optimal design. In Bayesian Statistics 6, 1998.
  • 11
    • P. Müller, B. Sansó, and M. de Iorio. Optimal Bayesian design by inhomogeneous Markov chain simulation. Journal of the American Statistical Association, 99(467):788-798, 2004. DOI: 10.1198/016214504000001123.
  • 12
    • A. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In UAI, pages 406-415, 2000.
  • 13
    • O. Papaspiliopoulos, G. Roberts, and M. Sköld. Noncentered parameterisations for hierarchical models and data augmentation. Bayesian Statistics, 7, 2003.
  • 14
    • J. Peters and S. Schaal. Reinforcement learning for operational space control. In ICRA, 2007.
  • 16
    • M. Toussaint and A. Storkey. Probabilistic inference for solving discrete and continuous state Markov Decision Processes. In ICML, 2006.
  • 18
    • M. Toussaint, L. Charlin, and P. Poupart. Hierarchical POMDP controller optimization by likelihood maximization. In UAI, pages 562-570, 2008.
  • 19
    • D. Verma and R. Rao. Planning and acting in uncertain environments using probabilistic inference. In IROS, 2006.
  • 20
    • S. Vijayakumar, M. Toussaint, G. Petkos, and M. Howard. Planning and moving in dynamic environments: A statistical machine learning approach. In Sendhoff, Koerner, Sporns, Ritter, and Doya, editors, Creating Brain Like Intelligence: From Principles to Complex Intelligent Systems, LNAI Vol. 5436. Springer-Verlag, 2009.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.