Volume 5, 2009, Pages 232-239

An expectation maximization algorithm for continuous Markov decision processes with arbitrary rewards

Author keywords

[No Author keywords available]

Indexed keywords

Analytical tractability; approximation errors; closed-form solutions; expectation-maximization algorithms; Gaussians; linear quadratic Gaussian controllers; Markov decision processes; mixture of Gaussians; numerical optimizations; optimization method; parameterized; policy optimization; reward function
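
For orientation, the record's title and indexed keywords describe an expectation-maximization approach to policy optimization, in which rewards play the role of pseudo-likelihoods so that policy parameters can be refit analytically. The sketch below is a minimal, assumed illustration of reward-weighted EM-style policy search in the spirit of reference 5 (Dayan and Hinton, 1997), not the paper's actual algorithm; the toy reward function, sample size, and parameter values are all hypothetical.

    import numpy as np

    # Toy EM-style policy search: a Gaussian "policy" over a scalar
    # action is refit to reward-weighted samples each iteration.
    rng = np.random.default_rng(0)

    def reward(a):
        # Hypothetical reward, peaked at the unknown optimum a = 2.0.
        return np.exp(-0.5 * (a - 2.0) ** 2)

    mu, sigma = 0.0, 2.0  # initial Gaussian policy parameters
    for _ in range(50):
        actions = rng.normal(mu, sigma, size=500)  # E-step: sample the policy
        w = reward(actions)
        w /= w.sum()  # normalized rewards act as responsibilities
        mu = np.sum(w * actions)  # M-step: reward-weighted refit
        sigma = np.sqrt(np.sum(w * (actions - mu) ** 2)) + 1e-3

    print(f"learned policy mean = {mu:.3f} (true optimum 2.0)")

Per the indexed keywords, the paper's continuous-MDP setting keeps the analogous M-step in closed form by representing reward functions as mixtures of Gaussians rather than a toy scalar reward.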

EID: 84862277035     PISSN: 1532-4435     EISSN: 1533-7928     Source Type: Journal
DOI: None     Document Type: Conference Paper
Times cited: 20

References (21)
  • 1. H. Attias. Planning by probabilistic inference. In UAI, 2003.
  • 2. J. Baxter and P. Bartlett. Infinite-horizon policy-gradient estimation. JAIR, 15:319-350, 2001.
  • 5. P. Dayan and G. Hinton. Using EM for reinforcement learning. Neural Computation, 9:271-278, 1997.
  • 7. M. Hoffman, A. Doucet, N. de Freitas, and A. Jasra. Bayesian policy learning with trans-dimensional MCMC. In NIPS, 2008.
  • 9. K. Lange. A quasi-Newton acceleration of the EM algorithm. Statistica Sinica, 5(1):1-18, 1995.
  • 12. A. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In UAI, 2000.
  • 14. J. Peters and S. Schaal. Reinforcement learning for operational space control. In ICRA, 2007.
  • 15. M. Porta, N. Vlassis, M. Spaan, and P. Poupart. Point-based value iteration for continuous POMDPs. JMLR, 7:2329-2367, 2006.
  • 16. R. Smallwood and E. Sondik. The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 1973.
  • 17. E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In NIPS, 2006.
  • 18. S. Thrun. Monte Carlo POMDPs. In NIPS, 2000.
  • 19. M. Toussaint and A. Storkey. Probabilistic inference for solving discrete and continuous state Markov Decision Processes. In ICML, 2006.
  • 21. D. Verma and R. P. N. Rao. Planning and acting in uncertain environments using probabilistic inference. In IROS, 2006.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.