Volume 5, 2009, Pages 232-239

An expectation maximization algorithm for continuous Markov decision processes with arbitrary rewards

Author keywords

[No Author keywords available]

Indexed keywords

Analytical tractability; approximation errors; closed-form solutions; expectation-maximization algorithms; Gaussians; linear quadratic Gaussian controllers; Markov decision processes; mixture of Gaussians; numerical optimizations; optimization method; parameterized; policy optimization; reward function
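
For orientation, the record's title and indexed keywords describe an expectation-maximization approach to policy optimization, in which rewards play the role of pseudo-likelihoods so that policy parameters can be refit analytically. The sketch below is a minimal, assumed illustration of reward-weighted EM-style policy search in the spirit of reference 5 (Dayan and Hinton, 1997), not the paper's actual algorithm; the toy reward function, sample size, and parameter values are all hypothetical.

    import numpy as np

    # Toy EM-style policy search: a Gaussian "policy" over a scalar
    # action is refit to reward-weighted samples each iteration.
    rng = np.random.default_rng(0)

    def reward(a):
        # Hypothetical reward, peaked at the unknown optimum a = 2.0.
        return np.exp(-0.5 * (a - 2.0) ** 2)

    mu, sigma = 0.0, 2.0  # initial Gaussian policy parameters
    for _ in range(50):
        actions = rng.normal(mu, sigma, size=500)  # E-step: sample the policy
        w = reward(actions)
        w /= w.sum()  # normalized rewards act as responsibilities
        mu = np.sum(w * actions)  # M-step: reward-weighted refit
        sigma = np.sqrt(np.sum(w * (actions - mu) ** 2)) + 1e-3

    print(f"learned policy mean = {mu:.3f} (true optimum 2.0)")

Per the indexed keywords, the paper's continuous-MDP setting keeps the analogous M-step in closed form by representing reward functions as mixtures of Gaussians rather than a toy scalar reward.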

EID: 84862277035     PISSN: 1532-4435     EISSN: 1533-7928     Source Type: Journal
DOI: None     Document Type: Conference Paper
Times cited: 20

References (21)
  • 1. H. Attias. Planning by probabilistic inference. In UAI, 2003.
  • 2. J. Baxter and P. Bartlett. Infinite-horizon policy-gradient estimation. JAIR, 15:319-350, 2001.
  • 5. P. Dayan and G. Hinton. Using EM for reinforcement learning. Neural Computation, 9:271-278, 1997.
  • 7. M. Hoffman, A. Doucet, N. de Freitas, and A. Jasra. Bayesian policy learning with trans-dimensional MCMC. In NIPS, 2008.
  • 9. K. Lange. A quasi-Newton acceleration of the EM algorithm. Statistica Sinica, 5(1):1-18, 1995.
  • 12. A. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In UAI, 2000.
  • 14. J. Peters and S. Schaal. Reinforcement learning for operational space control. In ICRA, 2007.
  • 15. M. Porta, N. Vlassis, M. Spaan, and P. Poupart. Point-based value iteration for continuous POMDPs. JMLR, 7:2329-2367, 2006.
  • 16. R. Smallwood and E. Sondik. The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 1973.
  • 17. E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In NIPS, 2006.
  • 18. S. Thrun. Monte Carlo POMDPs. In NIPS, 2000.
  • 19. M. Toussaint and A. Storkey. Probabilistic inference for solving discrete and continuous state Markov Decision Processes. In ICML, 2006.
  • 21. D. Verma and R. P. N. Rao. Planning and acting in uncertain environments using probabilistic inference. In IROS, 2006.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.