Volume 12, 2011, Pages 1729-1770

Bayesian approach for learning and planning in partially observable Markov decision processes

Author keywords

Bayesian inference; Partially observable Markov decision processes; Reinforcement learning

Indexed keywords

APPROXIMATE ALGORITHMS; BAYESIAN APPROACHES; BAYESIAN INFERENCE; EMPIRICAL RESULTS; LEARNING PERFORMANCE; MARKOV DECISION PROCESSES; MODEL ESTIMATES; OPTIMAL SEQUENCE; PARTIAL OBSERVABILITY; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; THEORETICAL RESULT;

EID: 79960110381     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 148

References (56)
  • 2
    • 56449090814
    • Logarithmic online regret bounds for undiscounted reinforcement learning
    • P. Auer and R. Ortner. Logarithmic online regret bounds for undiscounted reinforcement learning. In Neural Information Processing Systems (NIPS), volume 19, pages 49-56, 2006.
    • (2006) Neural Information Processing Systems (NIPS) , vol.19 , pp. 49-56
    • Auer, P.1    Ortner, R.2
  • 6
    • 0041965975
    • R-max - A general polynomial time algorithm for near-optimal reinforcement learning
    • R. I. Brafman and M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research (JMLR), 3:213-231, 2003.
    • (2003) Journal of Machine Learning Research (JMLR) , vol.3 , pp. 213-231
    • Brafman, R.I.1    Tennenholtz, M.2
  • 12
    • 56449086386
    • Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs
    • ACM
    • F. Doshi, J. Pineau, and N. Roy. Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. In International Conference on Machine Learning, pages 256-263. ACM, 2008.
    • (2008) International Conference on Machine Learning , pp. 256-263
    • Doshi, F.1    Pineau, J.2    Roy, N.3
  • 15
    • 79960116893
    • Monte-Carlo algorithms for the improvement of finite-state stochastic controllers: Application to Bayes-adaptive Markov decision processes
    • M. Duff. Monte-Carlo algorithms for the improvement of finite-state stochastic controllers: Application to Bayes-adaptive Markov decision processes. In International Workshop on Artificial Intelligence and Statistics (AISTATS), 2001.
    • (2001) International Workshop on Artificial Intelligence and Statistics (AISTATS)
    • Duff, M.1
  • 17
  • 19
    • 0011716051
    • Dual control theory, parts I and II
    • A. A. Feldbaum. Dual control theory, parts I and II. Automation and Remote Control, 21:874-880 and 1033-1039, 1961.
    • (1961) Automation and Remote Control , vol.21 , pp. 874-880, 1033-1039
    • Feldbaum, A.A.1
  • 20
    • 0033882494
    • Survey of adaptive dual control methods
    • DOI 10.1049/ip-cta:20000107
    • N.M. Filatov and H. Unbehauen. Survey of adaptive dual control methods. In IEEE Control Theory and Applications, volume 147, pages 118-128, 2000. (Pubitemid 30563857)
    • (2000) IEE Proceedings: Control Theory and Applications , vol.147 , Issue.1 , pp. 118-128
    • Filatov, N.M.1    Unbehauen, H.2
  • 23
    • 24344438276
    • Adaptive control of nonlinear stochastic systems by particle filtering
    • ThA01-6, Fourth International Conference on Control and Automation
    • A. Greenfield and A. Brockwell. Adaptive control of nonlinear stochastic systems by particle filtering. In International Conference on Control and Automation (ICCA), pages 887-890, 2003. (Pubitemid 41244024)
    • (2003) International Conference on Control and Automation , pp. 887-890
    • Greenfield, A.1    Brockwell, A.2
  • 24
    • 34249761849
    • Learning Bayesian networks: The combination of knowledge and statistical data
    • D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197-243, 1995.
    • (1995) Machine Learning , vol.20 , Issue.3 , pp. 197-243
    • Heckerman, D.1    Geiger, D.2    Chickering, D.M.3
  • 29
    • 0032073263
    • Planning and acting in partially observable stochastic domains
    • PII S000437029800023X
    • L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99-134, 1998. (Pubitemid 128387390)
    • (1998) Artificial Intelligence , vol.101 , Issue.1-2 , pp. 99-134
    • Kaelbling, L.P.1    Littman, M.L.2    Cassandra, A.R.3
  • 45
    • 0015658957
    • The optimal control of partially observable Markov processes over a finite horizon
    • Sep/Oct
    • R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21(5):1071-1088, Sep/Oct 1973.
    • (1973) Operations Research , vol.21 , Issue.5 , pp. 1071-1088
    • Smallwood, R.D.1    Sondik, E.J.2
  • 48
    • 31144472319
    • Perseus: Randomized point-based value iteration for POMDPs
    • M. T. J. Spaan and N. Vlassis. Perseus: randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research (JAIR), 24:195-220, 2005. (Pubitemid 43130936)
    • (2005) Journal of Artificial Intelligence Research , vol.24 , pp. 195-220
    • Spaan, M.T.J.1    Vlassis, N.2
  • 53
    • 85162041468
    • Optimistic linear programming gives logarithmic regret for irreducible MDPs
    • A. Tewari and P. Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Neural Information Processing Systems (NIPS), volume 20, pages 1505-1512, 2008.
    • (2008) Neural Information Processing Systems (NIPS) , vol.20 , pp. 1505-1512
    • Tewari, A.1    Bartlett, P.2
  • 56
    • 51649096429
    • Discrete-time Bayesian adaptive control problems with complete information
    • O. Zane. Discrete-time Bayesian adaptive control problems with complete information. In IEEE Conference on Decision and Control, pages 2748-2749, 1992.
    • (1992) IEEE Conference on Decision and Control , pp. 2748-2749
    • Zane, O.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.