2011, Pages 33-62

An introduction to fully and partially observable Markov decision processes

EID: 84889585200     PISSN: None     EISSN: None     Source Type: Book    
DOI: 10.4018/978-1-60960-165-2.ch003     Document Type: Chapter
Times cited : (12)

References (66)
  • 1
    • Abbeel, P., & Ng, A. (2004). Apprenticeship learning via inverse reinforcement learning. International Conference on Machine Learning (ICML), Banff, Alberta, (pp. 1-8).
  • 4
    • Amato, C., Bernstein, D. S., & Zilberstein, S. (2010). Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Autonomous Agents and Multi-Agent Systems, 21(3), 293-320. doi:10.1007/s10458-009-9103-z
  • 5
    • Bellman, R. (1957). A Markovian decision process. Indiana University Mathematics Journal, 6(4), 679-684. doi:10.1512/iumj.1957.6.06038
  • 6
    • Bengio, Y., & Frasconi, P. (1996). Input/output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5), 1231-1249. doi:10.1109/72.536317
  • 9
    • Choi, J., & Kim, K.-E. (2011). Inverse reinforcement learning in partially observable domains. Journal of Machine Learning Research, 12, 691-730.
  • 10
    • Delgado, K. V., Sanner, S., & Nunes de Barros, L. (2011). Efficient solutions to factored MDPs with imprecise transition probabilities. Artificial Intelligence, 175(9-10), 1498-1527. doi:10.1016/j.artint.2011.01.001
  • 14
    • Fard, M. M., & Pineau, J. (2011). Non-deterministic policies in Markovian decision processes. Journal of Artificial Intelligence Research, 40, 1-24.
  • 19
    • Hansen, E. A. (1997). An improved policy iteration algorithm for partially observable MDPs. Advances in Neural Information Processing Systems (pp. 1015-1021). Denver, Colorado: NIPS.
  • 25
    • Kaelbling, L. P., Littman, M., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134. doi:10.1016/S0004-3702(98)00023-X
  • 26
    • Kearns, M., Mansour, Y., & Ng, A. Y. (1999). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. International Joint Conferences on Artificial Intelligence (pp. 1324-1331). Stockholm, Sweden: IJCAI.
  • 31
    • Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). Nonapproximability results for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 14, 83-103.
  • 32
    • Meuleau, N., Benazera, E., Brafman, R. I., Hansen, E. A., & Mausam (2009). A heuristic search approach to planning with continuous resources in stochastic domains. Journal of Artificial Intelligence Research, 34, 27-59.
  • 37
    • Papadimitriou, C. H., & Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441-450. doi:10.1287/moor.12.3.441
  • 41
    • Piunovskiy, A. B., & Mao, X. (2000). Constrained Markovian decision processes: The dynamic programming approach. Operations Research Letters, 27(3), 119-126. doi:10.1016/S0167-6377(00)00039-0
  • 45
    • Poupart, P., Lang, T., & Toussaint, M. (2011). Analyzing and escaping local optima in planning as inference for partially observable domains. European Conference on Machine Learning (ECML), Athens, Greece.
  • 48
    • Regan, K., & Boutilier, C. (2011a). Eliciting additive reward functions for Markov decision processes. International Joint Conferences on Artificial Intelligence (pp. 2159-2164). Barcelona, Spain: IJCAI.
  • 49
    • Regan, K., & Boutilier, C. (2011b). Robust online optimization of reward-uncertain MDPs. International Joint Conferences on Artificial Intelligence (pp. 2165-2171). Barcelona, Spain: IJCAI.
  • 53
    • Satia, J. K., & Lave, R. E. Jr. (1973). Markovian decision processes with uncertain transition probabilities. Operations Research, 21(3), 728-740. doi:10.1287/opre.21.3.728
  • 59
    • Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21, 1071-1088. doi:10.1287/opre.21.5.1071
  • 61
    • Spaan, M. T. J., & Vlassis, N. A. (2005). Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24, 195-220.
  • 62
    • Sutton, R., Precup, D., & Singh, S. P. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2), 181-211. doi:10.1016/S0004-3702(99)00052-1
  • 64
    • Toussaint, M., Harmeling, S., & Storkey, A. (2006). Probabilistic inference for solving (PO)MDPs. Technical Report EDI-INF-RR-0934, School of Informatics, University of Edinburgh.
  • 65
    • White, C. C. III, & El-Deib, H. K. (1994). Markov decision processes with imprecise transition probabilities. Operations Research, 42(4), 739-749. doi:10.1287/opre.42.4.739
  • 66
    • Zhang, N. L., & Liu, W. (1996). Planning in stochastic domains: Problem characteristics and approximation. Technical Report HKUST-CS96-31, Hong Kong University of Science and Technology.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.