



Volume 13, 2000, Pages 33-94

Value-Function Approximations for Partially Observable Markov Decision Processes

EID: 0001770240     pISSN: 1076-9757     eISSN: None     Source Type: Journal
DOI: 10.1613/jair.678     Document Type: Article
Times cited: 431

References (77)
  • 1. Astrom, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10, 174-205.
  • 3. Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72, 81-138.
  • 4. Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
  • 5. Bertsekas, D. P. (1995). A counter-example to temporal differences learning. Neural Computation, 7, 270-279.
  • 8. Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.
  • 10. Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7. MIT Press.
  • 14. Burago, D., Rougemont, M. D., & Slissenko, A. (1996). On the complexity of partially observed Markov decision processes. Theoretical Computer Science, 157, 161-183.
  • 19. Condon, A. (1992). The complexity of stochastic games. Information and Computation, 96, 203-224.
  • 20. Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142-150.
  • 21. Dearden, R., & Boutilier, C. (1997). Abstraction and approximate decision theoretic planning. Artificial Intelligence, 89, 219-283.
  • 25. Eagle, J. N. (1984). The optimal search for a moving target when search path is constrained. Operations Research, 32, 1107-1115.
  • 28. Hansen, E. (1998a). An improved policy iteration algorithm for partially observable MDPs. In Advances in Neural Information Processing Systems 10. MIT Press.
  • 32. Hauskrecht, M., & Fraser, H. (2000). Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, 18, 221-244.
  • 37. Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134.
  • 41. Korf, R. (1985). Depth-first iterative deepening: An optimal admissible tree search. Artificial Intelligence, 27, 97-109.
  • 47. Lovejoy, W. S. (1991a). Computationally feasible bounds for partially observed Markov decision processes. Operations Research, 39, 162-175.
  • 48. Lovejoy, W. S. (1991b). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28, 47-66.
  • 49. Lovejoy, W. S. (1993). Suboptimal policies with bounds for parameter adaptive decision processes. Operations Research, 41, 583-599.
  • 54. Monahan, G. E. (1982). A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science, 28, 1-16.
  • 64. Satia, J., & Lave, R. (1973). Markovian decision processes with probabilistic observation of states. Management Science, 20, 1-13.
  • 66. Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable processes over a finite horizon. Operations Research, 21, 1071-1088.
  • 68. Sondik, E. J. (1978). The optimal control of partially observable processes over the infinite horizon: Discounted costs. Operations Research, 26, 282-304.
  • 70. Tsitsiklis, J. N., & Van Roy, B. (1996). Feature-based methods for large-scale dynamic programming. Machine Learning, 22, 59-94.
  • 72. White, C. C., & Scherer, W. T. (1994). Finite memory suboptimal design for partially observed Markov decision processes. Operations Research, 42, 439-455.
  • 76. Zhang, N. L., & Liu, W. (1997a). A model approximation scheme for planning in partially observable stochastic domains. Journal of Artificial Intelligence Research, 7, 199-230.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.