Astrom, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10, 174-205.
Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72, 81-138.
Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
Bertsekas, D. P. (1994). A counter-example to temporal differences learning. Neural Computation, 7, 270-279.
Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.
Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7. MIT Press.
Burago, D., Rougemont, M. D., & Slissenko, A. (1996). On the complexity of partially observed Markov decision processes. Theoretical Computer Science, 157, 161-183.
Cassandra, A. R., Littman, M. L., & Zhang, N. L. (1997). Incremental pruning: A simple, fast, exact algorithm for partially observable Markov decision processes. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, pp. 54-61.
Condon, A. (1992). The complexity of stochastic games. Information and Computation, 96, 203-224.
Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142-150.
Dearden, R., & Boutilier, C. (1997). Abstraction and approximate decision-theoretic planning. Artificial Intelligence, 89, 219-283.
Draper, D., Hanks, S., & Weld, D. (1994). Probabilistic planning with information gathering and contingent execution. In Proceedings of the Second International Conference on AI Planning Systems, pp. 31-36.
Eagle, J. N. (1984). The optimal search for a moving target when search path is constrained. Operations Research, 32, 1107-1115.
Hansen, E. (1998a). An improved policy iteration algorithm for partially observable MDPs. In Advances in Neural Information Processing Systems 10. MIT Press.
Hauskrecht, M., & Fraser, H. (2000). Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, 18, 221-244.
Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1999). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134.
Kanazawa, K., Koller, D., & Russell, S. J. (1995). Stochastic simulation algorithms for dynamic probabilistic networks. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 346-351.
Kearns, M., Mansour, Y., & Ng, A. Y. (1999). A sparse sampling algorithm for near optimal planning in large Markov decision processes. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp. 1324-1331.
Korf, R. (1985). Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 27, 97-109.
Kushmerick, N., Hanks, S., & Weld, D. (1995). An algorithm for probabilistic planning. Artificial Intelligence, 76, 239-286.
Littman, M. L. (1994). Memoryless policies: Theoretical limitations and practical results. In Cliff, D., Husbands, P., Meyer, J., & Wilson, S. (Eds.), From Animals to Animats 3: Proceedings of the Third International Conference on Simulation of Adaptive Behavior. MIT Press, Cambridge.
Littman, M. L., Cassandra, A. R., & Kaelbling, L. P. (1995). Learning policies for partially observable environments: Scaling up. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 362-370.
Lovejoy, W. S. (1991a). Computationally feasible bounds for partially observed Markov decision processes. Operations Research, 39, 162-175.
Lovejoy, W. S. (1991b). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28, 47-66.
Lovejoy, W. S. (1993). Suboptimal policies with bounds for parameter adaptive decision processes. Operations Research, 41, 583-599.
Lusena, C., Goldsmith, J., & Mundhenk, M. (1998). Nonapproximability results for Markov decision processes. Tech. rep., University of Kentucky.
Monahan, G. E. (1982). A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science, 28, 1-16.
Mundhenk, M., Goldsmith, J., Lusena, C., & Allender, E. (1997). Encyclopaedia of complexity results for finite-horizon Markov decision process problems. Tech. rep. CS Dept TR 273-97, University of Kentucky.
Rumelhart, D., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing, pp. 318-362.
Satia, J., & Lave, R. (1973). Markovian decision processes with probabilistic observation of states. Management Science, 20, 1-13.
Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learning without state-estimation in partially observable Markovian decision processes. In Proceedings of the Eleventh International Conference on Machine Learning, pp. 284-292.
Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable processes over a finite horizon. Operations Research, 21, 1071-1088.
Sondik, E. J. (1978). The optimal control of partially observable processes over the infinite horizon: Discounted costs. Operations Research, 26, 282-304.
Tatman, J., & Shachter, R. D. (1990). Dynamic programming and influence diagrams. IEEE Transactions on Systems, Man and Cybernetics, 20, 365-379.
Tsitsiklis, J. N., & Van Roy, B. (1996). Feature-based methods for large-scale dynamic programming. Machine Learning, 22, 59-94.
White, C. C., & Scherer, W. T. (1994). Finite memory suboptimal design for partially observed Markov decision processes. Operations Research, 42, 439-455.
Zhang, N. L., & Liu, W. (1997a). A model approximation scheme for planning in partially observable stochastic domains. Journal of Artificial Intelligence Research, 7, 199-230.