1. Åström, K. J. (1965). Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications, 10:174-205.
5. Cassandra, A. R., Kaelbling, L. P., and Littman, M. L. (1994). Acting optimally in partially observable stochastic domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA.
7. Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI).
8. Connell, J. and Mahadevan, S. (1993). Rapid task learning for real robots. In Robot Learning. Kluwer Academic Publishers.
9. Jaakkola, T., Jordan, M. I., and Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6).
10. Kushmerick, N., Hanks, S., and Weld, D. (1993). An algorithm for probabilistic planning. Technical Report 93-06-03, University of Washington, Department of Computer Science and Engineering. To appear in Artificial Intelligence.
11. Littman, M., Cassandra, A., and Kaelbling, L. (1995). Learning policies for partially observable environments: Scaling up. Technical Report CS-95-11, Brown University, Department of Computer Science, Providence, RI.
12. Littman, M. L. (1994). The Witness algorithm: Solving partially observable Markov decision processes. Technical Report CS-94-40, Brown University, Department of Computer Science, Providence, RI.
13. Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28:47-66.
15. Moore, A. W. (1994). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann, San Mateo, CA.
21. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, chapter 8. The MIT Press, Cambridge, MA.
23. Smallwood, R. D. and Sondik, E. J. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21:1071-1088.
24. Sondik, E. J. (1978). The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 26(2).
25. Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3).
27. Williams, R. J. and Baird, L. C., III (1993). Tight performance bounds on greedy policies based on imperfect value functions. Technical Report NU-CCS-93-13, Northeastern University, College of Computer Science, Boston, MA.