1. Åström, K. J. (1965). Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications, 10:174-205.
5. Cassandra, A. R., Kaelbling, L. P., and Littman, M. L. (1994). Acting optimally in partially observable stochastic domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA.
7. Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI).
8. Connell, J. and Mahadevan, S. (1993). Rapid task learning for real robots. In Robot Learning. Kluwer Academic Publishers.
9. Jaakkola, T., Jordan, M. I., and Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6).
10. Kushmerick, N., Hanks, S., and Weld, D. (1993). An algorithm for probabilistic planning. Technical Report 93-06-03, University of Washington, Department of Computer Science and Engineering. To appear in Artificial Intelligence.
11. Littman, M., Cassandra, A., and Kaelbling, L. (1995). Learning policies for partially observable environments: Scaling up. Technical Report CS-95-11, Brown University, Department of Computer Science, Providence, RI.
12. Littman, M. L. (1994). The Witness algorithm: Solving partially observable Markov decision processes. Technical Report CS-94-40, Brown University, Department of Computer Science, Providence, RI.
13. Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28:47-66.
15. Moore, A. W. (1994). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann, San Mateo, CA.
21. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, chapter 8. The MIT Press, Cambridge, MA.
23. Smallwood, R. D. and Sondik, E. J. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21:1071-1088.
24. Sondik, E. J. (1978). The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 26(2).
25. Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3).
27. Williams, R. J. and Baird, L. C., III (1993). Tight performance bounds on greedy policies based on imperfect value functions. Technical Report NU-CCS-93-13, Northeastern University, College of Computer Science, Boston, MA.