SCOPUS Information Retrieval Platform

Proceedings of the 12th International Conference on Machine Learning, ICML 1995

1995, Pages 387-395

Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State

(1) McCallum, R. Andrew a

a UNIVERSITY OF ROCHESTER (United States)

Author keywords

[No Author keywords available]

Indexed keywords

DYNAMIC PROGRAMMING; LEARNING ALGORITHMS; REINFORCEMENT LEARNING;

ALIASING; HIDDEN STATE; LEARN+; MEMORY USE; MEMORY-BASED LEARNING; REINFORCEMENT LEARNING ALGORITHMS; REINFORCEMENT LEARNINGS; SHORT TERM MEMORY; TASK STRUCTURE; TREE-STRUCTURED REPRESENTATION;

TREES (MATHEMATICS);

EID: 2342482919 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (116)

References (27)

1
- 0000963039
- Pengi: an implementation of a theory of activity
- [Agre and Chapman, 1987]
- [Agre and Chapman, 1987] Philip E. Agre and David Chapman. Pengi: an implementation of a theory of activity. In AAAI, pages 268-272, 1987.
- (1987) AAAI , pp. 268-272
- Agre, Philip E.¹ Chapman, David²

2
- 0003915098
- [Barto et al., 1991] Technical Report 91-57, University of Massachusetts, Amherst, MA
- [Barto et al., 1991] A.G. Barto, S.J. Bradtke, and S.P. Singh. Real-time learning and control using asynchronous dynamic programming. Technical Report 91-57, University of Massachusetts, Amherst, MA, 1991.
- (1991) Real-time learning and control using asynchronous dynamic programming
- Barto, A.G.¹ Bradtke, S.J.² Singh, S.P.³

3
- 0003787146
- [Bellman, 1957] Princeton University Press, Princeton, NJ
- [Bellman, 1957] R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
- (1957) Dynamic Programming
- Bellman, R. E.¹

4
- 0003923091
- [Bertsekas and Shreve, 1978] Academic Press
- [Bertsekas and Shreve, 1978] Dimitri P. Bertsekas and Steven E. Shreve. Stochastic Optimal Control. Academic Press, 1978.
- (1978) Stochastic Optimal Control
- Bertsekas, Dimitri P.¹ Shreve, Steven E.²

5
- 0028564629
- Acting optimally in partially observable stochastic domains
- [Cassandra et al., 1994] Seattle, WA
- [Cassandra et al., 1994] Anthony R. Cassandra, Leslie Pack Kaelbling, and Michael L. Littman. Acting optimally in partially observable stochastic domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA, 1994.
- (1994) Proceedings of the Twelfth National Conference on Artificial Intelligence
- Cassandra, Anthony R.¹ Kaelbling, Leslie Pack² Littman, Michael L.³

6
- 0002192119
- Learning from delayed reinforcement in a complex domain
- [Chapman and Kaelbling, 1991]
- [Chapman and Kaelbling, 1991] David Chapman and Leslie Pack Kaelbling. Learning from delayed reinforcement in a complex domain. In Twelfth International Joint Conference on Artificial Intelligence, 1991.
- (1991) Twelfth International Joint Conference on Artificial Intelligence
- Chapman, David¹ Kaelbling, Leslie Pack²

7
- 0026998041
- Reinforcement learning with perceptual aliasing: The perceptual distinctions approach
- [Chrisman, 1992]
- [Chrisman, 1992] Lonnie Chrisman. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Tenth National Conference on AI, 1992.
- (1992) Tenth National Conference on AI
- Chrisman, Lonnie¹

8
- 0000624333
- Reinforcement learning algorithm for partially observable Markov decision problems
- [Jaakkola et al., 1995] Morgan Kaufmann
- [Jaakkola et al., 1995] Tommi Jaakkola, Satinder Pal Singh, and Michael I. Jordan. Reinforcement learning algorithm for partially observable Markov decision problems. In Advances in Neural Information Processing Systems (NIPS 7). Morgan Kaufmann, 1995.
- (1995) Advances in Neural Information Processing Systems (NIPS 7)
- Jaakkola, Tommi¹ Singh, Satinder Pal² Jordan, Michael I.³

9
- 85151437138
- Programming robots using reinforcement learning and teaching
- [Lin, 1991]
- [Lin, 1991] Long-Ji Lin. Programming robots using reinforcement learning and teaching. Ninth National Conference on Artificial Intelligence, 1991.
- (1991) Ninth National Conference on Artificial Intelligence
- Lin, Long-Ji¹

10
- 0003673017
- [Lin, 1993] PhD thesis, Carnegie Mellon, School of Computer Science, January
- [Lin, 1993] Long-Ji Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon, School of Computer Science, January 1993.
- (1993) Reinforcement Learning for Robots Using Neural Networks
- Lin, Long-Ji¹

11
- 0003272035
- Memoryless policies: Theoretical limitations and practical results
- [Littman, 1994]
- [Littman, 1994] Michael Littman. Memoryless policies: Theoretical limitations and practical results. In Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats, 1994.
- (1994) Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats
- Littman, Michael¹

12
- 0001133191
- Overcoming incomplete perception with utile distinction memory
- [McCallum, 1993] Morgan Kaufmann Publishers, Inc
- [McCallum, 1993] R. Andrew McCallum. Overcoming incomplete perception with utile distinction memory. In The Proceedings of the Tenth International Machine Learning Conference. Morgan Kaufmann Publishers, Inc., 1993.
- (1993) The Proceedings of the Tenth International Machine Learning Conference
- McCallum, R. Andrew¹

13
- 33748176214
- [McCallum, 1994] Technical Report 549, University of Rochester Computer Science Dept., December
- [McCallum, 1994] R. Andrew McCallum. Utile suffix memory for reinforcement learning with hidden state. Technical Report 549, University of Rochester Computer Science Dept., December 1994.
- (1994) Utile suffix memory for reinforcement learning with hidden state
- McCallum, R. Andrew¹

14
- 0001617769
- Instance-based state identification for reinforcement learning
- [McCallum, 1995]
- [McCallum, 1995] R. Andrew McCallum. Instance-based state identification for reinforcement learning. In Advances in Neural Information Processing Systems (NIPS 7), 1995.
- (1995) Advances in Neural Information Processing Systems (NIPS 7)
- McCallum, R. Andrew¹

15
- 84916521733
- Memory-based reinforcement learning: Efficient computation with prioritized sweeping
- [Moore and Atkeson, 1993] Morgan Kaufmann Publishers, Inc
- [Moore and Atkeson, 1993] Andrew W. Moore and Christopher G. Atkeson. Memory-based reinforcement learning: Efficient computation with prioritized sweeping. In Advances in Neural Information Processing Systems (NIPS 5). Morgan Kaufmann Publishers, Inc., 1993.
- (1993) Advances in Neural Information Processing Systems (NIPS 5)
- Moore, Andrew W.¹ Atkeson, Christopher G.²

16
- 33747997674
- Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces
- [Moore, 1991]
- [Moore, 1991] Andrew W. Moore. Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Workshop on Machine Learning, pages 333-337, 1991.
- (1991) Proceedings of the Eighth International Workshop on Machine Learning , pp. 333-337
- Moore, Andrew W.¹

17
- 0006488247
- The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces
- [Moore, 1993] Morgan Kaufmann
- [Moore, 1993] Andrew W. Moore. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces. In Advances in Neural Information Processing Systems (NIPS 6), pages 711-718. Morgan Kaufmann, 1993.
- (1993) Advances in Neural Information Processing Systems (NIPS 6) , pp. 711-718
- Moore, Andrew W.¹

18
- 0012117998
- Efficient learning and planning within the Dyna framework
- [Peng and Williams, 1992]
- [Peng and Williams, 1992] Jing Peng and R. J. Williams. Efficient learning and planning within the Dyna framework. In Proceedings of the Second International Conference on Simulation of Adaptive Behavior: From Animals to Animats, 1992.
- (1992) Proceedings of the Second International Conference on Simulation of Adaptive Behavior: From Animals to Animats
- Peng, Jing¹ Williams, R. J.²

19
- 0003438819
- [Platzman, 1977] PhD thesis, Department of Electrical Engineering and Computer Science, MIT, January
- [Platzman, 1977] Loren Kerry Platzman. Finite Memory Estimation and Control of Finite Probabilistic Systems. PhD thesis, Department of Electrical Engineering and Computer Science, MIT, January 1977.
- (1977) Finite Memory Estimation and Control of Finite Probabilistic Systems
- Platzman, Loren Kerry¹

20
- 85013571397
- Learning probabilistic automata with variable memory length
- [Ron et al., 1994] Morgan Kaufmann Publishers, Inc
- [Ron et al., 1994] Dana Ron, Yoram Singer, and Naftali Tishby. Learning probabilistic automata with variable memory length. In Proceedings Computational Learning Theory. Morgan Kaufmann Publishers, Inc., 1994.
- (1994) Proceedings Computational Learning Theory
- Ron, Dana¹ Singer, Yoram² Tishby, Naftali³

21
- 85132026293
- Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
- [Sutton, 1990] June
- [Sutton, 1990] Richard S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, June 1990.
- (1990) Proceedings of the Seventh International Conference on Machine Learning
- Sutton, Richard S.¹

22
- 0003362676
- The evolution of mental models
- [Teller, 1994] Kim Kinnear, editor, chapter 9. MIT Press
- [Teller, 1994] Astro Teller. The evolution of mental models. In Kim Kinnear, editor, Advances in Genetic Programming, chapter 9. MIT Press, 1994.
- (1994) Advances in Genetic Programming
- Teller, Astro¹

23
- 0003411271
- [Thrun, 1992] Technical Report CMU-CS-92-102, CMU Comp. Sci. Dept., January
- [Thrun, 1992] Sebastian B. Thrun. Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, CMU Comp. Sci. Dept., January 1992.
- (1992) Efficient exploration in reinforcement learning
- Thrun, Sebastian B.¹

24
- 0021700041
- Visual routines
- [Ullman, 1984]
- [Ullman, 1984] Shimon Ullman. Visual routines. Cognition, 18:97-159, 1984.
- (1984) Cognition , vol.18 , pp. 97-159
- Ullman, Shimon¹

25
- 0004049893
- [Watkins, 1989] PhD thesis, Cambridge University
- [Watkins, 1989] Chris Watkins. Learning from delayed rewards. PhD thesis, Cambridge University, 1989.
- (1989) Learning from delayed rewards
- Watkins, Chris¹

26
- 0005951145
- Finite-memory suboptimal design for partially observed Markov decision processes
- [White and Scherer, 1994]
- [White and Scherer, 1994] Chelsea C. White and William T. Scherer. Finite-memory suboptimal design for partially observed Markov decision processes. Operations Research, 42:439-455, 1994.
- (1994) Operations Research , vol.42 , pp. 439-455
- White, Chelsea C.¹ Scherer, William T.²

27
- 0002557085
- Learning to perceive and act by trial and error
- [Whitehead and Ballard, 1991]
- [Whitehead and Ballard, 1991] Steven D. Whitehead and Dana H. Ballard. Learning to perceive and act by trial and error. Machine Learning, 7(1):45-83, 1991.
- (1991) Machine Learning , vol.7 , Issue.1 , pp. 45-83
- Whitehead, Steven D.¹ Ballard, Dana H.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.