



Volume 8, Issue 3, 1992, Pages 293-321

Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching

Author keywords

connectionist networks; planning; reinforcement learning; teaching

EID: 0000123778     PISSN: 08856125     EISSN: 15730565     Source Type: Journal    
DOI: 10.1023/A:1022628806385     Document Type: Article
Times cited: 1533

References (30)
  • 1
    • 85025865990
    • Anderson, C.W. (1987). Strategy learning with multilayer connectionist representations. Proceedings of the Fourth International Workshop on Machine Learning (pp. 103–114).
  • 2
    • 84951530330
    • Barto, A.G., Sutton, R.S., & Watkins, C.J.C.H. (1990). Learning and sequential decision making. In: M. Gabriel & J.W. Moore (Eds.), Learning and computational neuroscience. MIT Press.
  • 3
    • 84951530331
    • Barto, A.G., Bradtke, S.J., & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming. (Technical Report 91-57). University of Massachusetts, Computer Science Department.
  • 4
    • 84951530332
    • Chapman, D. & Kaelbling, L.P. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. Proceedings of IJCAI-91.
  • 7
    • 84951530333
    • Hinton, G.E., McClelland, J.L., & Rumelhart, D.E. (1986). Distributed representations. Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1, Bradford Books/MIT Press.
  • 9
    • 84951530334
    • Kaelbling, L.P. (1990). Learning in embedded systems. Ph.D. Thesis, Department of Computer Science, Stanford University.
  • 10
    • 84951530335
    • Lang, K.J. (1989). A time-delay neural network architecture for speech recognition. Ph.D. Thesis, School of Computer Science, Carnegie Mellon University.
  • 11
    • 84951530336
    • Lin, Long-Ji. (1991a). Self-improving reactive agents: Case studies of reinforcement learning frameworks. Proceedings of the First International Conference on Simulation of Adaptive Behavior: From Animals to Animats (pp. 297–305). Also Technical Report CMU-CS-90-109, Carnegie Mellon University.
  • 12
    • 85025855831
    • Lin, Long-Ji. (1991b). Self-improvement based on reinforcement learning, planning and teaching. Proceedings of the Eighth International Workshop on Machine Learning (pp. 323–327).
  • 13
    • 84951530338
    • Lin, Long-Ji. (1991c). Programming robots using reinforcement learning and teaching. Proceedings of AAAI-91 (pp. 781–786).
  • 14
    • 84951530339
    • Mahadevan, S. & Connell, J. (1991). Scaling reinforcement learning to robotics by exploiting the subsumption architecture. Proceedings of the Eighth International Workshop on Machine Learning (pp. 328–332).
  • 16
    • 84951530340
    • Moore, A.W. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Workshop on Machine Learning (pp. 333–337).
  • 18
    • 84951530341
    • Pomerleau, D.A. (1989). ALVINN: An autonomous land vehicle in a neural network (Technical Report CMU-CS-89-107). Carnegie Mellon University.
  • 20
    • 84951530342
    • Sutton, R.S. (1984). Temporal credit assignment in reinforcement learning. Ph.D. Thesis, Dept. of Computer and Information Science, University of Massachusetts.
  • 22
    • 85025864537
    • Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Workshop on Machine Learning (pp. 216–224).
  • 23
    • 84951530344
    • Tan, Ming. (1991). Learning a cost-sensitive internal representation for reinforcement learning. Proceedings of the Eighth International Workshop on Machine Learning (pp. 358–362).
  • 24
    • 85025874247
    • Thrun, S.B., Möller, K., & Linden, A. (1991). Planning with an adaptive world model. In D.S. Touretzky (Ed.), Advances in neural information processing systems 3, Morgan Kaufmann.
  • 25
    • 84951530346
    • Thrun, S.B. & Möller, K. (1992). Active exploration in dynamic environments. To appear in D.S. Touretzky (Ed.), Advances in neural information processing systems 4, Morgan Kaufmann.
  • 26
    • 84951530347
    • Watkins, C.J.C.H. (1989). Learning from delayed rewards. Ph.D. Thesis, King's College, Cambridge.
  • 28
    • 85025876171
    • Whitehead, S.D. & Ballard, D.H. (1989). A role for anticipation in reactive systems that learn. Proceedings of the Sixth International Workshop on Machine Learning (pp. 354–357).
  • 30
    • 85025856603
    • Whitehead, S.D. (1991b). Complexity and cooperation in Q-learning. Proceedings of the Eighth International Workshop on Machine Learning (pp. 363–367).


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.