Volume 22, Issue 1-3, 1996, Pages 227-250

The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms

Author keywords

Action models; Action penalty representation; Admissible and consistent heuristics; Complexity; Goal reward representation; Goal directed exploration; On line reinforcement learning; Prior knowledge; Q hat learning; Q learning; Reward structure

Indexed keywords

COMPUTATIONAL COMPLEXITY; HEURISTIC METHODS; KNOWLEDGE REPRESENTATION; LEARNING ALGORITHMS; MATHEMATICAL MODELS; ONLINE SYSTEMS; POLYNOMIALS; STATE SPACE METHODS; TOPOLOGY;

EID: 0029751419     PISSN: 08856125     EISSN: None     Source Type: Journal    
DOI: 10.1007/BF00114729     Document Type: Article
Times cited : (76)

References (22)
  • 1
    • 0029210635
    • Learning to act using real-time dynamic programming
    • Barto, A. G., S.J. Bradtke, and S. P. Singh (1995) Learning to act using real-time dynamic programming. Artificial Intelligence, 73(1):81-138.
    • (1995) Artificial Intelligence , vol.73 , Issue.1 , pp. 81-138
    • Barto, A.G.1    Bradtke, S.J.2    Singh, S.P.3
  • 2
    • 0003602259
    • Learning and sequential decision making
    • Department of Computer Science, University of Massachusetts at Amherst
    • Barto, A. G., R. S. Sutton, and C. J. Watkins (1989). Learning and sequential decision making. Technical Report 89-95, Department of Computer Science, University of Massachusetts at Amherst.
    • (1989) Technical Report , vol.89-95
    • Barto, A.G.1    Sutton, R.S.2    Watkins, C.J.3
  • 3
    • 0003787146
    • Princeton University Press, Princeton (New Jersey)
    • Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton (New Jersey).
    • (1957) Dynamic Programming
    • Bellman, R.1
  • 4
    • 0001854509
    • Solving time-dependent planning problems
    • Boddy, M. and T. Dean (1989). Solving time-dependent planning problems. In Proceedings of the IJCAI, pages 979-984.
    • (1989) Proceedings of the IJCAI , pp. 979-984
    • Boddy, M.1    Dean, T.2
  • 6
    • 0029751418
    • The loss from imperfect value functions in expectation-based and minimax-based tasks
    • Heger, M. (1996). The loss from imperfect value functions in expectation-based and minimax-based tasks. Machine Learning, pages 197-225.
    • (1996) Machine Learning , pp. 197-225
    • Heger, M.1
  • 8
    • 0026989688
    • Moving target search with intelligence
    • Ishida, T. (1992). Moving target search with intelligence. In Proceedings of the AAAI, pages 525-532.
    • (1992) Proceedings of the AAAI , pp. 525-532
    • Ishida, T.1
  • 13
    • 0008806879
    • Complexity analysis of real-time reinforcement learning applied to finding shortest paths in deterministic domains
    • School of Computer Science, Carnegie Mellon University
    • Koenig, S. and R. G. Simmons (1992). Complexity analysis of real-time reinforcement learning applied to finding shortest paths in deterministic domains. Technical Report CMU-CS-93-106, School of Computer Science, Carnegie Mellon University.
    • (1992) Technical Report CMU-CS-93-106
    • Koenig, S.1    Simmons, R.G.2
  • 14
    • 0027704767
    • Complexity analysis of real-time reinforcement learning
    • Koenig, S. and R. G. Simmons (1993). Complexity analysis of real-time reinforcement learning. In Proceedings of the AAAI, pages 99-105.
    • (1993) Proceedings of the AAAI , pp. 99-105
    • Koenig, S.1    Simmons, R.G.2
  • 16
    • 27144460841
    • The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms: The proofs
    • School of Computer Science, Carnegie Mellon University
    • Koenig, S. and R. G. Simmons (1995a). The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms: The proofs. Technical Report CMU-CS-95-177, School of Computer Science, Carnegie Mellon University.
    • (1995) Technical Report CMU-CS-95-177
    • Koenig, S.1    Simmons, R.G.2
  • 17
    • 0002038863
    • Real-time search in non-deterministic domains
    • Koenig, S. and R. G. Simmons (1995b). Real-time search in non-deterministic domains. In Proceedings of the IJCAI, pages 1660-1667.
    • (1995) Proceedings of the IJCAI , pp. 1660-1667
    • Koenig, S.1    Simmons, R.G.2
  • 18
    • 0025400088
    • Real-time heuristic search
    • Korf, R. E. (1990). Real-time heuristic search. Artificial Intelligence, 42(2-3):189-211.
    • (1990) Artificial Intelligence , vol.42 , Issue.2-3 , pp. 189-211
    • Korf, R.E.1
  • 19
    • 0000123778
    • Self-improving reactive agents based on reinforcement learning, planning and teaching
    • Lin, L. J. (1992) Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8:293-321.
    • (1992) Machine Learning , vol.8 , pp. 293-321
    • Lin, L.J.1
  • 20
    • 0003849946
    • PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
    • Matarić, M. (1994). Interaction and Intelligent Behavior. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
    • (1994) Interaction and Intelligent Behavior
    • Matarić, M.1
  • 21
    • 0006488247
    • The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
    • Moore, A. W. and C. G. Atkeson (1993a). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. In Proceedings of the NIPS.
    • (1993) Proceedings of the NIPS
    • Moore, A.W.1    Atkeson, C.G.2
  • 22
    • 0027684215
    • Prioritized sweeping: Reinforcement learning with less data and less time
    • Moore, A. W. and C. G. Atkeson (1993b). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103-130.
    • (1993) Machine Learning , vol.13 , pp. 103-130
    • Moore, A.W.1    Atkeson, C.G.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.