Volume 13, 2000, Pages 227-303

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

Author keywords

[No Author keywords available]


EID: 0002278788     PISSN: 10769757     EISSN: None     Source Type: Journal    
DOI: 10.1613/jair.639     Document Type: Article
Times cited : (1266)

References (33)
  • 4
    • 0026255231
    • O-plan: The open planning architecture
    • Currie, K., & Tate, A. (1991). O-plan: The open planning architecture. Artificial Intelligence, 52(1), 49-86.
    • (1991) Artificial Intelligence , vol.52 , Issue.1 , pp. 49-86
    • Currie, K.1    Tate, A.2
  • 9
    • 0020177941
    • Rete: A fast algorithm for the many pattern/many object pattern match problem
    • Forgy, C. L. (1982). Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19(1), 17-37.
    • (1982) Artificial Intelligence , vol.19 , Issue.1 , pp. 17-37
    • Forgy, C.L.1
  • 12
    • 0000439891
    • On the convergence of stochastic iterative dynamic programming algorithms
    • Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185-1201.
    • (1994) Neural Computation , vol.6 , Issue.6 , pp. 1185-1201
    • Jaakkola, T.1    Jordan, M.I.2    Singh, S.P.3
  • 14
    • 0032045145
    • Module based reinforcement learning for a real robot
    • Kalmár, Z., Szepesvári, C., & Lörincz, A. (1998). Module based reinforcement learning for a real robot. Machine Learning, 31, 55-85.
    • (1998) Machine Learning , vol.31 , pp. 55-85
    • Kalmár, Z.1    Szepesvári, C.2    Lörincz, A.3
  • 16
    • 0022045044
    • Macro-operators: A weak method for learning
    • Korf, R. E. (1985). Macro-operators: A weak method for learning. Artificial Intelligence, 26(1), 35-77.
    • (1985) Artificial Intelligence , vol.26 , Issue.1 , pp. 35-77
    • Korf, R.E.1
  • 17
    • 0003673017
    • Ph.D. thesis, Carnegie Mellon University, Department of Computer Science, Pittsburgh, PA
    • Lin, L.-J. (1993). Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Department of Computer Science, Pittsburgh, PA.
    • (1993) Reinforcement Learning for Robots Using Neural Networks
    • Lin, L.-J.1
  • 21
    • 84898956770
    • Reinforcement learning with hierarchies of machines
    • Cambridge, MA. MIT Press
    • Parr, R., & Russell, S. (1998). Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems, Vol. 10, pp. 1043-1049. Cambridge, MA: MIT Press.
    • (1998) Advances in Neural Information Processing Systems , vol.10 , pp. 1043-1049
    • Parr, R.1    Russell, S.2
  • 23
    • 0003636089
    • Tech. rep. CUED/FINFENG/TR 166, Cambridge University Engineering Department, Cambridge, England
    • Rummery, G. A., & Niranjan, M. (1994). Online Q-learning using connectionist systems. Tech. rep. CUED/FINFENG/TR 166, Cambridge University Engineering Department, Cambridge, England.
    • (1994) Online Q-learning Using Connectionist Systems
    • Rummery, G.A.1    Niranjan, M.2
  • 24
    • 0016069798
    • Planning in a hierarchy of abstraction spaces
    • Sacerdoti, E. D. (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5(2), 115-135.
    • (1974) Artificial Intelligence , vol.5 , Issue.2 , pp. 115-135
    • Sacerdoti, E.D.1
  • 25
    • 0346087506
    • Convergence results for single-step on-policy reinforcement-learning algorithms
    • Tech. rep., University of Colorado, Department of Computer Science, Boulder, CO. To appear
    • Singh, S., Jaakkola, T., Littman, M. L., & Szepesvári, C. (1998). Convergence results for single-step on-policy reinforcement-learning algorithms. Tech. rep., University of Colorado, Department of Computer Science, Boulder, CO. To appear in Machine Learning.
    • (1998) Machine Learning
    • Singh, S.1    Jaakkola, T.2    Littman, M.L.3    Szepesvári, C.4
  • 26
    • 0001027894
    • Transfer of learning by composing solutions of elemental sequential tasks
    • Singh, S. P. (1992). Transfer of learning by composing solutions of elemental sequential tasks. Machine Learning, 8, 323-339.
    • (1992) Machine Learning , vol.8 , pp. 323-339
    • Singh, S.P.1
  • 29
    • 0003899594
    • Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales
    • Tech. rep., University of Massachusetts, Department of Computer and Information Sciences, Amherst, MA. To appear
    • Sutton, R. S., Precup, D., & Singh, S. (1998). Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Tech. rep., University of Massachusetts, Department of Computer and Information Sciences, Amherst, MA. To appear in Artificial Intelligence.
    • (1998) Artificial Intelligence
    • Sutton, R.S.1    Precup, D.2    Singh, S.3
  • 31
    • 0028464184
    • Investigating production system representations for non-combinatorial match
    • Tambe, M., & Rosenbloom, P. S. (1994). Investigating production system representations for non-combinatorial match. Artificial Intelligence, 68(1), 155-199.
    • (1994) Artificial Intelligence , vol.68 , Issue.1 , pp. 155-199
    • Tambe, M.1    Rosenbloom, P.S.2
  • 32
    • 0004049893
    • Ph.D. thesis, King's College, Oxford. (To be reprinted by MIT Press.)
    • Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, King's College, Oxford. (To be reprinted by MIT Press.)
    • (1989) Learning from Delayed Rewards
    • Watkins, C.J.C.H.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.