SCOPUS Information Search Platform

Journal of Artificial Intelligence Research

Volume 13, 2000, Pages 227-303

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

(1) Dietterich, Thomas G. (a)

(a) Oregon State University (United States)

Author keywords

[No Author keywords available]

EID: 0002278788
PISSN: 10769757
EISSN: None
Source Type: Journal
DOI: 10.1613/jair.639
Document Type: Article

Times cited: 1266

References (33)

1
- 0003787146
- Princeton University Press
- Bellman, R. E. (1957). Dynamic Programming. Princeton University Press.
- (1957) Dynamic Programming
- Bellman, R.E.¹

2
- 0003487482
- Athena Scientific, Belmont, MA
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

3
- 85166207010
- Exploiting structure in policy construction
- Boutilier, C., Dearden, R., & Goldszmidt, M. (1995). Exploiting structure in policy construction. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pp. 1104-1111.
- (1995) Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence , pp. 1104-1111
- Boutilier, C.¹ Dearden, R.² Goldszmidt, M.³

4
- 0026255231
- O-plan: The open planning architecture
- Currie, K., & Tate, A. (1991). O-plan: The open planning architecture. Artificial Intelligence, 52(1), 49-86.
- (1991) Artificial Intelligence , vol.52 , Issue.1 , pp. 49-86
- Currie, K.¹ Tate, A.²

5
- 0001234682
- Feudal reinforcement learning
- Morgan Kaufmann, San Francisco, CA
- Dayan, P., & Hinton, G. (1993). Feudal reinforcement learning. In Advances in Neural Information Processing Systems, 5, pp. 271-278. Morgan Kaufmann, San Francisco, CA.
- (1993) Advances in Neural Information Processing Systems , vol.5 , pp. 271-278
- Dayan, P.¹ Hinton, G.²

6
- 0006424007
- Tech. rep. CS-95-10, Department of Computer Science, Brown University, Providence, Rhode Island
- Dean, T., & Lin, S.-H. (1995). Decomposition techniques for planning in stochastic domains. Tech. rep. CS-95-10, Department of Computer Science, Brown University, Providence, Rhode Island.
- (1995) Decomposition Techniques for Planning in Stochastic Domains
- Dean, T.¹ Lin, S.-H.²

7
- 0001806701
- The MAXQ method for hierarchical reinforcement learning
- Morgan Kaufmann
- Dietterich, T. G. (1998). The MAXQ method for hierarchical reinforcement learning. In Fifteenth International Conference on Machine Learning, pp. 118-126. Morgan Kaufmann.
- (1998) Fifteenth International Conference on Machine Learning , pp. 118-126
- Dietterich, T.G.¹

8
- 0015440625
- Learning and executing generalized robot plans
- Fikes, R. E., Hart, P. E., & Nilsson, N. J. (1972). Learning and executing generalized robot plans. Artificial Intelligence, 3, 251-288.
- (1972) Artificial Intelligence , vol.3 , pp. 251-288
- Fikes, R.E.¹ Hart, P.E.² Nilsson, N.J.³

9
- 0020177941
- Rete: A fast algorithm for the many pattern/many object pattern match problem
- Forgy, C. L. (1982). Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19(1), 17-37.
- (1982) Artificial Intelligence , vol.19 , Issue.1 , pp. 17-37
- Forgy, C.L.¹

10
- 0006419533
- Hierarchical solution of Markov decision processes using macro-actions
- San Francisco, CA. Morgan Kaufmann Publishers
- Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., & Boutilier, C. (1998). Hierarchical solution of Markov decision processes using macro-actions. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), pp. 220-229. Morgan Kaufmann Publishers, San Francisco, CA.
- (1998) Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98) , pp. 220-229
- Hauskrecht, M.¹ Meuleau, N.² Kaelbling, L.P.³ Dean, T.⁴ Boutilier, C.⁵

11
- 0003644124
- MIT Press, Cambridge, MA
- Howard, R. A. (1960). Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.
- (1960) Dynamic Programming and Markov Processes
- Howard, R.A.¹

12
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185-1201.
- (1994) Neural Computation , vol.6 , Issue.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

13
- 85143168613
- Hierarchical reinforcement learning: Preliminary results
- San Francisco, CA. Morgan Kaufmann
- Kaelbling, L. P. (1993). Hierarchical reinforcement learning: Preliminary results. In Proceedings of the Tenth International Conference on Machine Learning, pp. 167-173. Morgan Kaufmann, San Francisco, CA.
- (1993) Proceedings of the Tenth International Conference on Machine Learning , pp. 167-173
- Kaelbling, L.P.¹

14
- 0032045145
- Module based reinforcement learning for a real robot
- Kalmár, Z., Szepesvári, C., & Lörincz, A. (1998). Module based reinforcement learning for a real robot. Machine Learning, 31, 55-85.
- (1998) Machine Learning , vol.31 , pp. 55-85
- Kalmár, Z.¹ Szepesvári, C.² Lörincz, A.³

15
- 85109195641
- Learning abstraction hierarchies for problem solving
- Boston, MA. AAAI Press
- Knoblock, C. A. (1990). Learning abstraction hierarchies for problem solving. In Proceedings of the Eighth National Conference on Artificial Intelligence, pp. 923-928. AAAI Press, Boston, MA.
- (1990) Proceedings of the Eighth National Conference on Artificial Intelligence , pp. 923-928
- Knoblock, C.A.¹

16
- 0022045044
- Macro-operators: A weak method for learning
- Korf, R. E. (1985). Macro-operators: A weak method for learning. Artificial Intelligence, 26(1), 35-77.
- (1985) Artificial Intelligence , vol.26 , Issue.1 , pp. 35-77
- Korf, R.E.¹

17
- 0003673017
- Ph.D. thesis, Carnegie Mellon University, Department of Computer Science, Pittsburgh, PA
- Lin, L.-J. (1993). Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Department of Computer Science, Pittsburgh, PA.
- (1993) Reinforcement Learning for Robots Using Neural Networks
- Lin, L.-J.¹

18
- 84880688141
- Multi-value-functions: Efficient automatic action hierarchies for multiple goal MDPs
- San Francisco. Morgan Kaufmann
- Moore, A. W., Baird, L., & Kaelbling, L. P. (1999). Multi-value-functions: Efficient automatic action hierarchies for multiple goal MDPs. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1316-1323. Morgan Kaufmann, San Francisco.
- (1999) Proceedings of the International Joint Conference on Artificial Intelligence , pp. 1316-1323
- Moore, A.W.¹ Baird, L.² Kaelbling, L.P.³

19
- 0346738900
- Flexible decomposition algorithms for weakly coupled Markov decision problems
- San Francisco, CA. Morgan Kaufmann Publishers
- Parr, R. (1998a). Flexible decomposition algorithms for weakly coupled Markov decision problems. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), pp. 422-430. Morgan Kaufmann Publishers, San Francisco, CA.
- (1998) Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98) , pp. 422-430
- Parr, R.¹

20
- 0003989214
- Ph.D. thesis, University of California, Berkeley, California
- Parr, R. (1998b). Hierarchical control and learning for Markov decision processes. Ph.D. thesis, University of California, Berkeley, California.
- (1998) Hierarchical Control and Learning for Markov Decision Processes
- Parr, R.¹

21
- 84898956770
- Reinforcement learning with hierarchies of machines
- Cambridge, MA. MIT Press
- Parr, R., & Russell, S. (1998). Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems, Vol. 10, pp. 1043-1049. MIT Press, Cambridge, MA.
- (1998) Advances in Neural Information Processing Systems , vol.10 , pp. 1043-1049
- Parr, R.¹ Russell, S.²

22
- 0003391330
- Morgan Kaufmann, San Mateo, CA
- Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.
- (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl, J.¹

23
- 0003636089
- Tech. rep. CUED/FINFENG/TR 166, Cambridge University Engineering Department, Cambridge, England
- Rummery, G. A., & Niranjan, M. (1994). Online Q-learning using connectionist systems. Tech. rep. CUED/FINFENG/TR 166, Cambridge University Engineering Department, Cambridge, England.
- (1994) Online Q-learning Using Connectionist Systems
- Rummery, G.A.¹ Niranjan, M.²

24
- 0016069798
- Planning in a hierarchy of abstraction spaces
- Sacerdoti, E. D. (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5(2), 115-135.
- (1974) Artificial Intelligence , vol.5 , Issue.2 , pp. 115-135
- Sacerdoti, E.D.¹

25
- 0346087506
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Tech. rep., University of Colorado, Department of Computer Science, Boulder, CO. To appear
- Singh, S., Jaakkola, T., Littman, M. L., & Szepesvári, C. (1998). Convergence results for single-step on-policy reinforcement-learning algorithms. Tech. rep., University of Colorado, Department of Computer Science, Boulder, CO. To appear in Machine Learning.
- (1998) Machine Learning
- Singh, S.¹ Jaakkola, T.² Littman, M.L.³ Szepesvári, C.⁴

26
- 0001027894
- Transfer of learning by composing solutions of elemental sequential tasks
- Singh, S. P. (1992). Transfer of learning by composing solutions of elemental sequential tasks. Machine Learning, 8, 323-339.
- (1992) Machine Learning , vol.8 , pp. 323-339
- Singh, S.P.¹

27
- 0000672258
- Improved switching among temporally abstract actions
- MIT Press
- Sutton, R. S., Singh, S., Precup, D., & Ravindran, B. (1999). Improved switching among temporally abstract actions. In Advances in Neural Information Processing Systems, Vol. 11, pp. 1066-1072. MIT Press.
- (1999) Advances in Neural Information Processing Systems , vol.11 , pp. 1066-1072
- Sutton, R.S.¹ Singh, S.² Precup, D.³ Ravindran, B.⁴

28
- 0003420416
- MIT Press, Cambridge, MA
- Sutton, R., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.¹ Barto, A.G.²

29
- 0003899594
- Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales
- Tech. rep., University of Massachusetts, Department of Computer and Information Sciences, Amherst, MA. To appear
- Sutton, R. S., Precup, D., & Singh, S. (1998). Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Tech. rep., University of Massachusetts, Department of Computer and Information Sciences, Amherst, MA. To appear in Artificial Intelligence.
- (1998) Artificial Intelligence
- Sutton, R.S.¹ Precup, D.² Singh, S.³

30
- 0038145105
- Hierarchical explanation-based reinforcement learning
- San Francisco, CA. Morgan Kaufmann
- Tadepalli, P., & Dietterich, T. G. (1997). Hierarchical explanation-based reinforcement learning. In Proceedings of the Fourteenth International Conference on Machine Learning, pp. 358-366. Morgan Kaufmann, San Francisco, CA.
- (1997) Proceedings of the Fourteenth International Conference on Machine Learning , pp. 358-366
- Tadepalli, P.¹ Dietterich, T.G.²

31
- 0028464184
- Investigating production system representations for non-combinatorial match
- Tambe, M., & Rosenbloom, P. S. (1994). Investigating production system representations for non-combinatorial match. Artificial Intelligence, 68(1), 155-199.
- (1994) Artificial Intelligence , vol.68 , Issue.1 , pp. 155-199
- Tambe, M.¹ Rosenbloom, P.S.²

32
- 0004049893
- Ph.D. thesis, King's College, Oxford. (To be reprinted by MIT Press.)
- Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, King's College, Oxford. (To be reprinted by MIT Press.)
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

33
- 34249833101
- Technical note Q-Learning
- Watkins, C. J., & Dayan, P. (1992). Technical note Q-Learning. Machine Learning, 8, 279.
- (1992) Machine Learning , vol.8 , pp. 279
- Watkins, C.J.¹ Dayan, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.