Volume 112, Issue 1, 1999, Pages 181-211

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; COMPUTER SYSTEMS PROGRAMMING; DECISION THEORY; KNOWLEDGE REPRESENTATION; MARKOV PROCESSES; OPTIMIZATION; THEOREM PROVING;

EID: 0033170372     PISSN: 00043702     EISSN: None     Source Type: Journal    
DOI: 10.1016/S0004-3702(99)00052-1     Document Type: Article
Times cited : (3270)

References (82)
  • 2
    • 0030149709
    • Purposive behavior acquisition for a real robot by vision-based reinforcement learning
    • M. Asada, S. Noda, S. Tawaratsumida, K. Hosoda, Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Machine Learning 23 (1996) 279-303.
    • (1996) Machine Learning , vol.23 , pp. 279-303
    • Asada, M.1    Noda, S.2    Tawaratsumida, S.3    Hosoda, K.4
  • 4
    • 84880685295
    • Prioritized goal decomposition of markov decision processes: Toward a synthesis of classical and decision theoretic planning
    • Nagoya, Japan
    • C. Boutilier, R.I. Brafman, C. Geib, Prioritized goal decomposition of markov decision processes: Toward a synthesis of classical and decision theoretic planning, in: Proc. IJCAI-97, Nagoya, Japan, 1997, pp. 1162-1165.
    • (1997) Proc. IJCAI-97 , pp. 1162-1165
    • Boutilier, C.1    Brafman, R.I.2    Geib, C.3
  • 5
    • 85150714688
    • Reinforcement learning methods for continuous-time markov decision problems
    • MIT Press, Cambridge, MA
    • S.J. Bradtke, M.O. Duff, Reinforcement learning methods for continuous-time markov decision problems, in: Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, 1995, pp. 393-400.
    • (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 393-400
    • Bradtke, S.J.1    Duff, M.O.2
  • 6
    • 0031185898
    • Modeling agents as qualitative decision makers
    • R.I. Brafman, M. Tennenholtz, Modeling agents as qualitative decision makers, Artificial Intelligence 94 (1) (1997) 217-268.
    • (1997) Artificial Intelligence , vol.94 , Issue.1 , pp. 217-268
    • Brafman, R.I.1    Tennenholtz, M.2
  • 8
    • 0006493602
    • Reasoning about probabilistic actions at multiple levels of granularity
    • Stanford University
    • L. Chrisman, Reasoning about probabilistic actions at multiple levels of granularity, in: Proc. AAAI Spring Symposium: Decision-Theoretic Planning, Stanford University, 1994.
    • (1994) Proc. AAAI Spring Symposium: Decision-Theoretic Planning
    • Chrisman, L.1
  • 9
  • 10
    • 85156187730
    • Improving elevator performance using reinforcement learning
    • MIT Press, Cambridge, MA
    • R.H. Crites, A.G. Barto, Improving elevator performance using reinforcement learning, in: Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, 1996, pp. 1017-1023.
    • (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1017-1023
    • Crites, R.H.1    Barto, A.G.2
  • 11
    • 0001158047
    • Improving generalization for temporal difference learning: The successor representation
    • P. Dayan, Improving generalization for temporal difference learning: The successor representation, Neural Computation 5 (1993) 613-624.
    • (1993) Neural Computation , vol.5 , pp. 613-624
    • Dayan, P.1
  • 14
    • 85168151397
    • Decomposition techniques for planning in stochastic domains
    • Montreal, Quebec, Morgan Kaufmann, San Mateo, CA, See also Technical Report CS-95-10, Brown University, Department of Computer Science, 1995
    • T. Dean, S.-H. Lin, Decomposition techniques for planning in stochastic domains, in: Proc. IJCAI-95, Montreal, Quebec, Morgan Kaufmann, San Mateo, CA, 1995, pp. 1121-1127. See also Technical Report CS-95-10, Brown University, Department of Computer Science, 1995.
    • (1995) Proc. IJCAI-95 , pp. 1121-1127
    • Dean, T.1    Lin, S.-H.2
  • 15
    • 0028317777
    • Learning to plan in continuous domains
    • G.F. DeJong, Learning to plan in continuous domains, Artificial Intelligence 65 (1994) 71-141.
    • (1994) Artificial Intelligence , vol.65 , pp. 71-141
    • DeJong, G.F.1
  • 16
    • 0001806701
    • The MAXQ method for hierarchical reinforcement learning
    • Morgan Kaufmann, San Mateo, CA
    • T.G. Dietterich, The MAXQ method for hierarchical reinforcement learning, in: Machine Learning: Proc. 15th International Conference, Morgan Kaufmann, San Mateo, CA, 1998, pp. 118-126.
    • (1998) Machine Learning: Proc. 15th International Conference , pp. 118-126
    • Dietterich, T.G.1
  • 17
    • 0028739953
    • Robot shaping: Developing autonomous agents through learning
    • M. Dorigo, M. Colombetti, Robot shaping: Developing autonomous agents through learning, Artificial Intelligence 71 (1994) 321-370.
    • (1994) Artificial Intelligence , vol.71 , pp. 321-370
    • Dorigo, M.1    Colombetti, M.2
  • 19
    • 26844577989
    • Composing functions to speed up reinforcement learning in a changing world
    • Springer, Berlin
    • C. Drummond, Composing functions to speed up reinforcement learning in a changing world, in: Proc. 10th European Conference on Machine Learning, Springer, Berlin, 1998.
    • (1998) Proc. 10th European Conference on Machine Learning
    • Drummond, C.1
  • 20
    • 85158051593
    • Why PRODIGY/EBL works
    • Boston, MA, MIT Press, Cambridge, MA
    • O. Etzioni, Why PRODIGY/EBL works, in: Proc. AAAI-90, Boston, MA, MIT Press, Cambridge, MA, 1990, pp. 916-922.
    • (1990) Proc. AAAI-90 , pp. 916-922
    • Etzioni, O.1
  • 23
    • 0030389008
    • A statistical approach to adaptive problem solving
    • J. Gratch, G. DeJong, A statistical approach to adaptive problem solving, Artificial Intelligence 88 (1-2) (1996) 101-161.
    • (1996) Artificial Intelligence , vol.88 , Issue.1-2 , pp. 101-161
    • Gratch, J.1    DeJong, G.2
  • 24
    • 0026961480
    • A statistical approach to solving the EBL utility problem
    • San Jose, CA
    • R. Greiner, I. Jurisica, A statistical approach to solving the EBL utility problem, in: Proc. AAAI-92, San Jose, CA, 1992, pp. 241-248.
    • (1992) Proc. AAAI-92 , pp. 241-248
    • Greiner, R.1    Jurisica, I.2
  • 28
    • 0031343489
    • A feedback control structure for on-line learning tasks
    • M. Huber, R.A. Grupen, A feedback control structure for on-line learning tasks, Robotics and Autonomous Systems 22 (3-4) (1997) 303-315.
    • (1997) Robotics and Autonomous Systems , vol.22 , Issue.3-4 , pp. 303-315
    • Huber, M.1    Grupen, R.A.2
  • 29
    • 0000148778
    • A heuristic approach to the discovery of macro-operators
    • G.A. Iba, A heuristic approach to the discovery of macro-operators, Machine Learning 3 (1989) 285-317.
    • (1989) Machine Learning , vol.3 , pp. 285-317
    • Iba, G.A.1
  • 30
    • 0000439891
    • On the convergence of stochastic iterative dynamic programming algorithms
    • T. Jaakkola, M.I. Jordan, S. Singh, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation 6 (6) (1994) 1185-1201.
    • (1994) Neural Computation , vol.6 , Issue.6 , pp. 1185-1201
    • Jaakkola, T.1    Jordan, M.I.2    Singh, S.3
  • 31
    • 85143168613
    • Hierarchical learning in stochastic domains: Preliminary results
    • Morgan Kaufmann, San Mateo, CA
    • L.P. Kaelbling, Hierarchical learning in stochastic domains: Preliminary results, in: Proc. 10th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993, pp. 167-173.
    • (1993) Proc. 10th International Conference on Machine Learning , pp. 167-173
    • Kaelbling, L.P.1
  • 32
    • 0032045145
    • Module based reinforcement learning: Experiments with a real robot
    • and Autonomous Robots 5 (1998) 273-295 (special joint issue)
    • Zs. Kalmár, Cs. Szepesvári, A. Lörincz, Module based reinforcement learning: Experiments with a real robot, Machine Learning 31 (1998) 55-85 and Autonomous Robots 5 (1998) 273-295 (special joint issue).
    • (1998) Machine Learning , vol.31 , pp. 55-85
    • Kalmár, Zs.1    Szepesvári, Cs.2    Lörincz, A.3
  • 33
    • 0021577685
    • A qualitative physics based on confluences
    • J. de Kleer, J.S. Brown, A qualitative physics based on confluences, Artificial Intelligence 24 (1-3) (1984) 7-83.
    • (1984) Artificial Intelligence , vol.24 , Issue.1-3 , pp. 7-83
    • De Kleer, J.1    Brown, J.S.2
  • 35
    • 0026961481
    • Automatic programming of robots using genetic programming
    • San Jose, CA
    • J.R. Koza, J.P. Rice, Automatic programming of robots using genetic programming, in: Proc. AAAI-92, San Jose, CA, 1992, pp. 194-201.
    • (1992) Proc. AAAI-92 , pp. 194-201
    • Koza, J.R.1    Rice, J.P.2
  • 36
    • 0006502449
    • Commonsense knowledge of space: Learning from experience
    • Tokyo, Japan
    • B.J. Kuipers, Commonsense knowledge of space: Learning from experience, in: Proc. IJCAI-79, Tokyo, Japan, 1979, pp. 499-501.
    • (1979) Proc. IJCAI-79 , pp. 499-501
    • Kuipers, B.J.1
  • 37
    • 0002982589
    • Chunking in SOAR: The anatomy of a general learning mechanism
    • J.E. Laird, P.S. Rosenbloom, A. Newell, Chunking in SOAR: The anatomy of a general learning mechanism, Machine Learning 1 (1986) 11-46.
    • (1986) Machine Learning , vol.1 , pp. 11-46
    • Laird, J.E.1    Rosenbloom, P.S.2    Newell, A.3
  • 38
    • 0003673017
    • Reinforcement learning for robots using neural networks
    • Ph.D. Thesis, Carnegie Mellon University
    • L.-J. Lin, Reinforcement learning for robots using neural networks, Ph.D. Thesis, Carnegie Mellon University, Technical Report CMU-CS-93-103, 1993.
    • (1993) Technical Report CMU-CS-93-103
    • Lin, L.-J.1
  • 39
    • 84976813028
    • Learning to coordinate behaviors
    • Boston, MA
    • P. Maes, R. Brooks, Learning to coordinate behaviors, in: Proc. AAAI-90, Boston, MA, 1990, pp. 796-802.
    • (1990) Proc. AAAI-90 , pp. 796-802
    • Maes, P.1    Brooks, R.2
  • 40
    • 0026880130
    • Automatic programming of behavior-based robots using reinforcement learning
    • S. Mahadevan, J. Connell, Automatic programming of behavior-based robots using reinforcement learning, Artificial Intelligence 55 (2-3) (1992) 311-365.
    • (1992) Artificial Intelligence , vol.55 , Issue.2-3 , pp. 311-365
    • Mahadevan, S.1    Connell, J.2
  • 42
    • 84898959706
    • Reinforcement learning for call admission control and routing in integrated service networks
    • Morgan Kaufmann, San Mateo, CA
    • P. Marbach, O. Mihatsch, M. Schulte, J.N. Tsitsiklis, Reinforcement learning for call admission control and routing in integrated service networks, in: Advances in Neural Information Processing Systems 10, Morgan Kaufmann, San Mateo, CA, 1998, pp. 922-928.
    • (1998) Advances in Neural Information Processing Systems , vol.10 , pp. 922-928
    • Marbach, P.1    Mihatsch, O.2    Schulte, M.3    Tsitsiklis, J.N.4
  • 43
    • 0031504223
    • Behavior-based control: Examples from navigation, learning, and group behavior
    • M.J. Mataric, Behavior-based control: Examples from navigation, learning, and group behavior, J. Experiment. Theoret. Artificial Intelligence 9 (2-3) (1997) 323-336.
    • (1997) J. Experiment. Theoret. Artificial Intelligence , vol.9 , Issue.2-3 , pp. 323-336
    • Mataric, M.J.1
  • 44
    • 0003543129
    • Macro-actions in reinforcement learning: An empirical analysis
    • University of Massachusetts, Department of Computer Science
    • A. McGovern, R.S. Sutton, Macro-actions in reinforcement learning: An empirical analysis, Technical Report 98-70, University of Massachusetts, Department of Computer Science, 1998.
    • (1998) Technical Report 98-70
    • McGovern, A.1    Sutton, R.S.2
  • 47
    • 0025398889
    • Quantitative results concerning the utility of explanation-based learning
    • S. Minton, Quantitative results concerning the utility of explanation-based learning, Artificial Intelligence 42 (2-3) (1990) 363-391.
    • (1990) Artificial Intelligence , vol.42 , Issue.2-3 , pp. 363-391
    • Minton, S.1
  • 48
    • 0006488247
    • The parti-game algorithm for variable resolution reinforcement learning in multidimensional spaces
    • MIT Press, Cambridge, MA
    • A.W. Moore, The parti-game algorithm for variable resolution reinforcement learning in multidimensional spaces, in: Advances in Neural Information Processing Systems 6, MIT Press, Cambridge, MA, 1994, pp. 711-718.
    • (1994) Advances in Neural Information Processing Systems , vol.6 , pp. 711-718
    • Moore, A.W.1
  • 50
    • 84899001559
    • A Q-learning based dynamic channel assignment technique for mobile communication systems
    • to appear
    • J. Nie, S. Haykin, A Q-learning based dynamic channel assignment technique for mobile communication systems, IEEE Transactions on Vehicular Technology, to appear.
    • IEEE Transactions on Vehicular Technology
    • Nie, J.1    Haykin, S.2
  • 51
    • 0027652475
    • Teleo-reactive programs for agent control
    • N. Nilsson, Teleo-reactive programs for agent control, J. Artificial Intelligence Res. 1 (1994) 139-158.
    • (1994) J. Artificial Intelligence Res. , vol.1 , pp. 139-158
    • Nilsson, N.1
  • 53
    • 84898956770
    • Reinforcement learning with hierarchies of machines
    • MIT Press, Cambridge, MA
    • R. Parr, S. Russell, Reinforcement learning with hierarchies of machines, in: Advances in Neural Information Processing Systems 10, MIT Press, Cambridge, MA, 1998, pp. 1043-1049.
    • (1998) Advances in Neural Information Processing Systems , vol.10 , pp. 1043-1049
    • Parr, R.1    Russell, S.2
  • 55
    • 84899003140
    • Multi-time models for temporally abstract planning
    • MIT Press, Cambridge, MA
    • D. Precup, R.S. Sutton, Multi-time models for temporally abstract planning, in: Advances in Neural Information Processing Systems 10, MIT Press, Cambridge, MA, 1998, pp. 1050-1056.
    • (1998) Advances in Neural Information Processing Systems , vol.10 , pp. 1050-1056
    • Precup, D.1    Sutton, R.S.2
  • 59
    • 10844252596
    • Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies
    • Morgan Kaufmann, San Mateo, CA
    • M. Ring, Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies, in: Proc. 8th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1991, pp. 343-347.
    • (1991) Proc. 8th International Conference on Machine Learning , pp. 343-347
    • Ring, M.1
  • 60
    • 0016069798
    • Planning in a hierarchy of abstraction spaces
    • E.D. Sacerdoti, Planning in a hierarchy of abstraction spaces, Artificial Intelligence 5 (1974) 115-135.
    • (1974) Artificial Intelligence , vol.5 , pp. 115-135
    • Sacerdoti, E.D.1
  • 62
    • 0030145238
    • Qualitative system identification: Deriving structure from behavior
    • A.C.C. Say, S. Kuru, Qualitative system identification: Deriving structure from behavior, Artificial Intelligence 83 (1) (1996) 75-141.
    • (1996) Artificial Intelligence , vol.83 , Issue.1 , pp. 75-141
    • Say, A.C.C.1    Kuru, S.2
  • 63
    • 0006459160
    • Technische Universität München, TR FKI-148-91
    • J. Schmidhuber, Neural Sequence Chunkers, Technische Universität München, TR FKI-148-91, 1991.
    • (1991) Neural Sequence Chunkers
    • Schmidhuber, J.1
  • 64
    • 0005610003
    • Probabilistic robot navigation in partially observable environments
    • Montreal, Quebec, Morgan Kaufmann, San Mateo, CA
    • R. Simmons, S. Koenig, Probabilistic robot navigation in partially observable environments, in: Proc. IJCAI-95, Montreal, Quebec, Morgan Kaufmann, San Mateo, CA, 1995, pp. 1080-1087.
    • (1995) Proc. IJCAI-95 , pp. 1080-1087
    • Simmons, R.1    Koenig, S.2
  • 65
    • 0026962175
    • Reinforcement learning with a hierarchy of abstract models
    • San Jose, CA, MIT/AAAI Press, Cambridge, MA
    • S.P. Singh, Reinforcement learning with a hierarchy of abstract models, in: Proc. AAAI-92, San Jose, CA, MIT/AAAI Press, Cambridge, MA, 1992, pp. 202-207.
    • (1992) Proc. AAAI-92 , pp. 202-207
    • Singh, S.P.1
  • 66
    • 0002876837
    • Scaling reinforcement learning by learning variable temporal resolution models
    • Morgan Kaufmann, San Mateo, CA
    • S.P. Singh, Scaling reinforcement learning by learning variable temporal resolution models, in: Proc. 9th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1992, pp. 406-415.
    • (1992) Proc. 9th International Conference on Machine Learning , pp. 406-415
    • Singh, S.P.1
  • 67
    • 0001652790
    • The efficient learning of multiple task sequences
    • Morgan Kaufmann, San Mateo, CA
    • S.P. Singh, The efficient learning of multiple task sequences, in: Advances in Neural Information Processing Systems 4, Morgan Kaufmann, San Mateo, CA, 1992, pp. 251-258.
    • (1992) Advances in Neural Information Processing Systems , vol.4 , pp. 251-258
    • Singh, S.P.1
  • 68
    • 0001027894
    • Transfer of learning by composing solutions of elemental sequential tasks
    • S.P. Singh, Transfer of learning by composing solutions of elemental sequential tasks, Machine Learning 8 (3/4) (1992) 323-340.
    • (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 323-340
    • Singh, S.P.1
  • 70
    • 84898972974
    • Reinforcement learning for dynamic channel allocation in cellular telephone systems
    • MIT Press, Cambridge, MA
    • S.P. Singh, D. Bertsekas, Reinforcement learning for dynamic channel allocation in cellular telephone systems, in: Advances in Neural Information Processing Systems 9, MIT Press, Cambridge, MA, 1997, pp. 974-980.
    • (1997) Advances in Neural Information Processing Systems , vol.9 , pp. 974-980
    • Singh, S.P.1    Bertsekas, D.2
  • 71
    • 84922015064
    • TD models: Modeling the world at a mixture of time scales
    • Morgan Kaufmann, San Mateo, CA
    • R.S. Sutton, TD models: Modeling the world at a mixture of time scales, in: Proc. 12th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1995, pp. 531-539.
    • (1995) Proc. 12th International Conference on Machine Learning , pp. 531-539
    • Sutton, R.S.1
  • 76
    • 0000797959
    • The problem of expensive chunks and its solution by restricting expressiveness
    • M. Tambe, A. Newell, P. Rosenbloom, The problem of expensive chunks and its solution by restricting expressiveness, Machine Learning 5 (3) (1990) 299-348.
    • (1990) Machine Learning , vol.5 , Issue.3 , pp. 299-348
    • Tambe, M.1    Newell, A.2    Rosenbloom, P.3
  • 77
    • 0029276036
    • Temporal difference learning and TD-Gammon
    • G.J. Tesauro, Temporal difference learning and TD-Gammon, Comm. ACM 38 (1995) 58-68.
    • (1995) Comm. ACM , vol.38 , pp. 58-68
    • Tesauro, G.J.1
  • 78
    • 33749882712
    • Finding structure in reinforcement learning
    • Morgan Kaufmann, San Mateo, CA
    • S. Thrun, A. Schwartz, Finding structure in reinforcement learning, in: Advances in Neural Information Processing Systems 7, Morgan Kaufmann, San Mateo, CA, 1995, pp. 385-392.
    • (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 385-392
    • Thrun, S.1    Schwartz, A.2
  • 82
    • 0006496594
    • Scaling reinforcement learning techniques via modularity
    • Morgan Kaufmann, San Mateo, CA
    • L.E. Wixson, Scaling reinforcement learning techniques via modularity, in: Proc. 8th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1991, pp. 368-372.
    • (1991) Proc. 8th International Conference on Machine Learning , pp. 368-372
    • Wixson, L.E.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.