SCOPUS Information Retrieval Platform

Artificial Intelligence

Volume 112, Issue 1, 1999, Pages 181-211

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

Authors (3): Sutton, Richard S. (a); Precup, Doina (b); Singh, Satinder (a)

a AT&T Labs Research (United States)

b University of Massachusetts (United States)

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; COMPUTER SYSTEMS PROGRAMMING; DECISION THEORY; KNOWLEDGE REPRESENTATION; MARKOV PROCESSES; OPTIMIZATION; THEOREM PROVING;

REINFORCEMENT LEARNING; SEMI-MARKOV DECISION PROCESS (SMDP);

LEARNING SYSTEMS;

EID: 0033170372 PISSN: 00043702 EISSN: None Source Type: Journal
DOI: 10.1016/S0004-3702(99)00052-1 Document Type: Article

Times cited: 3506

References (82)

1
- 0001038548
- Learning control composition in a complex environment
- E.G. Araujo, R.A. Grupen, Learning control composition in a complex environment, in: Proc. 4th International Conference on Simulation of Adaptive Behavior, 1996, pp. 333-342.
- (1996) Proc. 4th International Conference on Simulation of Adaptive Behavior , pp. 333-342
- Araujo, E.G.¹ Grupen, R.A.²

2
- 0030149709
- Purposive behavior acquisition for a real robot by vision-based reinforcement learning
- M. Asada, S. Noda, S. Tawaratsumida, K. Hosoda, Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Machine Learning 23 (1996) 279-303.
- (1996) Machine Learning , vol.23 , pp. 279-303
- Asada, M.¹ Noda, S.² Tawaratsumida, S.³ Hosoda, K.⁴

3
- 0029210635
- Learning to act using real-time dynamic programming
- A.G. Barto, S.J. Bradtke, S.P. Singh, Learning to act using real-time dynamic programming, Artificial Intelligence 72 (1995) 81-138.
- (1995) Artificial Intelligence , vol.72 , pp. 81-138
- Barto, A.G.¹ Bradtke, S.J.² Singh, S.P.³

4
- 84880685295
- Prioritized goal decomposition of markov decision processes: Toward a synthesis of classical and decision theoretic planning
- Nagoya, Japan
- C. Boutilier, R.I. Brafman, C. Geib, Prioritized goal decomposition of markov decision processes: Toward a synthesis of classical and decision theoretic planning, in: Proc. IJCAI-97, Nagoya, Japan, 1997, pp. 1162-1165.
- (1997) Proc. IJCAI-97 , pp. 1162-1165
- Boutilier, C.¹ Brafman, R.I.² Geib, C.³

5
- 85150714688
- Reinforcement learning methods for continuous-time markov decision problems
- MIT Press, Cambridge, MA
- S.J. Bradtke, M.O. Duff, Reinforcement learning methods for continuous-time markov decision problems, in: Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, 1995, pp. 393-400.
- (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 393-400
- Bradtke, S.J.¹ Duff, M.O.²

6
- 0031185898
- Modeling agents as qualitative decision makers
- R.I. Brafman, M. Tennenholtz, Modeling agents as qualitative decision makers, Artificial Intelligence 94 (1) (1997) 217-268.
- (1997) Artificial Intelligence , vol.94 , Issue.1 , pp. 217-268
- Brafman, R.I.¹ Tennenholtz, M.²

7
- 0003106852
- Hybrid models for motion control systems
- Birkhäuser, Boston, MA
- R.W. Brockett, Hybrid models for motion control systems, in: Essays in Control: Perspectives in the Theory and its Applications, Birkhäuser, Boston, MA, 1993, pp. 29-53.
- (1993) Essays in Control: Perspectives in the Theory and its Applications , pp. 29-53
- Brockett, R.W.¹

8
- 0006493602
- Reasoning about probabilistic actions at multiple levels of granularity
- Stanford University
- L. Chrisman, Reasoning about probabilistic actions at multiple levels of granularity, in: Proc. AAAI Spring Symposium: Decision-Theoretic Planning, Stanford University, 1994.
- (1994) Proc. AAAI Spring Symposium: Decision-Theoretic Planning
- Chrisman, L.¹

9
- 0030167564
- Behavior analysis and training: A methodology for behavior engineering
- M. Colombetti, M. Dorigo, G. Borghi, Behavior analysis and training: A methodology for behavior engineering, IEEE Trans. Systems Man Cybernet. Part B 26 (3) (1996) 365-380.
- (1996) IEEE Trans. Systems Man Cybernet. Part B , vol.26 , Issue.3 , pp. 365-380
- Colombetti, M.¹ Dorigo, M.² Borghi, G.³

10
- 85156187730
- Improving elevator performance using reinforcement learning
- MIT Press, Cambridge, MA
- R.H. Crites, A.G. Barto, Improving elevator performance using reinforcement learning, in: Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, 1996, pp. 1017-1023.
- (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1017-1023
- Crites, R.H.¹ Barto, A.G.²

11
- 0001158047
- Improving generalization for temporal difference learning: The successor representation
- P. Dayan, Improving generalization for temporal difference learning: The successor representation, Neural Computation 5 (1993) 613-624.
- (1993) Neural Computation , vol.5 , pp. 613-624
- Dayan, P.¹

12
- 0001234682
- Feudal reinforcement learning
- Morgan Kaufmann, San Mateo, CA
- P. Dayan, G.E. Hinton, Feudal reinforcement learning, in: Advances in Neural Information Processing Systems 5, Morgan Kaufmann, San Mateo, CA, 1993, pp. 271-278.
- (1993) Advances in Neural Information Processing Systems , vol.5 , pp. 271-278
- Dayan, P.¹ Hinton, G.E.²

13
- 0029332887
- Planning under time constraints in stochastic domains
- T. Dean, L.P. Kaelbling, J. Kirman, A. Nicholson, Planning under time constraints in stochastic domains, Artificial Intelligence 76 (1-2) (1995) 35-74.
- (1995) Artificial Intelligence , vol.76 , Issue.1-2 , pp. 35-74
- Dean, T.¹ Kaelbling, L.P.² Kirman, J.³ Nicholson, A.⁴

14
- 85168151397
- Decomposition techniques for planning in stochastic domains
- Montreal, Quebec, Morgan Kaufmann, San Mateo, CA, See also Technical Report CS-95-10, Brown University, Department of Computer Science, 1995
- T. Dean, S.-H. Lin, Decomposition techniques for planning in stochastic domains, in: Proc. IJCAI-95, Montreal, Quebec, Morgan Kaufmann, San Mateo, CA, 1995, pp. 1121-1127. See also Technical Report CS-95-10, Brown University, Department of Computer Science, 1995.
- (1995) Proc. IJCAI-95 , pp. 1121-1127
- Dean, T.¹ Lin, S.-H.²

15
- 0028317777
- Learning to plan in continuous domains
- G.F. DeJong, Learning to plan in continuous domains, Artificial Intelligence 65 (1994) 71-141.
- (1994) Artificial Intelligence , vol.65 , pp. 71-141
- DeJong, G.F.¹

16
- 0001806701
- The MAXQ method for hierarchical reinforcement learning
- Morgan Kaufmann, San Mateo, CA
- T.G. Dietterich, The MAXQ method for hierarchical reinforcement learning, in: Machine Learning: Proc. 15th International Conference, Morgan Kaufmann, San Mateo, CA, 1998, pp. 118-126.
- (1998) Machine Learning: Proc. 15th International Conference , pp. 118-126
- Dietterich, T.G.¹

17
- 0028739953
- Robot shaping: Developing autonomous agents through learning
- M. Dorigo, M. Colombetti, Robot shaping: Developing autonomous agents through learning, Artificial Intelligence 71 (1994) 321-370.
- (1994) Artificial Intelligence , vol.71 , pp. 321-370
- Dorigo, M.¹ Colombetti, M.²

18
- 0003977430
- MIT Press, Cambridge, MA
- G.L. Drescher, Made Up Minds: A Constructivist Approach to Artificial Intelligence, MIT Press, Cambridge, MA, 1991.
- (1991) Made Up Minds: A Constructivist Approach to Artificial Intelligence
- Drescher, G.L.¹

19
- 26844577989
- Composing functions to speed up reinforcement learning in a changing world
- Springer, Berlin
- C. Drummond, Composing functions to speed up reinforcement learning in a changing world, in: Proc. 10th European Conference on Machine Learning, Springer, Berlin, 1998.
- (1998) Proc. 10th European Conference on Machine Learning
- Drummond, C.¹

20
- 85158051593
- Why PRODIGY/EBL works
- Boston, MA, MIT Press, Cambridge, MA
- O. Etzioni, Why PRODIGY/EBL works, in: Proc. AAAI-90, Boston, MA, MIT Press, Cambridge, MA, 1990, pp. 916-922.
- (1990) Proc. AAAI-90 , pp. 916-922
- Etzioni, O.¹

21
- 0015440625
- Learning and executing generalized robot plans
- R.E. Fikes, P.E. Hart, N.J. Nilsson, Learning and executing generalized robot plans, Artificial Intelligence 3 (1972) 251-288.
- (1972) Artificial Intelligence , vol.3 , pp. 251-288
- Fikes, R.E.¹ Hart, P.E.² Nilsson, N.J.³

22
- 0006419532
- High-level planning and control with incomplete information using POMDPs
- H. Geffner, B. Bonet, High-level planning and control with incomplete information using POMDPs, in: Proc. AIPS-98 Workshop on Integrating Planning, Scheduling and Execution in Dynamic and Uncertain Environments, 1998.
- (1998) Proc. AIPS-98 Workshop on Integrating Planning, Scheduling and Execution in Dynamic and Uncertain Environments
- Geffner, H.¹ Bonet, B.²

23
- 0030389008
- A statistical approach to adaptive problem solving
- J. Gratch, G. DeJong, A statistical approach to adaptive problem solving, Artificial Intelligence 88 (1-2) (1996) 101-161.
- (1996) Artificial Intelligence , vol.88 , Issue.1-2 , pp. 101-161
- Gratch, J.¹ DeJong, G.²

24
- 0026961480
- A statistical approach to solving the EBL utility problem
- San Jose, CA
- R. Greiner, I. Jurisica, A statistical approach to solving the EBL utility problem, in: Proc. AAAI-92, San Jose, CA, 1992, pp. 241-248.
- (1992) Proc. AAAI-92 , pp. 241-248
- Greiner, R.¹ Jurisica, I.²

25
- 0004242478
- Springer, New York
- R.L. Grossman, A. Nerode, A.P. Ravn, H. Rischel, Hybrid Systems, Springer, New York, 1993.
- (1993) Hybrid Systems
- Grossman, R.L.¹ Nerode, A.² Ravn, A.P.³ Rischel, H.⁴

26
- 0006419533
- Hierarchical solution of Markov decision processes using macro-actions
- M. Hauskrecht, N. Meuleau, C. Boutilier, L.P. Kaelbling, T. Dean, Hierarchical solution of Markov decision processes using macro-actions, in: Uncertainty in Artificial Intelligence: Proc. 14th Conference, 1998, pp. 220-229.
- (1998) Uncertainty in Artificial Intelligence: Proc. 14th Conference , pp. 220-229
- Hauskrecht, M.¹ Meuleau, N.² Boutilier, C.³ Kaelbling, L.P.⁴ Dean, T.⁵

27
- 0003644124
- MIT Press, Cambridge, MA
- R. Howard, Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA, 1960.
- (1960) Dynamic Programming and Markov Processes
- Howard, R.¹

28
- 0031343489
- A feedback control structure for on-line learning tasks
- M. Huber, R.A. Grupen, A feedback control structure for on-line learning tasks, Robotics and Autonomous Systems 22 (3-4) (1997) 303-315.
- (1997) Robotics and Autonomous Systems , vol.22 , Issue.3-4 , pp. 303-315
- Huber, M.¹ Grupen, R.A.²

29
- 0000148778
- A heuristic approach to the discovery of macro-operators
- G.A. Iba, A heuristic approach to the discovery of macro-operators, Machine Learning 3 (1989) 285-317.
- (1989) Machine Learning , vol.3 , pp. 285-317
- Iba, G.A.¹

30
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- T. Jaakkola, M.I. Jordan, S. Singh, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation 6 (6) (1994) 1185-1201.
- (1994) Neural Computation , vol.6 , Issue.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.³

31
- 85143168613
- Hierarchical learning in stochastic domains: Preliminary results
- Morgan Kaufmann, San Mateo, CA
- L.P. Kaelbling, Hierarchical learning in stochastic domains: Preliminary results, in: Proc. 10th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993, pp. 167-173.
- (1993) Proc. 10th International Conference on Machine Learning , pp. 167-173
- Kaelbling, L.P.¹

32
- 0032045145
- Module based reinforcement learning: Experiments with a real robot
- Also published in Autonomous Robots 5 (1998) 273-295 (special joint issue)
- Zs. Kalmár, Cs. Szepesvári, A. Lörincz, Module based reinforcement learning: Experiments with a real robot, Machine Learning 31 (1998) 55-85 and Autonomous Robots 5 (1998) 273-295 (special joint issue).
- (1998) Machine Learning , vol.31 , pp. 55-85
- Kalmár, Zs.¹ Szepesvári, Cs.² Lörincz, A.³

33
- 0021577685
- A qualitative physics based on confluences
- J. de Kleer, J.S. Brown, A qualitative physics based on confluences, Artificial Intelligence 24 (1-3) (1984) 7-83.
- (1984) Artificial Intelligence , vol.24 , Issue.1-3 , pp. 7-83
- De Kleer, J.¹ Brown, J.S.²

34
- 0003828024
- Pitman Publishers, Boston, MA
- R.E. Korf, Learning to Solve Problems by Searching for Macro-Operators, Pitman Publishers, Boston, MA, 1985.
- (1985) Learning to Solve Problems by Searching for Macro-Operators
- Korf, R.E.¹

35
- 0026961481
- Automatic programming of robots using genetic programming
- San Jose, CA
- J.R. Koza, J.P. Rice, Automatic programming of robots using genetic programming, in: Proc. AAAI-92, San Jose, CA, 1992, pp. 194-201.
- (1992) Proc. AAAI-92 , pp. 194-201
- Koza, J.R.¹ Rice, J.P.²

36
- 0006502449
- Commonsense knowledge of space: Learning from experience
- Tokyo, Japan
- B.J. Kuipers, Commonsense knowledge of space: Learning from experience, in: Proc. IJCAI-79, Tokyo, Japan, 1979, pp. 499-501.
- (1979) Proc. IJCAI-79 , pp. 499-501
- Kuipers, B.J.¹

37
- 0002982589
- Chunking in SOAR: The anatomy of a general learning mechanism
- J.E. Laird, P.S. Rosenbloom, A. Newell, Chunking in SOAR: The anatomy of a general learning mechanism, Machine Learning 1 (1986) 11-46.
- (1986) Machine Learning , vol.1 , pp. 11-46
- Laird, J.E.¹ Rosenbloom, P.S.² Newell, A.³

38
- 0003673017
- Reinforcement learning for robots using neural networks
- Ph.D. Thesis, Carnegie Mellon University
- L.-J. Lin, Reinforcement learning for robots using neural networks, Ph.D. Thesis, Carnegie Mellon University, Technical Report CMU-CS-93-103, 1993.
- (1993) Technical Report CMU-CS-93-103
- Lin, L.-J.¹

39
- 84976813028
- Learning to coordinate behaviors
- Boston, MA
- P. Maes, R. Brooks, Learning to coordinate behaviors, in: Proc. AAAI-90, Boston, MA, 1990, pp. 796-802.
- (1990) Proc. AAAI-90 , pp. 796-802
- Maes, P.¹ Brooks, R.²

40
- 0026880130
- Automatic programming of behavior-based robots using reinforcement learning
- S. Mahadevan, J. Connell, Automatic programming of behavior-based robots using reinforcement learning, Artificial Intelligence 55 (2-3) (1992) 311-365.
- (1992) Artificial Intelligence , vol.55 , Issue.2-3 , pp. 311-365
- Mahadevan, S.¹ Connell, J.²

41
- 0001963197
- Self-improving factory simulation using continuous-time average-reward reinforcement learning
- S. Mahadevan, N. Marchalleck, T. Das, A. Gosavi, Self-improving factory simulation using continuous-time average-reward reinforcement learning, in: Proc. 14th International Conference on Machine Learning, 1997, pp. 202-210.
- (1997) Proc. 14th International Conference on Machine Learning , pp. 202-210
- Mahadevan, S.¹ Marchalleck, N.² Das, T.³ Gosavi, A.⁴

42
- 84898959706
- Reinforcement learning for call admission control and routing in integrated service networks
- Morgan Kaufmann, San Mateo, CA
- P. Marbach, O. Mihatsch, M. Schulte, J.N. Tsitsiklis, Reinforcement learning for call admission control and routing in integrated service networks, in: Advances in Neural Information Processing Systems 10, Morgan Kaufmann, San Mateo, CA, 1998, pp. 922-928.
- (1998) Advances in Neural Information Processing Systems , vol.10 , pp. 922-928
- Marbach, P.¹ Mihatsch, O.² Schulte, M.³ Tsitsiklis, J.N.⁴

43
- 0031504223
- Behavior-based control: Examples from navigation, learning, and group behavior
- M.J. Mataric, Behavior-based control: Examples from navigation, learning, and group behavior, J. Experiment. Theoret. Artificial Intelligence 9 (2-3) (1997) 323-336.
- (1997) J. Experiment. Theoret. Artificial Intelligence , vol.9 , Issue.2-3 , pp. 323-336
- Mataric, M.J.¹

44
- 0003543129
- Macro-actions in reinforcement learning: An empirical analysis
- University of Massachusetts, Department of Computer Science
- A. McGovern, R.S. Sutton, Macro-actions in reinforcement learning: An empirical analysis, Technical Report 98-70, University of Massachusetts, Department of Computer Science, 1998.
- (1998) Technical Report 98-70
- McGovern, A.¹ Sutton, R.S.²

45
- 0031632806
- Solving very large weakly coupled Markov decision processes
- Madison, WI
- N. Meuleau, M. Hauskrecht, K.-E. Kim, L. Peshkin, L.P. Kaelbling, T. Dean, C. Boutilier, Solving very large weakly coupled Markov decision processes, in: Proc. AAAI-98, Madison, WI, 1998, pp. 165-172.
- (1998) Proc. AAAI-98 , pp. 165-172
- Meuleau, N.¹ Hauskrecht, M.² Kim, K.-E.³ Peshkin, L.⁴ Kaelbling, L.P.⁵ Dean, T.⁶ Boutilier, C.⁷

46
- 0003543674
- Kluwer Academic, Dordrecht
- S. Minton, Learning Search Control Knowledge: An Explanation-Based Approach, Kluwer Academic, Dordrecht, 1988.
- (1988) Learning Search Control Knowledge: An Explanation-Based Approach
- Minton, S.¹

47
- 0025398889
- Quantitative results concerning the utility of explanation-based learning
- S. Minton, Quantitative results concerning the utility of explanation-based learning, Artificial Intelligence 42 (2-3) (1990) 363-391.
- (1990) Artificial Intelligence , vol.42 , Issue.2-3 , pp. 363-391
- Minton, S.¹

48
- 0006488247
- The parti-game algorithm for variable resolution reinforcement learning in multidimensional spaces
- MIT Press, Cambridge, MA
- A.W. Moore, The parti-game algorithm for variable resolution reinforcement learning in multidimensional spaces, in: Advances in Neural Information Processing Systems 6, MIT Press, Cambridge, MA, 1994, pp. 711-718.
- (1994) Advances in Neural Information Processing Systems , vol.6 , pp. 711-718
- Moore, A.W.¹

49
- 0003430412
- Prentice-Hall, Englewood Cliffs, NJ
- A. Newell, H.A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
- (1972) Human Problem Solving
- Newell, A.¹ Simon, H.A.²

50
- 84899001559
- A Q-learning based dynamic channel assignment technique for mobile communication systems
- to appear
- J. Nie, S. Haykin, A Q-learning based dynamic channel assignment technique for mobile communication systems, IEEE Transactions on Vehicular Technology, to appear.
- IEEE Transactions on Vehicular Technology
- Nie, J.¹ Haykin, S.²

51
- 0027652475
- Teleo-reactive programs for agent control
- N. Nilsson, Teleo-reactive programs for agent control, J. Artificial Intelligence Res. 1 (1994) 139-158.
- (1994) J. Artificial Intelligence Res. , vol.1 , pp. 139-158
- Nilsson, N.¹

52
- 0003989214
- Ph.D. Thesis, University of California at Berkeley
- R. Parr, Hierarchical Control and Learning for Markov Decision Processes, Ph.D. Thesis, University of California at Berkeley, 1998.
- (1998) Hierarchical Control and Learning for Markov Decision Processes
- Parr, R.¹

53
- 84898956770
- Reinforcement learning with hierarchies of machines
- MIT Press, Cambridge, MA
- R. Parr, S. Russell, Reinforcement learning with hierarchies of machines, in: Advances in Neural Information Processing Systems 10, MIT Press, Cambridge, MA, 1998, pp. 1043-1049.
- (1998) Advances in Neural Information Processing Systems , vol.10 , pp. 1043-1049
- Parr, R.¹ Russell, S.²

54
- 0006496593
- Multi-time models for reinforcement learning
- D. Precup, R.S. Sutton, Multi-time models for reinforcement learning, in: Proc. ICML'97 Workshop on Modeling in Reinforcement Learning, 1997.
- (1997) Proc. ICML'97 Workshop on Modeling in Reinforcement Learning
- Precup, D.¹ Sutton, R.S.²

55
- 84899003140
- Multi-time models for temporally abstract planning
- MIT Press, Cambridge, MA
- D. Precup, R.S. Sutton, Multi-time models for temporally abstract planning, in: Advances in Neural Information Processing Systems 10, MIT Press, Cambridge, MA, 1998, pp. 1050-1056.
- (1998) Advances in Neural Information Processing Systems , vol.10 , pp. 1050-1056
- Precup, D.¹ Sutton, R.S.²

56
- 0006419257
- Planning with closed-loop macro actions
- D. Precup, R.S. Sutton, S.P. Singh, Planning with closed-loop macro actions, in: Working Notes 1997 AAAI Fall Symposium on Model-directed Autonomous Systems, 1997, pp. 70-76.
- (1997) Working Notes 1997 AAAI Fall Symposium on Model-directed Autonomous Systems , pp. 70-76
- Precup, D.¹ Sutton, R.S.² Singh, S.P.³

57
- 0002955348
- Theoretical results on reinforcement learning with temporally abstract options
- Springer, Berlin
- D. Precup, R.S. Sutton, S.P. Singh, Theoretical results on reinforcement learning with temporally abstract options, in: Proc. 10th European Conference on Machine Learning, Springer, Berlin, 1998.
- (1998) Proc. 10th European Conference on Machine Learning
- Precup, D.¹ Sutton, R.S.² Singh, S.P.³

58
- 0003958910
- Wiley, New York
- M.L. Puterman, Markov Decision Processes, Wiley, New York, 1994.
- (1994) Markov Decision Processes
- Puterman, M.L.¹

59
- 10844252596
- Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies
- Morgan Kaufmann, San Mateo, CA
- M. Ring, Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies, in: Proc. 8th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1991, pp. 343-347.
- (1991) Proc. 8th International Conference on Machine Learning , pp. 343-347
- Ring, M.¹

60
- 0016069798
- Planning in a hierarchy of abstraction spaces
- E.D. Sacerdoti, Planning in a hierarchy of abstraction spaces, Artificial Intelligence 5 (1974) 115-135.
- (1974) Artificial Intelligence , vol.5 , pp. 115-135
- Sacerdoti, E.D.¹

61
- 0006506831
- Algorithms for design of hybrid systems
- S. Sastry, Algorithms for design of hybrid systems, in: Proc. International Conference of Information Sciences, 1997.
- (1997) Proc. International Conference of Information Sciences
- Sastry, S.¹

62
- 0030145238
- Qualitative system identification: Deriving structure from behavior
- A.C.C. Say, S. Kuru, Qualitative system identification: Deriving structure from behavior, Artificial Intelligence 83 (1) (1996) 75-141.
- (1996) Artificial Intelligence , vol.83 , Issue.1 , pp. 75-141
- Say, A.C.C.¹ Kuru, S.²

63
- 0006459160
- Technische Universität München, TR FKI-148-91
- J. Schmidhuber, Neural Sequence Chunkers, Technische Universität München, TR FKI-148-91, 1991.
- (1991) Neural Sequence Chunkers
- Schmidhuber, J.¹

64
- 0005610003
- Probabilistic robot navigation in partially observable environments
- Montreal, Quebec, Morgan Kaufmann, San Mateo, CA
- R. Simmons, S. Koenig, Probabilistic robot navigation in partially observable environments, in: Proc. IJCAI-95, Montreal, Quebec, Morgan Kaufmann, San Mateo, CA, 1995, pp. 1080-1087.
- (1995) Proc. IJCAI-95 , pp. 1080-1087
- Simmons, R.¹ Koenig, S.²

65
- 0026962175
- Reinforcement learning with a hierarchy of abstract models
- San Jose, CA, MIT/AAAI Press, Cambridge, MA
- S.P. Singh, Reinforcement learning with a hierarchy of abstract models, in: Proc. AAAI-92, San Jose, CA, MIT/AAAI Press, Cambridge, MA, 1992, pp. 202-207.
- (1992) Proc. AAAI-92 , pp. 202-207
- Singh, S.P.¹

66
- 0002876837
- Scaling reinforcement learning by learning variable temporal resolution models
- Morgan Kaufmann, San Mateo, CA
- S.P. Singh, Scaling reinforcement learning by learning variable temporal resolution models, in: Proc. 9th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1992, pp. 406-415.
- (1992) Proc. 9th International Conference on Machine Learning , pp. 406-415
- Singh, S.P.¹

67
- 0001652790
- The efficient learning of multiple task sequences
- Morgan Kaufmann, San Mateo, CA
- S.P. Singh, The efficient learning of multiple task sequences, in: Advances in Neural Information Processing Systems 4, Morgan Kaufmann, San Mateo, CA, 1992, pp. 251-258.
- (1992) Advances in Neural Information Processing Systems , vol.4 , pp. 251-258
- Singh, S.P.¹

68
- 0001027894
- Transfer of learning by composing solutions of elemental sequential tasks
- S.P. Singh, Transfer of learning by composing solutions of elemental sequential tasks, Machine Learning 8 (3/4) (1992) 323-340.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 323-340
- Singh, S.P.¹

69
- 0006488248
- Robust reinforcement learning in motion planning
- Morgan Kaufmann, San Mateo, CA
- S.P. Singh, A.G. Barto, R.A. Grupen, C.I. Connolly, Robust reinforcement learning in motion planning, in: Advances in Neural Information Processing Systems 6, Morgan Kaufmann, San Mateo, CA, 1994, pp. 655-662.
- (1994) Advances in Neural Information Processing Systems , vol.6 , pp. 655-662
- Singh, S.P.¹ Barto, A.G.² Grupen, R.A.³ Connolly, C.I.⁴

70
- 84898972974
- Reinforcement learning for dynamic channel allocation in cellular telephone systems
- MIT Press, Cambridge, MA
- S.P. Singh, D. Bertsekas, Reinforcement learning for dynamic channel allocation in cellular telephone systems, in: Advances in Neural Information Processing Systems 9, MIT Press, Cambridge, MA, 1997, pp. 974-980.
- (1997) Advances in Neural Information Processing Systems , vol.9 , pp. 974-980
- Singh, S.P.¹ Bertsekas, D.²

71
- 84922015064
- TD models: Modeling the world at a mixture of time scales
- Morgan Kaufmann, San Mateo, CA
- R.S. Sutton, TD models: Modeling the world at a mixture of time scales, in: Proc. 12th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1995, pp. 531-539.
- (1995) Proc. 12th International Conference on Machine Learning , pp. 531-539
- Sutton, R.S.¹

72
- 0004102479
- MIT Press, Cambridge, MA
- R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

73
- 0003021566
- The learning of world models by connectionist networks
- R.S. Sutton, B. Pinette, The learning of world models by connectionist networks, in: Proc. 7th Annual Conference of the Cognitive Science Society, 1985, pp. 54-64.
- (1985) Proc. 7th Annual Conference of the Cognitive Science Society , pp. 54-64
- Sutton, R.S.¹ Pinette, B.²

74
- 0002260073
- Intra-option learning about temporally abstract actions
- Morgan Kaufmann, San Mateo, CA
- R.S. Sutton, D. Precup, S. Singh, Intra-option learning about temporally abstract actions, in: Proc. 15th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1998, pp. 556-564.
- (1998) Proc. 15th International Conference on Machine Learning , pp. 556-564
- Sutton, R.S.¹ Precup, D.² Singh, S.³

75
- 0000672258
- Improved switching among temporally abstract actions
- MIT Press, Cambridge, MA
- R.S. Sutton, S. Singh, D. Precup, B. Ravindran, Improved switching among temporally abstract actions, in: Advances in Neural Information Processing Systems 11, MIT Press, Cambridge, MA, 1999, pp. 1066-1072.
- (1999) Advances in Neural Information Processing Systems , vol.11 , pp. 1066-1072
- Sutton, R.S.¹ Singh, S.² Precup, D.³ Ravindran, B.⁴

76
- 0000797959
- The problem of expensive chunks and its solution by restricting expressiveness
- M. Tambe, A. Newell, P. Rosenbloom, The problem of expensive chunks and its solution by restricting expressiveness, Machine Learning 5 (3) (1990) 299-348.
- (1990) Machine Learning , vol.5 , Issue.3 , pp. 299-348
- Tambe, M.¹ Newell, A.² Rosenbloom, P.³

77
- 0029276036
- Temporal difference learning and TD-Gammon
- G.J. Tesauro, Temporal difference learning and TD-Gammon, Comm. ACM 38 (1995) 58-68.
- (1995) Comm. ACM , vol.38 , pp. 58-68
- Tesauro, G.J.¹

78
- 33749882712
- Finding structure in reinforcement learning
- Morgan Kaufmann, San Mateo, CA
- S. Thrun, A. Schwartz, Finding structure in reinforcement learning, in: Advances in Neural Information Processing Systems 7, Morgan Kaufmann, San Mateo, CA, 1995, pp. 385-392.
- (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 385-392
- Thrun, S.¹ Schwartz, A.²

79
- 0030418601
- Behavior coordination for a mobile robot using modular reinforcement learning
- M. Uchibe, M. Asada, K. Hosoda, Behavior coordination for a mobile robot using modular reinforcement learning, in: Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, 1996, pp. 1329-1336.
- (1996) Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems , pp. 1329-1336
- Uchibe, M.¹ Asada, M.² Hosoda, K.³

80
- 0004049895
- Ph.D. Thesis, Cambridge University
- C.J.C.H. Watkins, Learning with Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989.
- (1989) Learning with Delayed Rewards
- Watkins, C.J.C.H.¹

81
- 0031215211
- HQ-learning
- M. Wiering, J. Schmidhuber, HQ-learning, Adaptive Behavior 6 (2) (1997) 219-246.
- (1997) Adaptive Behavior , vol.6 , Issue.2 , pp. 219-246
- Wiering, M.¹ Schmidhuber, J.²

82
- 0006496594
- Scaling reinforcement learning techniques via modularity
- Morgan Kaufmann, San Mateo, CA
- L.E. Wixson, Scaling reinforcement learning techniques via modularity, in: Proc. 8th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1991, pp. 368-372.
- (1991) Proc. 8th International Conference on Machine Learning , pp. 368-372
- Wixson, L.E.¹

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.