Volume 31, Issue 1-3, 1998, Pages 55-85

Module-Based Reinforcement Learning: Experiments with a Real Robot

Author keywords

Feature space; Local control; Markovian Decision Problems; Module based RL; Problem decomposition; Reinforcement learning; Robot learning; Subgoals; Switching control

Indexed keywords

FEATURE SPACE; LOCAL CONTROL; MARKOVIAN DECISION PROBLEMS; MODULE-BASED RL; PROBLEM DECOMPOSITION; REINFORCEMENT LEARNING (RL); SUBGOALS; SWITCHING CONTROL

EID: 0032045145     PISSN: 08856125     EISSN: None     Source Type: Journal    
DOI: 10.1023/a:1007440607681     Document Type: Article
Times cited: 29

References (72)
  • 1
    • 0030149709
    • Purposive behavior acquisition for a real robot by vision-based reinforcement learning
    • Asada, M., Noda, S., Tawaratsumida, S. & Hosoda, K. (1996). Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279-303.
    • (1996) Machine Learning , vol.23 , pp. 279-303
    • Asada, M.1    Noda, S.2    Tawaratsumida, S.3    Hosoda, K.4
  • 2
    • 0029210635
    • Learning to act using real-time dynamic programming
    • Barto, A.G., Bradtke, S.J. & Singh, S.P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1):81-138.
    • (1995) Artificial Intelligence , vol.72 , Issue.1 , pp. 81-138
    • Barto, A.G.1    Bradtke, S.J.2    Singh, S.P.3
  • 3
    • 0003787146
    • Princeton University Press, Princeton, New Jersey
    • Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.
    • (1957) Dynamic Programming
    • Bellman, R.1
  • 5
    • 0031185898
    • Modeling agents as qualitative decision makers
    • Brafman, R.I. & Tennenholtz, M. (1997). Modeling agents as qualitative decision makers. Artificial Intelligence, 94 (1):217-268.
    • (1997) Artificial Intelligence , vol.94 , Issue.1 , pp. 217-268
    • Brafman, R.I.1    Tennenholtz, M.2
  • 6
    • 0003672832
    • PhD thesis, Laboratory for Information and Decision Systems, MIT, 77 Massachusetts Avenue, Cambridge, MA 02139-4307 USA
    • Branicky, M.S. (1995). Studies in Hybrid Systems: Modeling, Analysis, and Control. PhD thesis, Laboratory for Information and Decision Systems, MIT, 77 Massachusetts Avenue, Cambridge, MA 02139-4307 USA.
    • (1995) Studies in Hybrid Systems: Modeling, Analysis, and Control
    • Branicky, M.S.1
  • 10
    • 0001937317
    • Elephants don't play chess
    • Bradford-MIT Press, 1991
    • Brooks, R.A. (1991b). Elephants don't play chess. In Designing Autonomous Agents. Bradford-MIT Press, 1991.
    • (1991) Designing Autonomous Agents
    • Brooks, R.A.1
  • 11
    • 0026998041
    • Reinforcement learning with perceptual aliasing: The perceptual distinctions approach
    • San Jose, CA. AAAI Press
    • Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 183-188, San Jose, CA. AAAI Press.
    • (1992) Proceedings of the Tenth National Conference on Artificial Intelligence , pp. 183-188
    • Chrisman, L.1
  • 13
    • 0021577685
    • A qualitative physics based on confluences
    • de Kleer, J. & Brown, J.S. (1984). A qualitative physics based on confluences. Artificial Intelligence, 24(1-3): 7-83.
    • (1984) Artificial Intelligence , vol.24 , Issue.1-3 , pp. 7-83
    • de Kleer, J.1    Brown, J.S.2
  • 14
    • 0029326107
    • Alecsys and the autonomouse: Learning to control a real robot by distributed classifier systems
    • Dorigo, M. (1995). Alecsys and the autonomouse: Learning to control a real robot by distributed classifier systems. Machine Learning, 19(3):209-240.
    • (1995) Machine Learning , vol.19 , Issue.3 , pp. 209-240
    • Dorigo, M.1
  • 15
    • 0028739953
    • Robot shaping: Developing autonomous agents through learning
    • Dorigo, M. & Colombetti, M. (1994). Robot shaping: Developing autonomous agents through learning. Artificial Intelligence, 71:321-370.
    • (1994) Artificial Intelligence , vol.71 , pp. 321-370
    • Dorigo, M.1    Colombetti, M.2
  • 17
    • 0004242478
    • Lecture Notes in Computer Science. Springer-Verlag, New York
    • Grossman, R.L., Nerode, A., Ravn, A. P. & Rischel, H. (1993). Hybrid Systems, volume 736 of Lecture Notes in Computer Science. Springer-Verlag, New York.
    • (1993) Hybrid Systems , vol.736
    • Grossman, R.L.1    Nerode, A.2    Ravn, A.P.3    Rischel, H.4
  • 18
    • 0029751418
    • The loss from imperfect value functions in expectation-based and minimax-based tasks
    • Heger, M. (1996). The loss from imperfect value functions in expectation-based and minimax-based tasks. Machine Learning, 22:197-225.
    • (1996) Machine Learning , vol.22 , pp. 197-225
    • Heger, M.1
  • 19
    • 0020749243
    • Vector-valued dynamic programming
    • Henig, M.I. (1983). Vector-valued dynamic programming. SIAM J. Control and Optimization, 21(3):490-499.
    • (1983) SIAM J. Control and Optimization , vol.21 , Issue.3 , pp. 490-499
    • Henig, M.I.1
  • 20
    • 0000439891
    • On the convergence of stochastic iterative dynamic programming algorithms
    • Jaakkola, T., Jordan, M.I. & Singh, S.P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185-1201.
    • (1994) Neural Computation , vol.6 , Issue.6 , pp. 1185-1201
    • Jaakkola, T.1    Jordan, M.I.2    Singh, S.P.3
  • 22
    • 0028745065
    • Generalization in an autonomous agent
    • Orlando, Florida. IEEE Inc
    • Kalmár, Zs., Szepesvári, Cs. & Lorincz, A. (1994). Generalization in an autonomous agent. In Proc. of IEEE WCCI ICNN'94, volume 3, pages 1815-1817, Orlando, Florida. IEEE Inc.
    • (1994) Proc. of IEEE WCCI ICNN'94 , vol.3 , pp. 1815-1817
    • Kalmár, Zs.1    Szepesvári, Cs.2    Lorincz, A.3
  • 23
    • 0029205333
    • Generalized dynamic concept model as a route to construct adaptive autonomous agents
    • Kalmár, Zs., Szepesvári, Cs. & Lorincz, A. (1995). Generalized dynamic concept model as a route to construct adaptive autonomous agents. Neural Network World, 5:353-360.
    • (1995) Neural Network World , vol.5 , pp. 353-360
    • Kalmár, Zs.1    Szepesvári, Cs.2    Lorincz, A.3
  • 25
    • 2342475494
    • Complexity analysis of real-time reinforcement learning applied to finding shortest paths in deterministic domains
    • Koenig, S. & Simmons, R.G. (1997). Complexity analysis of real-time reinforcement learning applied to finding shortest paths in deterministic domains. Machine Learning: A Special Issue on Reinforcement Learning, 12:234-345.
    • (1997) Machine Learning: A Special Issue on Reinforcement Learning , vol.12 , pp. 234-345
    • Koenig, S.1    Simmons, R.G.2
  • 27
    • 0022045044
    • Macro-operators: A weak method for learning
    • Korf, R.E. (1985b). Macro-operators: A weak method for learning. Artificial Intelligence, 26:35-77.
    • (1985) Artificial Intelligence , vol.26 , pp. 35-77
    • Korf, R.E.1
  • 28
    • 0023421864
    • Planning as search: A quantitative approach
    • Korf, R.E. (1987). Planning as search: A quantitative approach. Artificial Intelligence, 33:65-88.
    • (1987) Artificial Intelligence , vol.33 , pp. 65-88
    • Korf, R.E.1
  • 29
  • 30
    • 0022062142
    • A survey of some results in stochastic adaptive control
    • Kumar, P.R. (1985). A survey of some results in stochastic adaptive control. SIAM Journal on Control and Optimization, 23:329-380.
    • (1985) SIAM Journal on Control and Optimization , vol.23 , pp. 329-380
    • Kumar, P.R.1
  • 31
    • 0003861655
    • PhD thesis, Department of Computer Science, Brown University. Also Technical Report CS-96-09
    • Littman, M.L. (1996). Algorithms for Sequential Decision Making. PhD thesis, Department of Computer Science, Brown University. Also Technical Report CS-96-09.
    • (1996) Algorithms for Sequential Decision Making
    • Littman, M.L.1
  • 32
    • 0001961616
    • A Generalized Reinforcement Learning Model: Convergence and applications
    • Littman, M.L. & Szepesvári, Cs. (1996). A Generalized Reinforcement Learning Model: Convergence and applications. In Int. Conf. on Machine Learning, pages 310-318.
    • (1996) Int. Conf. on Machine Learning , pp. 310-318
    • Littman, M.L.1    Szepesvári, Cs.2
  • 35
    • 0002765109
    • A bottom-up mechanism for behavior selection in an artificial creature
    • J.A. Meyer and S. Wilson, editors, MIT Press
    • Maes, P. (1991b). A bottom-up mechanism for behavior selection in an artificial creature. In J.A. Meyer and S. Wilson, editors, Proc. of the First International Conference on Simulation of Adaptive Behavior. MIT Press.
    • (1991) Proc. of the First International Conference on Simulation of Adaptive Behavior
    • Maes, P.1
  • 37
    • 84976813028
    • Learning to coordinate behaviors
    • Boston, MA
    • Maes, P. & Brooks, R.A. (1990). Learning to coordinate behaviors. In Proc. of AAAI-90, pages 796-802, Boston, MA.
    • (1990) Proc. of AAAI-90 , pp. 796-802
    • Maes, P.1    Brooks, R.A.2
  • 38
    • 0026880130
    • Automatic programming of behavior-based robots using reinforcement learning
    • Mahadevan, S. & Connell, J. (1992). Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55:311-365.
    • (1992) Artificial Intelligence , vol.55 , pp. 311-365
    • Mahadevan, S.1    Connell, J.2
  • 39
    • 0030647149
    • Reinforcement learning in the multi-robot domain
    • Matarić, M. (1997). Reinforcement learning in the multi-robot domain. Autonomous Robots, 4.
    • (1997) Autonomous Robots , vol.4
    • Matarić, M.1
  • 40
    • 85151432208
    • Overcoming incomplete perception with utile distinction memory
    • Amherst, Massachusetts. Morgan Kaufmann
    • McCallum, R.A. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pages 190-196, Amherst, Massachusetts. Morgan Kaufmann.
    • (1993) Proceedings of the Tenth International Conference on Machine Learning , pp. 190-196
    • McCallum, R.A.1
  • 41
    • 84947933152
    • Finite-element methods with local triangulation refinement for continuous reinforcement learning problems
    • M.van Someren and G. Widmer, editors, Lecture Notes in Artificial Intelligence, Springer, Berlin
    • Munos, R. (1997). Finite-element methods with local triangulation refinement for continuous reinforcement learning problems. In M.van Someren and G. Widmer, editors, Machine Learning: ECML'97 (9th European Conf. on Machine Learning, Proceedings), volume 1224 of Lecture Notes in Artificial Intelligence, pages 170-183. Springer, Berlin.
    • (1997) Machine Learning: ECML'97 (9th European Conf. on Machine Learning, Proceedings) , vol.1224 , pp. 170-183
    • Munos, R.1
  • 44
    • 0004235832
    • Princeton University Press, Princeton, NJ
    • Pólya, Gy. (1945). How to solve it? Princeton University Press, Princeton, NJ.
    • (1945) How to Solve It?
    • Pólya, Gy.1
  • 47
    • 0016069798
    • Planning in a hierarchy of abstraction spaces
    • Sacerdoti, E.D. (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5:115-135.
    • (1974) Artificial Intelligence , vol.5 , pp. 115-135
    • Sacerdoti, E.D.1
  • 49
    • 0030145238
    • Qualitative system identification: Deriving structure from behavior
    • Say, A.C.C. & Kuru, S. (1996). Qualitative system identification: Deriving structure from behavior. Artificial Intelligence, 83(1):75-141.
    • (1996) Artificial Intelligence , vol.83 , Issue.1 , pp. 75-141
    • Say, A.C.C.1    Kuru, S.2
  • 50
    • 84899022377
    • How to dynamically merge Markov decision processes
    • Cambridge, MA. MIT Press. in press
    • Singh, S. & Cohn, D. (1997). How to dynamically merge Markov decision processes. In Advances in Neural Information Processing Systems 11, Cambridge, MA. MIT Press. in press.
    • (1997) Advances in Neural Information Processing Systems 11
    • Singh, S.1    Cohn, D.2
  • 51
    • 2342564758
    • On the convergence of single-step on-policy reinforcement-learning algorithms
    • accepted
    • Singh, S., Jaakkola, T., Littman, M.L. & Szepesvári, Cs. (1997). On the convergence of single-step on-policy reinforcement-learning algorithms. Machine Learning. accepted.
    • (1997) Machine Learning
    • Singh, S.1    Jaakkola, T.2    Littman, M.L.3    Szepesvári, Cs.4
  • 54
    • 0000723997
    • Generalization in reinforcement learning: Successful examples using sparse coarse coding
    • Sutton, R.S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems, 8.
    • (1996) Advances in Neural Information Processing Systems , vol.8
    • Sutton, R.S.1
  • 56
    • 0028742076
    • Dynamic Concept Model learns optimal policies
    • Orlando, Florida. IEEE Inc
    • Szepesvári, Cs. (1994). Dynamic Concept Model learns optimal policies. In Proc. of IEEE WCCI ICNN'94, volume 3, pages 1738-1742, Orlando, Florida. IEEE Inc.
    • (1994) Proc. of IEEE WCCI ICNN'94 , vol.3 , pp. 1738-1742
    • Szepesvári, Cs.1
  • 57
    • 84947910334
    • Learning and exploitation do not conflict under minimax optimality
    • M.van Someren and G. Widmer, editors, Lecture Notes in Artificial Intelligence, Springer, Berlin
    • Szepesvári, Cs. (1997a). Learning and exploitation do not conflict under minimax optimality. In M.van Someren and G. Widmer, editors, Machine Learning: ECML'97 (9th European Conf. on Machine Learning, Proceedings), volume 1224 of Lecture Notes in Artificial Intelligence, pages 242-249. Springer, Berlin.
    • (1997) Machine Learning: ECML'97 (9th European Conf. on Machine Learning, Proceedings) , vol.1224 , pp. 242-249
    • Szepesvári, Cs.1
  • 58
    • 0008876345
    • PhD thesis, Bolyai Institute of Mathematics, University of Szeged, Szeged, Aradi vrt. tere 1, HUNGARY, 6720
    • Szepesvári, Cs. (1997b). Static and Dynamic Aspects of Optimal Sequential Decision Making. PhD thesis, Bolyai Institute of Mathematics, University of Szeged, Szeged, Aradi vrt. tere 1, HUNGARY, 6720.
    • (1997) Static and Dynamic Aspects of Optimal Sequential Decision Making
    • Szepesvári, Cs.1
  • 59
    • 2342450292
    • Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms
    • in preparation
    • Szepesvári, Cs. & Littman, M.L. (1997). Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms. Neural Computation. in preparation.
    • (1997) Neural Computation
    • Szepesvári, Cs.1    Littman, M.L.2
  • 60
    • 84977014241
    • Behavior of an adaptive self-organizing autonomous agent working with cues and competing concepts
    • Szepesvári, Cs. & Lorincz, A. (1994). Behavior of an adaptive self-organizing autonomous agent working with cues and competing concepts. Adaptive Behavior, 2(2):131-160.
    • (1994) Adaptive Behavior , vol.2 , Issue.2 , pp. 131-160
    • Szepesvári, Cs.1    Lorincz, A.2
  • 62
    • 33749882712
    • Finding structure in reinforcement learning
    • Gerald Tesauro, David S. Touretzky, and Todd K. Leen, editors, The MIT Press, Cambridge
    • Thrun, S. & Schwartz, A. (1995). Finding structure in reinforcement learning. In Gerald Tesauro, David S. Touretzky, and Todd K. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 385-392. The MIT Press, Cambridge.
    • (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 385-392
    • Thrun, S.1    Schwartz, A.2
  • 64
    • 2342562099
    • Asynchronous stochastic approximation and Q-learning
    • Tsitsiklis, J.N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 8(3-4): 257-277.
    • (1994) Machine Learning , vol.8 , Issue.3-4 , pp. 257-277
    • Tsitsiklis, J.N.1
  • 65
    • 0029752470
    • Feature-based methods for large scale dynamic programming
    • Tsitsiklis, J.N. & Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94.
    • (1996) Machine Learning , vol.22 , pp. 59-94
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 70
    • 0002557583
    • Advanced forecasting methods for global crisis warning and models of intelligence
    • Werbos, P.J. (1977). Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 22:25-38.
    • (1977) General Systems Yearbook , vol.22 , pp. 25-38
    • Werbos, P.J.1
  • 72
    • 2342534771
    • Optimal control by means of switching
    • Zabczyk, J. (1973). Optimal control by means of switching. Studia Mathematica, 65:161-171.
    • (1973) Studia Mathematica , vol.65 , pp. 161-171
    • Zabczyk, J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.