2004, Pages 359-380

Supervised actor-critic reinforcement learning

Author keywords

Data structures; Learning; Optimization; Robots; Supervised learning; Training

Indexed keywords

DATA STRUCTURES; OPTIMIZATION; PERSONNEL TRAINING; ROBOTS; STOCHASTIC SYSTEMS; SUPERVISED LEARNING; SUPERVISORY PERSONNEL

EID: 84979715630    PISSN: None    EISSN: None    Source Type: Book
DOI: 10.1109/9780470544785.ch14    Document Type: Chapter
Times cited: 137
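
Since this record does not reproduce the chapter itself, the following is a minimal illustrative sketch of the kind of method the title and keywords suggest: an actor-critic learner whose action choices are blended with those of a hand-coded supervisor, with the critic trained by temporal-difference learning (cf. Sutton, ref. 30). The toy environment, the blending probability k, and every identifier below are assumptions made for illustration only, not details taken from the chapter.

```python
import numpy as np

# Illustrative sketch only: a tabular actor-critic on a toy 5-state chain,
# where a fraction of actions is taken from a hand-coded supervisor.
# The environment, the gain k, and all names here are assumptions for this
# example; they are not taken from the chapter itself.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2               # chain: move left (0) or right (1)
V = np.zeros(n_states)                   # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))  # actor: action preferences
alpha_v, alpha_p, gamma, k = 0.1, 0.1, 0.9, 0.5  # k = prob. of supervisor action

def supervisor(s):
    # Hand-coded "teacher": always move right, toward the goal.
    return 1

def actor(s):
    # Softmax over learned preferences, which also provides exploration.
    p = np.exp(prefs[s] - prefs[s].max())
    p /= p.sum()
    return rng.choice(n_actions, p=p)

for episode in range(200):
    s = 0
    while s < n_states - 1:
        # Blend control: with probability k follow the supervisor,
        # otherwise follow the learned actor.
        a = supervisor(s) if rng.random() < k else actor(s)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Critic: TD(0) update (cf. Sutton 1988, ref. 30).
        target = r + (gamma * V[s_next] if s_next < n_states - 1 else 0.0)
        delta = target - V[s]
        V[s] += alpha_v * delta
        # Actor: reinforce the taken action in proportion to the TD error.
        prefs[s, a] += alpha_p * delta
        s = s_next

print("learned state values:", np.round(V, 2))
```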

References (33)
  • 2   A. G. Barto, Reinforcement learning in motor control, in M. A. Arbib (ed.), The Handbook of Brain Theory and Neural Networks, Second Edition, pp. 968-972, MIT Press, Cambridge, MA, 2003.
  • 4   H. Benbrahim and J. A. Franklin, Biped dynamic walking using reinforcement learning, Robotics and Autonomous Systems, vol. 22, pp. 283-302, 1997.
  • 10  M. Dorigo and M. Colombetti, Robot shaping: Developing autonomous agents through learning, Artificial Intelligence, vol. 71, no. 2, pp. 321-370, 1994.
  • 11  V. Gullapalli, A stochastic reinforcement learning algorithm for learning real-valued functions, Neural Networks, vol. 3, no. 6, pp. 671-692, 1990.
  • 12  M. Huber and R. A. Grupen, A feedback control structure for on-line learning tasks, Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 303-315, 1997.
  • 13  M. I. Jordan and D. E. Rumelhart, Forward models: Supervised learning with a distal teacher, Cognitive Science, vol. 16, no. 3, pp. 307-354, 1992.
  • 16  L.-J. Lin, Self-improving reactive agents based on reinforcement learning, planning and teaching, Machine Learning, vol. 8, no. 3-4, pp. 293-321, 1992.
  • 17  R. Maclin and J. W. Shavlik, Creating advice-taking reinforcement learners, Machine Learning, vol. 22, no. 1-3, pp. 251-281, 1996.
  • 19  M. J. Mataric, Sensory-motor primitives as a basis for imitation: Linking perception to action and biology to robotics, in C. Nehaniv and K. Dautenhahn (eds.), Imitation in Animals and Artifacts, MIT Press, Cambridge, MA, 2000.
  • 20  A. Y. Ng, D. Harada, and S. Russell, Policy invariance under reward transformations: Theory and applications to reward shaping, Proc. 16th International Conference on Machine Learning, pp. 278-287, Morgan Kaufmann, San Francisco, CA, 1999.
  • 21  T. J. Perkins and A. G. Barto, Lyapunov-constrained action sets for reinforcement learning, in C. Brodley and A. Danyluk (eds.), Proc. 18th International Conference on Machine Learning, pp. 409-416, Morgan Kaufmann, San Francisco, CA, 2001.
  • 23  B. Price and C. Boutilier, Implicit imitation in multiagent reinforcement learning, in I. Bratko and S. Dzeroski (eds.), Proc. 16th International Conference on Machine Learning, pp. 325-334, Morgan Kaufmann, San Francisco, CA, 1999.
  • 24  J. C. Santamaria, R. S. Sutton, and A. Ram, Experiments with reinforcement learning in problems with continuous state and action spaces, Adaptive Behavior, vol. 6, pp. 163-217, 1997.
  • 25  S. Schaal, Learning from demonstration, in M. C. Mozer, M. I. Jordan, and T. Petsche (eds.), Advances in Neural Information Processing Systems 9, pp. 1040-1046, MIT Press, Cambridge, MA, 1997.
  • 26  S. Schaal, Is imitation learning the route to humanoid robots? Trends in Cognitive Science, vol. 3, pp. 233-242, 1999.
  • 27  J. S. Shamma, Linearization and gain-scheduling, in W. S. Levine (ed.), The Control Handbook, pp. 388-396, CRC Press, Boca Raton, FL, 1996.
  • 30  R. S. Sutton, Learning to predict by the method of temporal differences, Machine Learning, vol. 3, pp. 9-44, 1988.
  • 33  R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol. 8, pp. 229-256, 1992.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.