



Volume 12, 2012, Pages 579-610

Reinforcement learning in robotics: A survey

Author keywords

Entropy; Torque

Indexed keywords


EID: 84892593209     PISSN: 1867-4534     EISSN: 1867-4542     Source Type: Book Series
DOI: 10.1007/978-3-642-27645-3_18     Document Type: Chapter
Times cited: 175

References (132)
  • 6
    • Asada, M., Noda, S., Tawaratsumida, S., Hosoda, K.: Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning 23(2-3), 279–303 (1996)
  • 7
    • Atkeson, C., Moore, A., Schaal, S.: Locally weighted learning for control. AI Review 11, 75–113 (1997)
  • 8
    • Atkeson, C.G.: Using local trajectory optimizers to speed up global optimization in dynamic programming. In: Advances in Neural Information Processing Systems, NIPS (1994)
  • 14
    • Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13(4), 341–379 (2003)
  • 15
    • Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)
  • 18
    • Benbrahim, H., Franklin, J.A.: Biped dynamic walking using reinforcement learning. Robotics and Autonomous Systems 22(3-4), 283–302 (1997)
  • 21
    • Betts, J.T.: Practical methods for optimal control using nonlinear programming. In: Advances in Design and Control, vol. 3. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2001)
  • 26
    • Coates, A., Abbeel, P., Ng, A.Y.: Apprenticeship learning for helicopter control. Commun. ACM 52(7), 97–105 (2009)
  • 29
    • Dayan, P., Hinton, G.E.: Using expectation-maximization for reinforcement learning. Neural Computation 9(2), 271–278 (1997)
  • 34
    • Duan, Y., Cui, B., Yang, H.: Robot Navigation Based on Fuzzy RL Algorithm. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds.) ISNN 2008, Part I. LNCS, vol. 5263, pp. 391–399. Springer, Heidelberg (2008)
  • 35
    • Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J., Cheng, G.: Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot. I. J. Robotic Res. 27(2), 213–228 (2008)
  • 36
    • Erden, M.S., Leblebicioğlu, K.: Free gait generation with reinforcement learning for a six-legged robot. Robot. Auton. Syst. 56(3), 199–212 (2008)
  • 37
    • Fagg, A.H., Lotspeich, D.L., Hoff, J., Bekey, G.A.: Rapid reinforcement learning for reactive control policy design for autonomous robots. In: Artificial Life in Robotics (1998)
  • 40
    • Glynn, P.: Likelihood ratio gradient estimation: an overview. In: Winter Simulation Conference, WSC (1987)
  • 43
    • Guenter, F., Hersch, M., Calinon, S., Billard, A.: Reinforcement learning for imitating constrained reaching movements. Advanced Robotics 21(13), 1521–1544 (2007)
  • 49
    • Huang, X., Weng, J.: Novelty and reinforcement learning in the value system of developmental robots. In: Lund University Cognitive Studies (2002)
  • 53
    • Kalmár, Z., Szepesvári, C., Lörincz, A.: Modular Reinforcement Learning: An Application to a Real Robot Task. In: Birk, A., Demiris, J. (eds.) EWLR 1997. LNCS (LNAI), vol. 1545, pp. 29–45. Springer, Heidelberg (1998)
  • 55
    • Katz, D., Pyuro, Y., Brock, O.: Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. In: Robotics: Science and Systems, R:SS (2008)
  • 70
    • Kroemer, O., Detry, R., Piater, J., Peters, J.: Combining active learning and reactive control for robot grasping. Robotics and Autonomous Systems 58(9), 1105–1116 (2010)
  • 72
    • Latzke, T., Behnke, S., Bennewitz, M.: Imitative Reinforcement Learning for Soccer Playing Robots. In: Lakemeyer, G., Sklar, E., Sorrenti, D.G., Takahashi, T. (eds.) RoboCup 2006. LNCS (LNAI), vol. 4434, pp. 47–58. Springer, Heidelberg (2007)
  • 74
    • Mahadevan, S., Connell, J.: Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence 55(2-3), 311–365 (1992)
  • 77
    • Mataric, M.J.: Reinforcement learning in the multi-robot domain. Autonomous Robots 4, 73–83 (1997)
  • 81
    • Morimoto, J., Doya, K.: Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems 36(1), 37–51 (2001)
  • 88
    • Paletta, L., Fritz, G., Kintzler, F., Irran, J., Dorffner, G.: Perception and Developmental Learning of Affordances in Autonomous Robots. In: Hertzberg, J., Beetz, M., Englert, R. (eds.) KI 2007. LNCS (LNAI), vol. 4667, pp. 235–250. Springer, Heidelberg (2007)
  • 90
    • Pendrith, M.: Reinforcement learning in situated agents: Some theoretical problems and practical solutions. In: European Workshop on Learning Robots (EWLR) (1999)
  • 92
    • Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7-9), 1180–1190 (2008b)
  • 93
    • Peters, J., Schaal, S.: Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4), 682–697 (2008c)
  • 102
    • Rückstieß, T., Felder, M., Schmidhuber, J.: State-Dependent Exploration for Policy Gradient Methods. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 234–249. Springer, Heidelberg (2008)
  • 103
    • Sato, M.-A., Nakamura, Y., Ishii, S.: Reinforcement Learning for Biped Locomotion. In: Dorronsoro, J.R. (ed.) ICANN 2002. LNCS, vol. 2415, pp. 777–782. Springer, Heidelberg (2002)
  • 105
    • Schaal, S., Atkeson, C.G.: Robot juggling: An implementation of memory-based learning. Control Systems Magazine 14(1), 57–71 (1994)
  • 106
    • Schaal, S., Atkeson, C.G., Vijayakumar, S.: Scalable techniques from nonparametric statistics for real-time robot learning. Applied Intelligence 17(1), 49–60 (2002)
  • 107
    • Schaal, S., Mohajerian, P., Ijspeert, A.J.: Dynamics systems vs. optimal control - a unifying view. Progress in Brain Research 165(1), 425–445 (2007)
  • 113
    • Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: International Machine Learning Conference (1990)
  • 116
  • 117
    • Tamei, T., Shibata, T.: Policy Gradient Learning of Cooperative Interaction with a Robot Using User's Biological Signals. In: Köppen, M., Kasabov, N., Coghill, G. (eds.) ICONIP 2008. LNCS, vol. 5507, pp. 1029–1037. Springer, Heidelberg (2009)
  • 122
    • Thrun, S.: An approach to learning mobile robot navigation. Robotics and Autonomous Systems 15, 301–319 (1995)
  • 124
    • Toussaint, M., Storkey, A., Harmeling, S.: Expectation-Maximization methods for solving (PO)MDPs and optimal control problems. In: Inference and Learning in Dynamic Models. Cambridge University Press (2010)
  • 126
    • Uchibe, E., Asada, M., Hosoda, K.: Cooperative behavior acquisition in multi mobile robots environment by reinforcement learning based on state vector estimation. In: IEEE International Conference on Robotics and Automation (ICRA) (1998)
  • 127
    • Vlassis, N., Toussaint, M., Kontes, G., Piperidis, S.: Learning model-free robot control by a Monte Carlo EM algorithm. Autonomous Robots 27(2), 123–130 (2009)
  • 130
    • Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
  • 131
    • Yasuda, T., Ohkura, K.: A Reinforcement Learning Technique with an Adaptive Action Generator for a Multi-Robot System. In: Asada, M., Hallam, J.C.T., Meyer, J.-A., Tani, J. (eds.) SAB 2008. LNCS (LNAI), vol. 5040, pp. 250–259. Springer, Heidelberg (2008)


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.