Annual Reviews in Control, Volume 36, Issue 1, 2012, Pages 42-59

Reinforcement learning and optimal adaptive control: An overview and implementation examples

Author keywords

ADP; Optimal adaptive control; Q learning; Reinforcement learning

Indexed keywords

ADAPTIVE CONTROL SYSTEMS; ADMINISTRATIVE DATA PROCESSING; ANTHROPOMORPHIC ROBOTS; CONTROL THEORY; CONTROLLERS; COST FUNCTIONS; DYNAMIC PROGRAMMING; ROBOTIC ARMS; ROBOTICS;

EID: 84860519689     PISSN: 1367-5788     EISSN: None     Source Type: Journal
DOI: 10.1016/j.arcontrol.2012.03.004     Document Type: Article
Times cited: 204

References (115)
  • 2
    • M. Abu-Khalaf and F. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005.
  • 6
    • K. Althoefer, B. Krekelberg, D. Husmeier, and L. Seneviratne, "Reinforcement learning in a rule-based navigator for robotic manipulators," Neurocomputing, vol. 37, no. 1-4, pp. 51-70, 2001.
  • 10
    • A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-13, no. 5, pp. 834-846, 1983.
  • 12
    • A. G. Barto, "Reinforcement learning and adaptive critic methods," in D. A. White and D. A. Sofge (Eds.), Handbook of Intelligent Control, Van Nostrand Reinhold, New York, 1992, pp. 69-91.
  • 13
    • D. Bertsekas, "Neuro-dynamic programming: An overview and recent results," in K.-H. Waldmann and U. M. Stocker (Eds.), Operations Research Proceedings 2006, Springer, Berlin Heidelberg, 2006, pp. 71-72.
  • 17
    • D. Bertsekas, "Temporal difference methods for general projected equations," IEEE Transactions on Automatic Control, vol. 56, no. 9, pp. 2128-2139, 2011.
  • 20
    • D. P. Bertsekas, "Dynamic programming and suboptimal control: A survey from ADP to MPC," European Journal of Control, vol. 11, no. 4-5, pp. 310-334, 2005.
  • 21
    • D. P. Bertsekas, "Dynamic programming and suboptimal control: A survey from ADP to MPC," in CDC Proceedings, 2005.
  • 24
    • J. A. Boyan, "Technical update: Least-squares temporal difference learning," Machine Learning, vol. 49, no. 2, pp. 233-246, 2002.
  • 25
    • S. J. Bradtke, "Reinforcement learning applied to linear quadratic regulation," in Advances in Neural Information Processing Systems 5, Morgan Kaufmann, 1993, pp. 295-302.
  • 26
    • I. Bucak and M. Zohdy, "Application of reinforcement learning control to a nonlinear dexterous robot," in Proceedings of the 38th IEEE Conference on Decision and Control, vol. 5, 1999, pp. 5108-5113.
  • 34
    • D. Chang and M. J. Er, "Real-time dynamic fuzzy Q-learning and control of mobile robots," in Proceedings of the 2nd WSEAS International Conference on Electronics, Control and Signal Processing, WSEAS, Stevens Point, Wisconsin, USA, 2003, pp. 82:1-82:9.
  • 37
    • D. Ernst, M. Glavic, P. Geurts, and L. Wehenkel, "Approximate value iteration in the reinforcement learning context: Application to electrical power system control," International Journal of Emerging Electric Power Systems, vol. 3, no. 1, pp. 1066.1-1066.37, 2005.
  • 40
    • M. Geist and O. Pietquin, "Revisiting natural actor-critics with value function approximation," in V. Torra, Y. Narukawa, and M. Daumas (Eds.), Proceedings of the 7th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2010), Lecture Notes in Artificial Intelligence, vol. 6408, Springer-Verlag, Berlin Heidelberg, 2010, pp. 207-218.
  • 42
    • S. Girgin and P. Preux, "Basis expansion in natural actor critic methods," in Recent Advances in Reinforcement Learning, Springer-Verlag, Berlin Heidelberg, 2008, pp. 110-123.
  • 44
    • R. Hafner and M. Riedmiller, "Reinforcement learning in feedback control - Challenges and benchmarks from technical process control," Machine Learning, vol. 84, no. 1-2, pp. 137-169, 2011.
  • 45
    • P. He and S. Jagannathan, "Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 37, no. 2, pp. 425-436, 2007.
  • 50
    • R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82 (Series D), pp. 35-45, 1960.
  • 54
    • S. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A novel Q-learning based adaptive optimal controller implementation for a humanoid robotic arm," in IFAC 2011 World Congress, Milan, Italy, 2011.
  • 57
    • S. Khan, G. Herrmann, T. Pipe, C. Melhuish, and A. Spiers, "Safe adaptive compliance control of a humanoid robotic arm with anti-windup compensation and posture control," International Journal of Social Robotics, vol. 2, pp. 305-319, 2010.
  • 58
    • S. Khan, A. Lenz, G. Herrmann, T. Pipe, and C. Melhuish, "Towards safe human robot interaction: Integration of compliant control, an anthropomorphic hand and verbal communication," in FIRA 2011 Conference: ICAHRR, Taiwan, 2011.
  • 62
    • C. Kuan and K. Young, "Reinforcement learning and robust control for robot compliance tasks," Journal of Intelligent and Robotic Systems, vol. 23, pp. 165-182, 1998.
  • 66
    • F. Lewis and K. Vamvoudakis, "Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 1, pp. 14-25, 2011.
  • 67
    • F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32-50, 2009.
  • 70
    • W. Liu, Y. Tan, and Q. Qiu, "Enhanced Q-learning algorithm for dynamic power management with performance constraint," in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2010, pp. 602-605.
  • 72
    • J. Peng and R. J. Williams, "Incremental multi-step Q-learning," Machine Learning, Morgan Kaufmann, 1996, pp. 226-232.
  • 74
    • J. Peters and S. Schaal, "Natural actor-critic," Neurocomputing, vol. 71, no. 7-9, pp. 1180-1190, 2008.
  • 77
    • A. G. Pipe, "An architecture for learning potential field cognitive maps with an application to mobile robot navigation," Adaptive Behavior, vol. 8, no. 2, pp. 173-204, 2000.
  • 79
    • M. Riedmiller, "Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method," in 16th European Conference on Machine Learning, Springer, 2005, pp. 317-328.
  • 86
    • Y. Shoham, R. Powers, and T. Grenager, "Multi-agent reinforcement learning: A critical survey," Tech. rep., Stanford University, 2003. <http://citeseerx.ist.psu.edu/viewdoc/summary?>
  • 97
    • S. Thrun and M. Littman, "A review of reinforcement learning," AI Magazine, vol. 21, pp. 103-105, 2000.
  • 98
    • K. Vamvoudakis and F. Lewis, "Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, no. 5, pp. 878-888, 2010.
  • 100
    • D. Vrabie and F. Lewis, "Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems," Neural Networks, vol. 22, no. 3 (special issue on goal-directed neural systems), pp. 237-246, 2009.
  • 101
    • D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. Lewis, "Adaptive optimal control for continuous-time linear systems based on policy iteration," Automatica, vol. 45, no. 2, pp. 477-484, 2009.
  • 103
    • C. Watkins and P. Dayan, "Technical note: Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
  • 104
    • G. Welch and G. Bishop, "An introduction to the Kalman filter," Technical Report TR 95-041, University of North Carolina at Chapel Hill, 2001, pp. 1-16.
  • 105
    • P. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in D. A. White and D. A. Sofge (Eds.), Handbook of Intelligent Control, Van Nostrand Reinhold, New York, USA, 1992, pp. 493-525.
  • 106
    • P. Werbos, "Neurocontrol and supervised learning: An overview and evaluation," in D. A. White and D. A. Sofge (Eds.), Handbook of Intelligent Control, Van Nostrand Reinhold, New York, USA, 1992, pp. 65-89.
  • 107
    • P. Werbos, "ADP: Goals, opportunities and principles," in J. Si, A. G. Barto, W. Powell, and D. Wunsch (Eds.), Handbook of Learning and Approximate Dynamic Programming, Wiley-IEEE Press, Piscataway, NJ, USA, 2004, pp. 3-44.
  • 108
    • P. Werbos, "Using ADP to understand and replicate brain intelligence: The next level design," in L. Perlovsky and R. Kozma (Eds.), Neurodynamics of Cognition and Consciousness, Understanding Complex Systems, vol. 25, Springer, Berlin/Heidelberg, 2007, pp. 109-123.
  • 109
    • P. Werbos, "Intelligence in the brain: A theory of how it works and how to build it," Neural Networks, vol. 22, no. 3 (special issue on goal-directed neural systems), pp. 200-212, 2009.
  • 110
    • P. J. Werbos, "Consistency of HDP applied to a simple reinforcement learning problem," Neural Networks, vol. 3, no. 2, pp. 179-189, 1990.
  • 111
    • P. J. Werbos, "Foreword - ADP: The key direction for future research in intelligent control and understanding brain intelligence," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 898-900, 2008.
  • 113
    • J. K. Williams, "Reinforcement learning of optimal controls," in S. E. Haupt, A. Pasini, and C. Marzban (Eds.), Artificial Intelligence Methods in the Environmental Sciences, Springer Netherlands, 2009, pp. 297-327.
  • 115
    • X. Xu, D. Hu, and X. Lu, "Kernel-based least squares policy iteration for reinforcement learning," IEEE Transactions on Neural Networks, vol. 18, no. 4, pp. 973-992, 2007.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.