2. Abu-Khalaf, M., & Lewis, F. (2005). Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 41(5), 779–791.

3. Adam, S., Busoniu, L., & Babuska, R. (2011). Experience replay for real-time reinforcement learning control. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, (99), 1–12.
4. Al-Tamimi, A., Lewis, F., & Abu-Khalaf, M. (2008). Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 38(4), 943–949.
6. Althoefer, K., Krekelberg, B., Husmeier, D., & Seneviratne, L. (2001). Reinforcement learning in a rule-based navigator for robotic manipulators. Neurocomputing, 37(1–4), 51–70.

8. Balakrishnan, S., Ding, J., & Lewis, F. (2008). Issues on stability of ADP feedback controllers for dynamical systems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 38(4), 913–917.

10. Barto, A., & Sutton, R. (1983). Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13(5), 834–846.

12. Barto, A. G. (1992). Reinforcement learning and adaptive critic methods. In White & Sofge (Eds.), Handbook of intelligent control (pp. 69–91). New York: Van Nostrand Reinhold.

13. Bertsekas, D. (2006). Neuro-dynamic programming: An overview and recent results. In K.-H. Waldmann & U. M. Stocker (Eds.), Operations research proceedings 2006 (pp. 71–72). Berlin Heidelberg: Springer.

17. Bertsekas, D. (2011). Temporal difference methods for general projected equations. IEEE Transactions on Automatic Control, (99), 1.
20. Bertsekas, D. P. (2005a). Dynamic programming and suboptimal control: A survey from ADP to MPC. European Journal of Control, 11(4–5), 310–334.
21. Bertsekas, D. P. (2005b). Dynamic programming and suboptimal control: A survey from ADP to MPC. In CDC proceedings.

24. Boyan, J. A. (2002). Technical update: Least-squares temporal difference learning. Machine Learning, 49(2), 233–246.

25. Bradtke, S. J. (1993). Reinforcement learning applied to linear quadratic regulation. In Advances in neural information processing systems 5 (pp. 295–302). Morgan Kaufmann.

26. Bucak, I., & Zohdy, M. (1999). Application of reinforcement learning control to a nonlinear dexterous robot. In Proceedings of the 38th IEEE conference on decision and control (Vol. 5, pp. 5108–5113).

29. Busoniu, L., Babuska, R., & De Schutter, B. (2006). Multi-agent reinforcement learning: A survey. In 9th international conference on control, automation, robotics and vision (ICARCV '06) (pp. 1–6).

30. Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172.

32. Busoniu, L., De Schutter, B., Babuska, R., & Ernst, D. (2010b). Using prior knowledge to accelerate online least-squares policy iteration. In 2010 IEEE international conference on automation, quality and testing, robotics (AQTR) (Vol. 1, pp. 1–6).

33. Busoniu, L., Ernst, D., De Schutter, B., & Babuska, R. (2010c). Online least-squares policy iteration for reinforcement learning control. In American control conference (ACC), 2010 (pp. 486–491).
34. Chang, D., & Er, M. J. (2003). Real-time dynamic fuzzy Q-learning and control of mobile robots. In Proceedings of the 2nd WSEAS international conference on electronics, control and signal processing (pp. 82:1–82:9). Stevens Point, WI: WSEAS.
37. Ernst, D., Glavic, M., Geurts, P., & Wehenkel, L. (2005). Approximate value iteration in the reinforcement learning context: Application to electrical power system control. International Journal of Emerging Electric Power Systems, 3(1), 1066.1–1066.37.

40. Geist, M., & Pietquin, O. (2010). Revisiting natural actor-critics with value function approximation. In V. Torra, Y. Narukawa, & M. Daumas (Eds.), Proceedings of the 7th international conference on modeling decisions for artificial intelligence (MDAI 2010), Lecture Notes in Artificial Intelligence (Vol. 6408, pp. 207–218). Berlin Heidelberg: Springer-Verlag.

42. Girgin, S., & Preux, P. (2008). Basis expansion in natural actor critic methods. In Recent advances in reinforcement learning (pp. 110–123). Berlin Heidelberg: Springer-Verlag.

44. Hafner, R., & Riedmiller, M. (2011). Reinforcement learning in feedback control: Challenges and benchmarks from technical process control. Machine Learning, 84(1–2), 137–169.

45. He, P., & Jagannathan, S. (2007). Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 37(2), 425–436.

50. Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(Series D), 35–45.

54. Khan, S., Herrmann, G., Lewis, F., Pipe, T., & Melhuish, C. (2011a). A novel Q-learning based adaptive optimal controller implementation for a humanoid robotic arm. In IFAC 2011 world congress. Milan, Italy.

55. Khan, S., Herrmann, G., Lewis, F., Pipe, T., & Melhuish, C. (2011b). Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm. In IEEE international conference CIS/RAM 2011. China.

56. Khan, S., Herrmann, G., Pipe, T., & Melhuish, C. (2010a). Adaptive multi-dimensional compliance control of a humanoid robotic arm with anti-windup compensation. In IEEE/RSJ international conference on intelligent robots and systems (IROS), 2010 (pp. 2218–2223).
57. Khan, S., Herrmann, G., Pipe, T., Melhuish, C., & Spiers, A. (2010b). Safe adaptive compliance control of a humanoid robotic arm with anti-windup compensation and posture control. International Journal of Social Robotics, 2, 305–319.
58. Khan, S., Lenz, A., Herrmann, G., Pipe, T., & Melhuish, C. (2011c). Towards safe human robot interaction: Integration of compliant control, an anthropomorphic hand and verbal communication. In FIRA 2011 conference: ICAHRR. Taiwan.

59. Khan, S. G., Naeem, W., Sutton, R., & Sharma, S. (2008). Application of soft computing techniques to a LQG controller design. In X.-T. Yan, C. Jiang, & B. Eynard (Eds.), Advanced design and manufacture to gain a competitive edge (pp. 137–146). London: Springer.

60. Kim, B., Kang, B., Park, S., & Kang, S. (2008). Learning robot stiffness for contact tasks using the natural actor-critic. In IEEE international conference on robotics and automation (ICRA 2008) (pp. 3832–3837).
61. Kim, B., Park, J., Park, S., & Kang, S. (2010). Impedance learning for robotic contact task using natural actor-critic algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(2).
62. Kuan, C., & Young, K. (1998). Reinforcement learning and robust control for robot compliance tasks. Journal of Intelligent and Robotic Systems, 23, 165–182.

66. Lewis, F., & Vamvoudakis, K. (2011). Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41(1), 14–25.

67. Lewis, F. L., & Vrabie, D. (2009). Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 9(3), 32–50.

69. Lin, W., Hui, P., Hua-yong, Z., & Lin-cheng, S. (2009). A survey of approximate dynamic programming. In International conference on intelligent human-machine systems and cybernetics (IHMSC '09) (Vol. 2, pp. 396–399).

70. Liu, W., Tan, Y., & Qiu, Q. (2010). Enhanced Q-learning algorithm for dynamic power management with performance constraint. In Design, automation & test in Europe conference & exhibition (DATE), 2010 (pp. 602–605).

72. Peng, J., & Williams, R. J. (1996). Incremental multi-step Q-learning. In Machine learning (pp. 226–232). Morgan Kaufmann.

74. Peters, J., & Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71(7–9), 1180–1190.

76. Pietquin, O., Geist, M., Chandramohan, S., & Frezza-Buet, H. (2011). Sample efficient on-line learning of optimal dialogue policies with Kalman temporal differences. In International joint conference on artificial intelligence (IJCAI 2011) (pp. 1878–1883). Barcelona, Spain.

77. Pipe, A. G. (2000). An architecture for learning potential field cognitive maps with an application to mobile robot navigation. Adaptive Behavior, 8(2), 173–204.

79. Riedmiller, M. (2005). Neural fitted Q iteration: First experiences with a data efficient neural reinforcement learning method. In 16th European conference on machine learning (pp. 317–328). Springer.

80. Riedmiller, M., Gabel, T., Hafner, R., & Lange, S. (2009). Reinforcement learning for robot soccer. Autonomous Robots, 27(1), 55–73.

82. Rudowsky, I., Kulyba, M., Kunin, M., Parsons, S., & Raphan, T. (2006). Reinforcement learning interfaces for biomedical database systems. In Proceedings of the 28th IEEE EMBS annual international conference (pp. 6269–6272). New York City, USA.

88. Smart, W., & Kaelbling, L. (2002). Reinforcement learning for robot control. In D. Gage & H. Choset (Eds.), Society of Photo-Optical Instrumentation Engineers (SPIE) conference series (Vol. 4573, pp. 92–103).

94. Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., et al. (2009). Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the 26th international conference on machine learning.

97. Thrun, S., & Littman, M. (2000). A review of reinforcement learning. AI Magazine, 21, 103–105.

98. Vamvoudakis, K., & Lewis, F. (2010). Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 46(5), 878–888.

99. Van Roy, B., Bertsekas, D., Lee, Y., & Tsitsiklis, J. (1997). A neuro-dynamic programming approach to retailer inventory management. In Proceedings of the 36th IEEE conference on decision and control (Vol. 4, pp. 4052–4057).
100. Vrabie, D., & Lewis, F. (2009). Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Networks, 22(3), 237–246 (special issue on goal-directed neural systems).
101. Vrabie, D., Pastravanu, O., Abu-Khalaf, M., & Lewis, F. (2009). Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45(2), 477–484.

103. Watkins, C., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.
104. Welch, G., & Bishop, G. (2001). An introduction to the Kalman filter. Technical Report TR 95-041, University of North Carolina at Chapel Hill, pp. 1–16.
105. Werbos, P. (1992). Approximate dynamic programming for real-time control and neural modeling. In White & Sofge (Eds.), Handbook of intelligent control (pp. 493–525). New York: Van Nostrand Reinhold.

106. Werbos, P. (1992). Neurocontrol and supervised learning: An overview and evaluation. In White & Sofge (Eds.), Handbook of intelligent control (pp. 65–89). New York: Van Nostrand Reinhold.

107. Werbos, P. (2004). ADP: Goals, opportunities and principles. In J. Si, A. Barto, W. Powell, & D. Wunsch (Eds.), Handbook of learning and approximate dynamic programming (pp. 3–44). Piscataway, NJ: Wiley-IEEE Press.

108. Werbos, P. (2007). Using ADP to understand and replicate brain intelligence: The next level design. In L. Perlovsky & R. Kozma (Eds.), Neurodynamics of cognition and consciousness, Understanding Complex Systems (Vol. 25, pp. 109–123). Berlin Heidelberg: Springer.

109. Werbos, P. (2009). Intelligence in the brain: A theory of how it works and how to build it. Neural Networks, 22(3), 200–212 (special issue on goal-directed neural systems).

110. Werbos, P. J. (1990). Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks, 3(2), 179–189.

111. Werbos, P. J. (2008). Foreword - ADP: The key direction for future research in intelligent control and understanding brain intelligence. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 38(4), 898–900.

113. Williams, J. K. (2009). Reinforcement learning of optimal controls. In S. E. Haupt, A. Pasini, & C. Marzban (Eds.), Artificial intelligence methods in the environmental sciences (pp. 297–327). Netherlands: Springer.

115. Xu, X., Hu, D., & Lu, X. (2007). Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18(4), 973–992.