



IEEE Transactions on Neural Networks and Learning Systems, Volume 24, Issue 5, 2013, Pages 762-775

Online learning control using adaptive critic designs with sparse kernel machines

Author keywords

Adaptive critic designs; Approximate dynamic programming; Kernel machines; Learning control; Markov decision processes; Reinforcement learning

Indexed keywords

ADAPTIVE CRITIC DESIGNS; APPROXIMATE DYNAMIC PROGRAMMING; KERNEL MACHINE; LEARNING CONTROL; MARKOV DECISION PROCESSES;

EID: 84884922436     PISSN: 2162-237X     EISSN: 2162-2388     Source Type: Journal
DOI: 10.1109/TNNLS.2012.2236354     Document Type: Article
Times cited: 117

References (48)
  • 2
    • F. Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.
  • 5
    • P. J. Werbos, "Intelligence in the brain: A theory of how it works and how to build it," Neural Netw., vol. 22, no. 3, pp. 200-212, Apr. 2009.
  • 8
    • D. Liu, Y. Zhang, and H. Zhang, "A self-learning call admission control scheme for CDMA cellular networks," IEEE Trans. Neural Netw., vol. 16, no. 5, pp. 1219-1228, Sep. 2005. DOI: 10.1109/TNN.2005.853408
  • 9
    • R. H. Crites and A. G. Barto, "Elevator group control using multiple reinforcement learning agents," Mach. Learn., vol. 33, nos. 2-3, pp. 235-262, Nov. 1998.
  • 10
    • G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Comput., vol. 6, no. 2, pp. 215-219, Mar. 1994.
  • 11
    • P. Shih, B. C. Kaul, S. Jagannathan, and J. A. Drallmeier, "Reinforcement-learning-based dual-control methodology for complex nonlinear discrete-time systems with application to spark engine EGR operation," IEEE Trans. Neural Netw., vol. 19, no. 8, pp. 1369-1388, Aug. 2008.
  • 15
    • J. Baxter and P. L. Bartlett, "Infinite-horizon policy-gradient estimation," J. Artif. Intell. Res., vol. 15, no. 1, pp. 319-350, Jul. 2001.
  • 17
    • D. V. Prokhorov and D. C. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997-1007, Jul. 1997.
  • 18
    • A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuron-like adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. 13, no. 5, pp. 834-846, Sep.-Oct. 1983.
  • 19
    • G. K. Venayagamoorthy, R. G. Harley, and D. C. Wunsch, "Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 764-773, May 2002.
  • 20
    • F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits Syst. Mag., vol. 9, no. 3, pp. 32-50, Aug. 2009.
  • 21
    • J. Peters and S. Schaal, "Natural actor-critic," Neurocomputing, vol. 71, nos. 7-9, pp. 1180-1190, Mar. 2008.
  • 23
    • S. N. Balakrishnan and V. Biega, "Adaptive-critic-based neural networks for aircraft optimal control," J. Guid., Control, Dynamics, vol. 19, no. 4, pp. 893-898, 1996.
  • 24
    • R. Enns and J. Si, "Helicopter trimming and tracking control using direct neural dynamic programming," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 929-939, Jul. 2003.
  • 25
    • C. Lu, J. Si, and X. Xie, "Direct heuristic dynamic programming for damping oscillations in a large power system," IEEE Trans. Syst., Man, Cybern., Part B, Cybern., vol. 38, no. 4, pp. 1008-1013, Aug. 2008.
  • 26
    • P. Shih, B. C. Kaul, S. Jagannathan, and J. A. Drallmeier, "Reinforcement-learning-based dual-control methodology for complex nonlinear discrete-time systems with application to spark engine EGR operation," IEEE Trans. Neural Netw., vol. 19, no. 8, pp. 1369-1388, Aug. 2008.
  • 27
    • G. D. Magoulas, M. N. Vrahatis, and G. S. Androulakis, "Effective backpropagation training with variable stepsize," Neural Netw., vol. 10, no. 1, pp. 69-82, Jan. 1997.
  • 28
    • S. Bhasin, N. Sharma, P. Patre, and W. E. Dixon, "Asymptotic tracking by a reinforcement learning-based adaptive critic controller," J. Control Theory Appl., vol. 9, no. 3, pp. 400-409, 2011.
  • 29
    • K. G. Vamvoudakis and F. L. Lewis, "Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, no. 5, pp. 878-888, May 2010.
  • 30
    • H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 2226-2236, Dec. 2011.
  • 34
    • F. R. Bach and M. I. Jordan, "Kernel independent component analysis," J. Mach. Learn. Res., vol. 3, pp. 1-48, Jul. 2002.
  • 35
    • T. Hofmann, B. Schölkopf, and A. J. Smola, "Kernel methods in machine learning," Ann. Statist., vol. 36, no. 3, pp. 1171-1220, 2008.
  • 36
    • D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Mach. Learn., vol. 49, nos. 2-3, pp. 161-178, 2002.
  • 37
    • Y. Engel, S. Mannor, and R. Meir, "Bayes meets Bellman: The Gaussian process approach to temporal difference learning," in Proc. Int. Conf. Mach. Learn., 2003, pp. 154-161.
  • 38
    • T. G. Dietterich and X. Wang, "Batch value function approximation via support vectors," in Advances in Neural Information Processing Systems 14, Cambridge, MA: MIT Press, 2002, pp. 1491-1498.
  • 39
    • C. E. Rasmussen and M. Kuss, "Gaussian processes in reinforcement learning," in Advances in Neural Information Processing Systems 16, S. Thrun, L. K. Saul, and B. Schölkopf, Eds., Cambridge, MA: MIT Press, 2004, pp. 751-759.
  • 40
    • M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learn. Res., vol. 4, pp. 1107-1149, Dec. 2003.
  • 41
    • X. Xu, D. Hu, and X. Lu, "Kernel-based least-squares policy iteration for reinforcement learning," IEEE Trans. Neural Netw., vol. 18, no. 4, pp. 973-992, Jul. 2007. DOI: 10.1109/TNN.2007.899161
  • 42
    • Y. Engel, S. Mannor, and R. Meir, "The kernel recursive least-squares algorithm," IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2275-2285, Aug. 2004.
  • 43
    • X. Xu, H. G. He, and D. W. Hu, "Efficient reinforcement learning using recursive least-squares methods," J. Artif. Intell. Res., vol. 16, pp. 259-292, Jun. 2002.
  • 44
    • X. Xu, T. Xie, D. Hu, and X. Lu, "Kernel least-squares temporal difference learning," Int. J. Inf. Technol., vol. 11, no. 9, pp. 54-63, 2005.
  • 45
    • J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674-690, May 1997.
  • 46
    • A. Nedic and D. P. Bertsekas, "Least squares policy evaluation algorithms with linear function approximation," Discrete Event Dyn. Syst., vol. 13, nos. 1-2, pp. 79-110, Jan.-Apr. 2003.
  • 47
    • T. Dierks and S. Jagannathan, "Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 7, pp. 1118-1129, Jul. 2012.
  • 48
    • S. Zhong, X. Zeng, S. Wu, and L. Han, "Sensitivity-based adaptive learning rules for binary feedforward neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 3, pp. 480-491, Mar. 2012.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.