Volume 261, 2014, Pages 1-31

Reinforcement learning algorithms with function approximation: Recent advances and applications

Author keywords

Approximate dynamic programming; Function approximation; Generalization; Learning control; Reinforcement learning

Indexed keywords

Approximate dynamic programming; Feature representation; Function approximation; Function approximation techniques; Generalization; Learning control; Markov decision processes; Prediction and control;

EID: 84891828192     PISSN: 0020-0255     EISSN: None     Source Type: Journal
DOI: 10.1016/j.ins.2013.08.037     Document Type: Article
Times cited: 170

References (155)
1. B. Abdulhai, R. Pringle, et al., Reinforcement learning for true adaptive traffic signal control, Journal of Transportation Engineering 129 (3) (2003) 278-285.
2. A. Al-Tamimi, F.L. Lewis, M. Abu-Khalaf, Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control, Automatica 43 (2007) 473-481.
4. S. Amari, Natural gradient works efficiently in learning, Neural Computation 10 (2) (1998) 251-276.
5. A. Antos, R. Munos, C. Szepesvari, Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems, in: 2009 American Control Conference, Hyatt Regency Riverfront, St. Louis, MO, USA, June 10-12, 2009, pp. 725-730.
7. I. Arel, C. Liu, et al., Reinforcement learning-based multi-agent system for network traffic signal control, IET Intelligent Transport Systems 4 (2) (2010) 128-135.
12. P.G. Balaji, X. German, et al., Urban traffic signal control using reinforcement learning agents, IET Intelligent Transport Systems 4 (3) (2010) 177-188.
13. S.N. Balakrishnan, V. Biega, Adaptive-critic-based neural networks for aircraft optimal control, Journal of Guidance, Control, and Dynamics 19 (4) (1996) 893-898.
14. A.G. Barto, T.G. Dietterich, Reinforcement learning and its relationship to supervised learning, in: J. Si, A. Barto, W. Powell, D. Wunsch (Eds.), Handbook of Learning and Approximate Dynamic Programming, Wiley-IEEE Press, New York, 2004.
20. V.S. Borkar, Stochastic approximation with two time scales, Systems & Control Letters 29 (5) (1997) 291-294.
22. J. Boyan, A.W. Moore, Generalization in reinforcement learning: safely approximating the value function, in: Advances in Neural Information Processing Systems, 1995, pp. 369-376.
23. J. Boyan, Technical update: least-squares temporal difference learning, Machine Learning 49 (2-3) (2002) 233-246.
24. J. Boyan, M. Littman, Packet routing in dynamically changing networks: a reinforcement learning approach, in: Advances in Neural Information Processing Systems, vol. 6 (NIPS), 1994.
27. S.J. Bradtke, A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning 22 (1996) 33-57.
30. M. Carreras, J. Yuh, et al., A behavior-based scheme using reinforcement learning for autonomous underwater vehicles, IEEE Journal of Oceanic Engineering 30 (2) (2005) 416-427.
31. R.H. Crites, A.G. Barto, Elevator group control using multiple reinforcement learning agents, Machine Learning 33 (2-3) (1998) 235-262.
33. C. Darken, J. Moody, Note on learning rate schedules for stochastic optimization, in: Lippman et al. (Eds.), Advances in Neural Information Processing Systems, vol. 3, 1991, pp. 1009-1016.
34. P. Dayan, The convergence of TD(λ) for general λ, Machine Learning 8 (1992) 341-362.
35. P. Dayan, T.J. Sejnowski, TD(λ) converges with probability 1, Machine Learning 14 (1994) 295-301.
36. T.G. Dietterich, X. Wang, Batch value function approximation via support vectors, in: Advances in Neural Information Processing Systems, vol. 14, MIT Press, Cambridge, MA, 2002, pp. 1491-1498.
37. T.G. Dietterich, Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research 13 (2000) 227-303.
38. T.G. Dietterich, State abstraction in MAXQ hierarchical reinforcement learning, in: S.A. Solla, T.K. Leen, K.R. Muller (Eds.), Advances in Neural Information Processing Systems, 2000, pp. 994-1000.
39. K. Driessens, S. Dzeroski, Integrating guidance into relational reinforcement learning, Machine Learning 57 (2004) 271-304.
40. K. Driessens, J. Ramon, H. Blockeel, Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner, in: L. De Raedt, P. Flach (Eds.), Proceedings of the 13th European Conference on Machine Learning, Lecture Notes in Artificial Intelligence, vol. 2167, Springer-Verlag, 2001, pp. 97-108.
43. R. Enns, J. Si, Helicopter trimming and tracking control using direct neural dynamic programming, IEEE Transactions on Neural Networks 14 (4) (2003) 929-939.
45. D. Ernst, M. Glavic, et al., Power systems stability control: reinforcement learning framework, IEEE Transactions on Power Systems 19 (1) (2004) 427-435.
46. A. Farahmand, Cs. Szepesvári, Model selection in reinforcement learning, Machine Learning 85 (3) (2011) 299-332.
48. A. Galindo-Serrano, L. Giupponi, Distributed Q-learning for aggregated interference control in cognitive radio networks, IEEE Transactions on Vehicular Technology 59 (4) (2010) 1823-1834.
49. K. Driessens, J. Ramon, T. Gärtner, Graph kernels and Gaussian processes for relational reinforcement learning, Machine Learning 64 (1-3) (2006) 91-119.
51. A. Gosavi, Reinforcement learning for long-run average cost, European Journal of Operational Research 155 (2004) 654-674.
52. A.P. George, W.B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning 65 (2006) 167-198.
53. A. Geramifard, M. Bowling, M. Zinkevich, R.S. Sutton, iLSTD: eligibility traces and convergence analysis, in: B. Schölkopf, J. Platt, T. Hoffman (Eds.), Advances in Neural Information Processing Systems, vol. 19, MIT Press, Cambridge, MA, 2007, pp. 441-448.
56. D. Haussler, Convolution Kernels on Discrete Structures, Technical Report, Department of Computer Science, University of California at Santa Cruz, 1999.
59. K.M. Iftekharuddin, Transformation invariant on-line target recognition, IEEE Transactions on Neural Networks 22 (6) (2011) 906-918.
60. T. Jaakkola, M. Jordan, S. Singh, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation 6 (6) (1994) 1185-1201.
62. T. Jiang, D. Grace, et al., Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing, IET Communications 5 (10) (2011) 1309-1317.
63. J. Johns, M. Petrik, S. Mahadevan, Hybrid least-squares algorithms for approximate policy evaluation, Machine Learning 76 (2009) 243-256.
69. F.L. Lewis, G. Lendaris, D. Liu, Special issue on approximate dynamic programming and reinforcement learning for feedback control, IEEE Transactions on Systems, Man, and Cybernetics, Part B 38 (4) (2008).
70. F.L. Lewis, D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine 9 (3) (2009) 32-50.
71. D. Liu, Y. Zhang, H. Zhang, A self-learning call admission control scheme for CDMA cellular networks, IEEE Transactions on Neural Networks 16 (5) (2005) 1219-1228.
72. H.R. Maei, C. Szepesvári, S. Bhatnagar, D. Precup, R.S. Sutton, Convergent temporal-difference learning with arbitrary smooth function approximation, in: J. Lafferty, C. Williams (Eds.), Advances in Neural Information Processing Systems, vol. 22, MIT Press, Cambridge, MA, 2010.
73. H.R. Maei, C. Szepesvári, S. Bhatnagar, R. Sutton, Toward off-policy learning control with function approximation, in: J. Furnkranz, T. Joachims (Eds.), ICML 2010, Omnipress, 2010, pp. 719-726.
75. S. Mahadevan, M. Maggioni, Proto-value functions: a Laplacian framework for learning representation and control in Markov decision processes, Journal of Machine Learning Research 8 (2007) 2169-2231.
79. J. Mitola III, G.Q. Maguire, Cognitive radio: making software radios more personal, IEEE Personal Communications 6 (4) (1999) 13-18.
80. S. Mohagheghi, G.K. Venayagamoorthy, et al., Adaptive critic design based neuro-fuzzy controller for a static compensator in a multimachine power system, IEEE Transactions on Power Systems 21 (4) (2006) 1744-1754.
81. V. Nanduri, T.K. Das, A reinforcement learning model to assess market power under auction-based energy pricing, IEEE Transactions on Power Systems 22 (1) (2007) 85-95.
82. A. Nedic, D.P. Bertsekas, Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems 13 (1) (2003) 79-110.
84. J.O.J. Lee, J.W. Lee, B.-T. Zhang, Adaptive stock trading with dynamic asset allocation using reinforcement learning, Information Sciences 176 (15) (2006) 2121-2147.
85. D. Ormoneit, S. Sen, Kernel-based reinforcement learning, Machine Learning 49 (2-3) (2002) 161-178.
86. R. Parr, S. Russell, Reinforcement learning with hierarchies of machines, in: Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 1998, pp. 1043-1049.
87. J. Peng, B. Bhanu, Delayed reinforcement learning for adaptive image segmentation and feature extraction, IEEE Transactions on Systems, Man, and Cybernetics, Part C 28 (3) (1998) 482-488.
89. J. Peters, S. Schaal, Natural actor-critic, Neurocomputing 71 (2008) 1180-1190.
95. C.E. Rasmussen, M. Kuss, Gaussian processes in reinforcement learning, in: S. Thrun, L.K. Saul, B. Schölkopf (Eds.), Advances in Neural Information Processing Systems, vol. 16, MIT Press, 2004, pp. 751-759.
96. M. Riedmiller, T. Gabel, et al., Reinforcement learning for robot soccer, Autonomous Robots 27 (1) (2009) 55-74.
99. A.L. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development 3 (1959) 211-229.
103. T. Shimokawa, K. Suzuki, et al., Predicting investment behavior: an augmented reinforcement learning model, Neurocomputing 72 (2009) 3447-3461.
105. S.P. Singh, D. Bertsekas, Reinforcement learning for dynamic channel allocation in cellular telephone systems, in: Advances in Neural Information Processing Systems, vol. 9 (NIPS 1996), 1997, pp. 974-980.
106. S.P. Singh, T. Jaakkola, M.L. Littman, Cs. Szepesvari, Convergence results for single-step on-policy reinforcement-learning algorithms, Machine Learning 38 (2000) 287-308.
107. S.P. Singh, R.C. Yee, An upper bound on the loss from approximate optimal-value functions, Machine Learning 16 (3) (1994) 227-233.
109. P. Stone, R.S. Sutton, et al., Reinforcement learning for RoboCup-soccer keepaway, Adaptive Behavior 13 (3) (2005) 165-188.
111. R. Sutton, A.G. Barto, R.J. Williams, Reinforcement learning is direct adaptive optimal control, IEEE Control Systems 12 (2) (1992) 19-22.
113. R. Sutton, Learning to predict by the methods of temporal differences, Machine Learning 3 (1988) 9-44.
115. R. Sutton, C. Szepesvari, H.R. Maei, A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation, in: Advances in Neural Information Processing Systems, vol. 21, MIT Press, Cambridge, MA, 2009, pp. 1609-1616.
117. R. Sutton, D. Precup, S. Singh, Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning, Artificial Intelligence 112 (1999) 181-211.
120. G. Tesauro, TD-Gammon, a self-teaching backgammon program, achieves master-level play, Neural Computation 6 (1994) 215-219.
123. J.N. Tsitsiklis, B.V. Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control 42 (5) (1997) 674-690.
124. W.T.B. Uther, M.M. Veloso, Tree based discretization for continuous state space reinforcement learning, in: Proceedings of AAAI-98, 1998, pp. 769-774.
125. K.G. Vamvoudakis, F.L. Lewis, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica 46 (5) (2010) 878-888.
126. K.G. Vamvoudakis, F.L. Lewis, Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations, Automatica 47 (8) (2011) 1556-1569.
128. G.K. Venayagamoorthy, R.G. Harley, D.C. Wunsch, Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator, IEEE Transactions on Neural Networks 13 (3) (2002) 764-773.
130. D. Vrabie, F. Lewis, M. Abu-Khalaf, Adaptive optimal control for continuous-time linear systems based on policy iteration, Automatica 45 (2) (2009) 477-484.
131. D. Vrabie, F. Lewis, Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems, Neural Networks 22 (3) (2009) 237-246.
132. X. Wang, Y. Cheng, J.-Q. Yi, A fuzzy actor-critic reinforcement learning network, Information Sciences 177 (18) (2007) 3764-3781.
134. C. Watkins, Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge, England, 1989.
136. P.J. Werbos, Intelligence in the brain: a theory of how it works and how to build it, Neural Networks (2009) 200-212.
141. X. Xu, H.G. He, D.W. Hu, Efficient reinforcement learning using recursive least-squares methods, Journal of Artificial Intelligence Research 16 (2002) 259-292.
143. X. Xu, D.W. Hu, X.C. Lu, Kernel-based least-squares policy iteration for reinforcement learning, IEEE Transactions on Neural Networks 18 (4) (2007) 973-992.
144. X. Xu, Sequential anomaly detection based on temporal-difference learning: principles, models and case studies, Applied Soft Computing 10 (3) (2010) 859-867.
145. X. Xu, C. Liu, S. Yang, D. Hu, Hierarchical approximate policy iteration with binary-tree state space decomposition, IEEE Transactions on Neural Networks 22 (12) (2011) 1863-1877.
146. X. Xu, C. Liu, D. Hu, Continuous-action reinforcement learning with fast policy search and adaptive basis function selection, Soft Computing - A Fusion of Foundations, Methodologies and Applications 15 (6) (2011) 1055-1070.
149. P. Yin, B. Bhanu, et al., Integrating relevance feedback techniques for image retrieval using reinforcement learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (10) (2005) 1536-1551.
150. T. Yu, B. Zhou, et al., Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step Q(λ) learning, IEEE Transactions on Power Systems 26 (3) (2011) 1272-1282.
151. H. Zhang, L. Cui, X. Zhang, Y. Luo, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Transactions on Neural Networks 22 (12) (2011) 2226-2236.
153. P. Zhou, Y. Chang, et al., Reinforcement learning for repeated power control game in cognitive radio networks, IEEE Journal on Selected Areas in Communications 30 (1) (2012) 54-69.
154. C. Zhou, Robot learning with GA-based fuzzy reinforcement learning agents, Information Sciences 145 (2002) 45-68.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.