Volume 261, 2014, Pages 1-31

Reinforcement learning algorithms with function approximation: Recent advances and applications

Author keywords

Approximate dynamic programming; Function approximation; Generalization; Learning control; Reinforcement learning

Indexed keywords

Approximate dynamic programming; Feature representation; Function approximation; Function approximation techniques; Generalization; Learning control; Markov decision processes; Prediction and control;

EID: 84891828192     PISSN: 0020-0255     EISSN: None     Source Type: Journal
DOI: 10.1016/j.ins.2013.08.037     Document Type: Article
Times cited: 170

References (155)
1. B. Abdulhai, R. Pringle, et al., Reinforcement learning for true adaptive traffic signal control, Journal of Transportation Engineering 129 (3) (2003) 278-285.
2. A. Al-Tamimi, F.L. Lewis, M. Abu-Khalaf, Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control, Automatica 43 (2007) 473-481.
4. S. Amari, Natural gradient works efficiently in learning, Neural Computation 10 (2) (1998) 251-276.
5. A. Antos, R. Munos, C. Szepesvari, Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems, in: 2009 American Control Conference, Hyatt Regency Riverfront, St. Louis, MO, USA, June 10-12, 2009, pp. 725-730.
7. I. Arel, C. Liu, et al., Reinforcement learning-based multi-agent system for network traffic signal control, IET Intelligent Transport Systems 4 (2) (2010) 128-135.
12. P.G. Balaji, X. German, et al., Urban traffic signal control using reinforcement learning agents, IET Intelligent Transport Systems 4 (3) (2010) 177-188.
13. S.N. Balakrishnan, V. Biega, Adaptive-critic-based neural networks for aircraft optimal control, Journal of Guidance, Control, and Dynamics 19 (4) (1996) 893-898.
14. A.G. Barto, T.G. Dietterich, Reinforcement learning and its relationship to supervised learning, in: J. Si, A. Barto, W. Powell, D. Wunsch (Eds.), Handbook of Learning and Approximate Dynamic Programming, Wiley-IEEE Press, New York, 2004.
20. V.S. Borkar, Stochastic approximation with two time scales, Systems & Control Letters 29 (5) (1997) 291-294.
22. J. Boyan, A.W. Moore, Generalization in reinforcement learning: safely approximating the value function, in: Advances in Neural Information Processing Systems, 1995, pp. 369-376.
23. J. Boyan, Technical update: least-squares temporal difference learning, Machine Learning 49 (2-3) (2002) 233-246.
24. J. Boyan, M. Littman, Packet routing in dynamically changing networks: a reinforcement learning approach, in: Advances in Neural Information Processing Systems, vol. 6 (NIPS), 1994.
27. S.J. Bradtke, A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning 22 (1996) 33-57.
30. M. Carreras, J. Yuh, et al., A behavior-based scheme using reinforcement learning for autonomous underwater vehicles, IEEE Journal of Oceanic Engineering 30 (2) (2005) 416-427.
31. R.H. Crites, A.G. Barto, Elevator group control using multiple reinforcement learning agents, Machine Learning 33 (2-3) (1998) 235-262.
33. C. Darken, J. Moody, Note on learning rate schedules for stochastic optimization, in: Lippman et al. (Eds.), Advances in Neural Information Processing Systems, vol. 3, 1991, pp. 1009-1016.
34. P. Dayan, The convergence of TD(λ) for general λ, Machine Learning 8 (1992) 341-362.
35. P. Dayan, T.J. Sejnowski, TD(λ) converges with probability 1, Machine Learning 14 (1994) 295-301.
36. T.G. Dietterich, X. Wang, Batch value function approximation via support vectors, in: Advances in Neural Information Processing Systems, vol. 14, MIT Press, Cambridge, MA, 2002, pp. 1491-1498.
37. T.G. Dietterich, Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research 13 (2000) 227-303.
38. T.G. Dietterich, State abstraction in MAXQ hierarchical reinforcement learning, in: S.A. Solla, T.K. Leen, K.R. Muller (Eds.), Advances in Neural Information Processing Systems, 2000, pp. 994-1000.
39. K. Driessens, S. Dzeroski, Integrating guidance into relational reinforcement learning, Machine Learning 57 (2004) 271-304.
40. K. Driessens, J. Ramon, H. Blockeel, Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner, in: L. De Raedt, P. Flach (Eds.), Proceedings of the 13th European Conference on Machine Learning, Lecture Notes in Artificial Intelligence, vol. 2167, Springer-Verlag, 2001, pp. 97-108.
43. R. Enns, J. Si, Helicopter trimming and tracking control using direct neural dynamic programming, IEEE Transactions on Neural Networks 14 (4) (2003) 929-939.
45. D. Ernst, M. Glavic, et al., Power systems stability control: reinforcement learning framework, IEEE Transactions on Power Systems 19 (1) (2004) 427-435.
46. A. Farahmand, Cs. Szepesvári, Model selection in reinforcement learning, Machine Learning 85 (3) (2011) 299-332.
48. A. Galindo-Serrano, L. Giupponi, Distributed Q-learning for aggregated interference control in cognitive radio networks, IEEE Transactions on Vehicular Technology 59 (4) (2010) 1823-1834.
49. K. Driessens, J. Ramon, T. Gärtner, Graph kernels and Gaussian processes for relational reinforcement learning, Machine Learning 64 (1-3) (2006) 91-119.
51. A. Gosavi, Reinforcement learning for long-run average cost, European Journal of Operational Research 155 (2004) 654-674.
52. A.P. George, W.B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning 65 (2006) 167-198.
53. A. Geramifard, M. Bowling, M. Zinkevich, R.S. Sutton, iLSTD: eligibility traces and convergence analysis, in: B. Schölkopf, J. Platt, T. Hoffman (Eds.), Advances in Neural Information Processing Systems, vol. 19, MIT Press, Cambridge, MA, 2007, pp. 441-448.
56. D. Haussler, Convolution Kernels on Discrete Structures, Technical Report, Department of Computer Science, University of California at Santa Cruz, 1999.
59. K.M. Iftekharuddin, Transformation invariant on-line target recognition, IEEE Transactions on Neural Networks 22 (6) (2011) 906-918.
60. T. Jaakkola, M. Jordan, S. Singh, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation 6 (6) (1994) 1185-1201.
62. T. Jiang, D. Grace, et al., Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing, IET Communications 5 (10) (2011) 1309-1317.
63. J. Johns, M. Petrik, S. Mahadevan, Hybrid least-squares algorithms for approximate policy evaluation, Machine Learning 76 (2009) 243-256.
69. F.L. Lewis, G. Lendaris, D. Liu, Special issue on approximate dynamic programming and reinforcement learning for feedback control, IEEE Transactions on Systems, Man, and Cybernetics, Part B 38 (4) (2008).
70. F.L. Lewis, D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine 9 (3) (2009) 32-50.
71. D. Liu, Y. Zhang, H. Zhang, A self-learning call admission control scheme for CDMA cellular networks, IEEE Transactions on Neural Networks 16 (5) (2005) 1219-1228.
72. H.R. Maei, C. Szepesvári, S. Bhatnagar, D. Precup, R.S. Sutton, Convergent temporal-difference learning with arbitrary smooth function approximation, in: J. Lafferty, C. Williams (Eds.), Advances in Neural Information Processing Systems, vol. 22, MIT Press, Cambridge, MA, 2010.
73. H.R. Maei, C. Szepesvári, S. Bhatnagar, R. Sutton, Toward off-policy learning control with function approximation, in: J. Furnkranz, T. Joachims (Eds.), ICML 2010, Omnipress, 2010, pp. 719-726.
75. S. Mahadevan, M. Maggioni, Proto-value functions: a Laplacian framework for learning representation and control in Markov decision processes, Journal of Machine Learning Research 8 (2007) 2169-2231.
79. J. Mitola III, G.Q. Maguire, Cognitive radio: making software radios more personal, IEEE Personal Communications 6 (4) (1999) 13-18.
80. S. Mohagheghi, G.K. Venayagamoorthy, et al., Adaptive critic design based neuro-fuzzy controller for a static compensator in a multimachine power system, IEEE Transactions on Power Systems 21 (4) (2006) 1744-1754.
81. V. Nanduri, T.K. Das, A reinforcement learning model to assess market power under auction-based energy pricing, IEEE Transactions on Power Systems 22 (1) (2007) 85-95.
82. A. Nedic, D.P. Bertsekas, Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems 13 (1) (2003) 79-110.
84. J.O.J. Lee, J.W. Lee, B.-T. Zhang, Adaptive stock trading with dynamic asset allocation using reinforcement learning, Information Sciences 176 (15) (2006) 2121-2147.
85. D. Ormoneit, S. Sen, Kernel-based reinforcement learning, Machine Learning 49 (2-3) (2002) 161-178.
86. R. Parr, S. Russell, Reinforcement learning with hierarchies of machines, in: Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 1998, pp. 1043-1049.
87. J. Peng, B. Bhanu, Delayed reinforcement learning for adaptive image segmentation and feature extraction, IEEE Transactions on Systems, Man, and Cybernetics, Part C 28 (3) (1998) 482-488.
89. J. Peters, S. Schaal, Natural actor-critic, Neurocomputing 71 (2008) 1180-1190.
95. C.E. Rasmussen, M. Kuss, Gaussian processes in reinforcement learning, in: S. Thrun, L.K. Saul, B. Schölkopf (Eds.), Advances in Neural Information Processing Systems, vol. 16, MIT Press, 2004, pp. 751-759.
96. M. Riedmiller, T. Gabel, et al., Reinforcement learning for robot soccer, Autonomous Robots 27 (1) (2009) 55-74.
99. A.L. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development 3 (1959) 211-229.
103. T. Shimokawa, K. Suzuki, et al., Predicting investment behavior: an augmented reinforcement learning model, Neurocomputing 72 (2009) 3447-3461.
105. S.P. Singh, D. Bertsekas, Reinforcement learning for dynamic channel allocation in cellular telephone systems, in: Advances in Neural Information Processing Systems, vol. 9 (NIPS 1996), 1997, pp. 974-980.
106. S.P. Singh, T. Jaakkola, M.L. Littman, Cs. Szepesvari, Convergence results for single-step on-policy reinforcement-learning algorithms, Machine Learning 38 (2000) 287-308.
107. S.P. Singh, R.C. Yee, An upper bound on the loss from approximate optimal-value functions, Machine Learning 16 (3) (1994) 227-233.
109. P. Stone, R.S. Sutton, et al., Reinforcement learning for RoboCup-soccer keepaway, Adaptive Behavior 13 (3) (2005) 165-188.
111. R. Sutton, A.G. Barto, R.J. Williams, Reinforcement learning is direct adaptive optimal control, IEEE Control Systems 12 (2) (1992) 19-22.
113. R. Sutton, Learning to predict by the methods of temporal differences, Machine Learning 3 (1988) 9-44.
115. R. Sutton, C. Szepesvari, H.R. Maei, A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation, in: Advances in Neural Information Processing Systems, vol. 21, MIT Press, Cambridge, MA, 2009, pp. 1609-1616.
117. R. Sutton, D. Precup, S. Singh, Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning, Artificial Intelligence 112 (1999) 181-211.
120. G. Tesauro, TD-Gammon, a self-teaching backgammon program, achieves master-level play, Neural Computation 6 (1994) 215-219.
123. J.N. Tsitsiklis, B.V. Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control 42 (5) (1997) 674-690.
124. W.T.B. Uther, M.M. Veloso, Tree based discretization for continuous state space reinforcement learning, in: Proceedings of AAAI-98, 1998, pp. 769-774.
125. K.G. Vamvoudakis, F.L. Lewis, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica 46 (5) (2010) 878-888.
126. K.G. Vamvoudakis, F.L. Lewis, Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations, Automatica 47 (8) (2011) 1556-1569.
128. G.K. Venayagamoorthy, R.G. Harley, D.C. Wunsch, Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator, IEEE Transactions on Neural Networks 13 (3) (2002) 764-773.
130. D. Vrabie, F. Lewis, M. Abu-Khalaf, Adaptive optimal control for continuous-time linear systems based on policy iteration, Automatica 45 (2) (2009) 477-484.
131. D. Vrabie, F. Lewis, Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems, Neural Networks 22 (3) (2009) 237-246.
132. X. Wang, Y. Cheng, J.-Q. Yi, A fuzzy actor-critic reinforcement learning network, Information Sciences 177 (18) (2007) 3764-3781.
134. C. Watkins, Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge, England, 1989.
136. P.J. Werbos, Intelligence in the brain: a theory of how it works and how to build it, Neural Networks (2009) 200-212.
141. X. Xu, H.G. He, D.W. Hu, Efficient reinforcement learning using recursive least-squares methods, Journal of Artificial Intelligence Research 16 (2002) 259-292.
143. X. Xu, D.W. Hu, X.C. Lu, Kernel-based least-squares policy iteration for reinforcement learning, IEEE Transactions on Neural Networks 18 (4) (2007) 973-992.
144. X. Xu, Sequential anomaly detection based on temporal-difference learning: principles, models and case studies, Applied Soft Computing 10 (3) (2010) 859-867.
145. X. Xu, C. Liu, S. Yang, D. Hu, Hierarchical approximate policy iteration with binary-tree state space decomposition, IEEE Transactions on Neural Networks 22 (12) (2011) 1863-1877.
146. X. Xu, C. Liu, D. Hu, Continuous-action reinforcement learning with fast policy search and adaptive basis function selection, Soft Computing - A Fusion of Foundations, Methodologies and Applications 15 (6) (2011) 1055-1070.
149. P. Yin, B. Bhanu, et al., Integrating relevance feedback techniques for image retrieval using reinforcement learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (10) (2005) 1536-1551.
150. T. Yu, B. Zhou, et al., Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step Q(λ) learning, IEEE Transactions on Power Systems 26 (3) (2011) 1272-1282.
151. H. Zhang, L. Cui, X. Zhang, Y. Luo, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Transactions on Neural Networks 22 (12) (2011) 2226-2236.
153. P. Zhou, Y. Chang, et al., Reinforcement learning for repeated power control game in cognitive radio networks, IEEE Journal on Selected Areas in Communications 30 (1) (2012) 54-69.
154. C. Zhou, Robot learning with GA-based fuzzy reinforcement learning agents, Information Sciences 145 (2002) 45-68.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.