1. B. Abdulhai, R. Pringle, et al., Reinforcement learning for true adaptive traffic signal control, Journal of Transportation Engineering 129 (3) (2003) 278-285.
2. A. Al-Tamimi, F.L. Lewis, M. Abu-Khalaf, Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control, Automatica 43 (2007) 473-481.
4. S. Amari, Natural gradient works efficiently in learning, Neural Computation 10 (2) (1998) 251-276.
5. A. Antos, R. Munos, C. Szepesvari, Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems, in: 2009 American Control Conference, St. Louis, MO, USA, June 10-12, 2009, pp. 725-730.
7. I. Arel, C. Liu, et al., Reinforcement learning-based multi-agent system for network traffic signal control, IET Intelligent Transport Systems 4 (2) (2010) 128-135.
9. J.A. Bagnell, J.G. Schneider, Autonomous helicopter control using reinforcement learning policy search methods, in: Proceedings of the 2001 IEEE International Conference on Robotics & Automation, Seoul, Korea, 2001, pp. 1615-1620.
10. J.A. Bagnell, J.G. Schneider, Covariant policy search, in: G. Gottlob, T. Walsh (Eds.), Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), Morgan Kaufmann, San Francisco, CA, USA, 2003, pp. 1019-1024.
12. P.G. Balaji, X. German, et al., Urban traffic signal control using reinforcement learning agents, IET Intelligent Transport Systems 4 (3) (2010) 177-188.
13. S.N. Balakrishnan, V. Biega, Adaptive-critic-based neural networks for aircraft optimal control, Journal of Guidance, Control, and Dynamics 19 (4) (1996) 893-898.
14. A.G. Barto, T.G. Dietterich, Reinforcement learning and its relationship to supervised learning, in: J. Si, A. Barto, W. Powell, D. Wunsch (Eds.), Handbook of Learning and Approximate Dynamic Programming, Wiley-IEEE Press, New York, 2004.
16. A.G. Barto, R. Sutton, C.W. Anderson, Neuron-like adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13 (5) (1983) 834-846.
19. S. Bhatnagar, R.S. Sutton, M. Ghavamzadeh, M. Lee, Natural actor-critic algorithms, Automatica 45 (11) (2009) 2471-2482.
20. V.S. Borkar, Stochastic approximation with two time scales, Systems & Control Letters 29 (5) (1997) 291-294.
22. J. Boyan, A.W. Moore, Generalization in reinforcement learning: safely approximating the value function, in: Advances in Neural Information Processing Systems, 1995, pp. 369-376.
23. J. Boyan, Technical update: least-squares temporal difference learning, Machine Learning 49 (2-3) (2002) 233-246.
24. J. Boyan, M. Littman, Packet routing in dynamically changing networks: a reinforcement learning approach, in: Advances in Neural Information Processing Systems 6 (NIPS 1994), 1994.
25. S. Bradtke, B. Ydstie, A. Barto, Adaptive linear quadratic control using policy iteration, Univ. Massachusetts, Amherst, MA, Tech. Rep. CMPSCI-94-49, June 1994.
26. S. Bradtke, Incremental Dynamic Programming for On-Line Adaptive Optimal Control, Ph.D. thesis, University of Massachusetts, Computer Science Dept. Tech. Rep. 94-62, 1994.
27. S.J. Bradtke, A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning 22 (1996) 33-57.
30. M. Carreras, J. Yuh, et al., A behavior-based scheme using reinforcement learning for autonomous underwater vehicles, IEEE Journal of Oceanic Engineering 30 (2) (2005) 416-427.
31. R.H. Crites, A.G. Barto, Elevator group control using multiple reinforcement learning agents, Machine Learning 33 (2-3) (1998) 235-262.
33. C. Darken, J. Moody, Note on learning rate schedules for stochastic optimization, in: Lippman, et al. (Eds.), Advances in Neural Information Processing Systems, vol. 3, 1991, pp. 1009-1016.
34. P. Dayan, The convergence of TD(λ) for general λ, Machine Learning 8 (1992) 341-362.
35. P. Dayan, T.J. Sejnowski, TD(λ) converges with probability 1, Machine Learning 14 (1994) 295-301.
36. T.G. Dietterich, X. Wang, Batch value function approximation via support vectors, in: Advances in Neural Information Processing Systems, vol. 14, MIT Press, Cambridge, MA, 2002, pp. 1491-1498.
37. T.G. Dietterich, Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research 13 (2000) 227-303.
38. T.G. Dietterich, State abstraction in MAXQ hierarchical reinforcement learning, in: S.A. Solla, T.K. Leen, K.R. Muller (Eds.), Advances in Neural Information Processing Systems (NIPS), 2000, pp. 994-1000.
39. K. Driessens, S. Dzeroski, Integrating guidance into relational reinforcement learning, Machine Learning 57 (2004) 271-304.
40. K. Driessens, J. Ramon, H. Blockeel, Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner, in: L. De Raedt, P. Flach (Eds.), Proceedings of the 13th European Conference on Machine Learning, Lecture Notes in Artificial Intelligence, vol. 2167, Springer-Verlag, 2001, pp. 97-108.
42. Y. Engel, S. Mannor, R. Meir, Bayes meets Bellman: the Gaussian process approach to temporal difference learning, in: Proceedings of the Twentieth International Conference on Machine Learning, Washington, DC, 2003, pp. 154-161.
43. R. Enns, J. Si, Helicopter trimming and tracking control using direct neural dynamic programming, IEEE Transactions on Neural Networks 14 (4) (2003) 929-939.
45. D. Ernst, M. Glavic, et al., Power systems stability control: reinforcement learning framework, IEEE Transactions on Power Systems 19 (1) (2004) 427-435.
46. A. Farahmand, Cs. Szepesvári, Model selection in reinforcement learning, Machine Learning 85 (3) (2011) 299-332.
48. A. Galindo-Serrano, L. Giupponi, Distributed Q-learning for aggregated interference control in cognitive radio networks, IEEE Transactions on Vehicular Technology 59 (4) (2010) 1823-1834.
49. K. Driessens, J. Ramon, T. Gärtner, Graph kernels and Gaussian processes for relational reinforcement learning, Machine Learning 64 (1-3) (2006) 91-119.
51. A. Gosavi, Reinforcement learning for long-run average cost, European Journal of Operational Research 155 (2004) 654-674.
52. A.P. George, W.B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning 65 (2006) 167-198.
53. A. Geramifard, M. Bowling, M. Zinkevich, R.S. Sutton, iLSTD: eligibility traces and convergence analysis, in: B. Schölkopf, J. Platt, T. Hoffman (Eds.), Advances in Neural Information Processing Systems, vol. 19, MIT Press, Cambridge, MA, 2007, pp. 441-448.
56. D. Haussler, Convolution Kernels on Discrete Structures, Technical Report, Department of Computer Science, University of California at Santa Cruz, 1999.
57. B. Hengst, Safe state abstraction and reusable continuing subtasks in hierarchical reinforcement learning, in: AI 2007: Advances in Artificial Intelligence, Lecture Notes in Computer Science, vol. 4830, 2007, pp. 58-67.
59. K.M. Iftekharuddin, Transformation invariant on-line target recognition, IEEE Transactions on Neural Networks 22 (6) (2011) 906-918.
60. T. Jaakkola, M. Jordan, S. Singh, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation 6 (6) (1994) 1185-1201.
62. T. Jiang, D. Grace, et al., Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing, IET Communications 5 (10) (2011) 1309-1317.
63. J. Johns, M. Petrik, S. Mahadevan, Hybrid least-squares algorithms for approximate policy evaluation, Machine Learning 76 (2009) 243-256.
66. N. Kohl, P. Stone, Machine learning for fast quadrupedal locomotion, in: D.L. McGuinness, G. Ferguson (Eds.), Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI 2004), AAAI Press, Menlo Park, 2004, pp. 611-616.
69. F.L. Lewis, G. Lendaris, D. Liu, Special issue on approximate dynamic programming and reinforcement learning for feedback control, IEEE Transactions on Systems, Man, and Cybernetics, Part B 38 (4) (2008).
70. F.L. Lewis, D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine 9 (3) (2009) 32-50.
71. D. Liu, Y. Zhang, H. Zhang, A self-learning call admission control scheme for CDMA cellular networks, IEEE Transactions on Neural Networks 16 (5) (2005) 1219-1228.
72. H.R. Maei, C. Szepesvári, S. Bhatnagar, D. Precup, R.S. Sutton, Convergent temporal-difference learning with arbitrary smooth function approximation, in: J. Lafferty, C. Williams (Eds.), Advances in Neural Information Processing Systems, vol. 22, MIT Press, Cambridge, MA, USA, 2010.
73. H.R. Maei, C. Szepesvári, S. Bhatnagar, R. Sutton, Toward off-policy learning control with function approximation, in: J. Furnkranz, T. Joachims (Eds.), ICML 2010, Omnipress, 2010, pp. 719-726.
75. S. Mahadevan, M. Maggioni, Proto-value functions: a Laplacian framework for learning representation and control in Markov decision processes, Journal of Machine Learning Research 8 (2007) 2169-2231.
76. A.R. Mahmood, R. Sutton, T. Degris, P.M. Pilarski, Tuning-free step-size adaptation, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto, Japan, 2012.
79. J. Mitola III, G.Q. Maguire, Cognitive radio: making software radios more personal, IEEE Personal Communications 6 (4) (1999) 13-18.
80. S. Mohagheghi, G.K. Venayagamoorthy, et al., Adaptive critic design based neuro-fuzzy controller for a static compensator in a multimachine power system, IEEE Transactions on Power Systems 21 (4) (2006) 1744-1754.
81. V. Nanduri, T.K. Das, A reinforcement learning model to assess market power under auction-based energy pricing, IEEE Transactions on Power Systems 22 (1) (2007) 85-95.
82. A. Nedic, D.P. Bertsekas, Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems 13 (1) (2003) 79-110.
84. J.O.J. Lee, J.W. Lee, B.-T. Zhang, Adaptive stock trading with dynamic asset allocation using reinforcement learning, Information Sciences 176 (15) (2006) 2121-2147.
85. D. Ormoneit, S. Sen, Kernel-based reinforcement learning, Machine Learning 49 (2-3) (2002) 161-178.
87. J. Peng, B. Bhanu, Delayed reinforcement learning for adaptive image segmentation and feature extraction, IEEE Transactions on Systems, Man, and Cybernetics, Part C 28 (3) (1998) 482-488.
89. J. Peters, S. Schaal, Natural actor-critic, Neurocomputing 71 (2008) 1180-1190.
90. J. Peters, S. Schaal, Policy gradient methods for robotics, in: Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 2006, pp. 2219-2225.
95. C.E. Rasmussen, M. Kuss, Gaussian processes in reinforcement learning, in: S. Thrun, L.K. Saul, B. Schölkopf (Eds.), Advances in Neural Information Processing Systems, vol. 16, MIT Press, 2004, pp. 751-759.
96. M. Riedmiller, T. Gabel, et al., Reinforcement learning for robot soccer, Autonomous Robots 27 (1) (2009) 55-74.
97. M. Riedmiller, M. Montemerlo, et al., Learning to drive in 20 min, in: Proceedings of the FBIT 2007 Conference, Jeju, Korea, 2007.
98. S. Richter, D. Aberdeen, J. Yu, Natural actor-critic for road traffic optimisation, in: Advances in Neural Information Processing Systems, 2006, pp. 3522-3529.
99. A.L. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development 3 (1959) 211-229.
101. B. Schölkopf, S. Mika, C.J.C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, A.J. Smola, Input space vs. feature space in kernel-based algorithms, IEEE Transactions on Neural Networks 10 (3) (1999) 1000-1017.
103. T. Shimokawa, K. Suzuki, et al., Predicting investment behavior: an augmented reinforcement learning model, Neurocomputing 72 (2009) 3447-3461.
105. S.P. Singh, D. Bertsekas, Reinforcement learning for dynamic channel allocation in cellular telephone systems, in: Advances in Neural Information Processing Systems 9 (NIPS 1996), 1997, pp. 974-980.
106. S.P. Singh, T. Jaakkola, M.L. Littman, Cs. Szepesvari, Convergence results for single-step on-policy reinforcement-learning algorithms, Machine Learning 38 (2000) 287-308.
107. S.P. Singh, R.C. Yee, An upper bound on the loss from approximate optimal value functions, Machine Learning 16 (3) (1994) 227-233.
109. P. Stone, R.S. Sutton, et al., Reinforcement learning for RoboCup-soccer keepaway, Adaptive Behavior 13 (3) (2005) 165-188.
113. R. Sutton, Learning to predict by the method of temporal differences, Machine Learning 3 (1988) 9-44.
114. R. Sutton, H.R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvari, E. Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation, in: Proceedings of the 26th Annual International Conference on Machine Learning (ICML-09), 2009, pp. 993-1000.
115. R. Sutton, C. Szepesvari, H.R. Maei, A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation, in: Advances in Neural Information Processing Systems, vol. 21, MIT Press, Cambridge, MA, USA, 2009, pp. 1609-1616.
117. R. Sutton, D. Precup, S. Singh, Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning, Artificial Intelligence 112 (1999) 181-211.
120. G. Tesauro, TD-Gammon, a self-teaching backgammon program, achieves master-level play, Neural Computation 6 (1994) 215-219.
122. J.N. Tsitsiklis, Asynchronous Stochastic Approximation and Q-learning, Technical Report LIDS-P-2172, Laboratory for Information and Decision Systems, MIT, Cambridge, MA, 1993.
123. J.N. Tsitsiklis, B. Van Roy, An analysis of temporal difference learning with function approximation, IEEE Transactions on Automatic Control 42 (5) (1997) 674-690.
124. W.T.B. Uther, M.M. Veloso, Tree based discretization for continuous state space reinforcement learning, in: Proceedings of AAAI-98, 1998, pp. 769-774.
125. K.G. Vamvoudakis, F.L. Lewis, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica 46 (5) (2010) 878-888.
126. K.G. Vamvoudakis, F.L. Lewis, Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations, Automatica 47 (8) (2011) 1556-1569.
128. G.K. Venayagamoorthy, R.G. Harley, D.C. Wunsch, Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator, IEEE Transactions on Neural Networks 13 (3) (2002) 764-773.
130. D. Vrabie, F. Lewis, M. Abu-Khalaf, Adaptive optimal control for continuous-time linear systems based on policy iteration, Automatica 45 (2) (2009) 477-484.
131. D. Vrabie, F. Lewis, Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems, Neural Networks 22 (3) (2009) 237-246.
132. X. Wang, Y. Cheng, J.-Q. Yi, A fuzzy actor-critic reinforcement learning network, Information Sciences 177 (18) (2007) 3764-3781.
134. C. Watkins, Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge, England, 1989.
136. P.J. Werbos, Intelligence in the brain: a theory of how it works and how to build it, Neural Networks (2009) 200-212.
139. B. Widrow, N. Gupta, S. Maitra, Punish/reward: learning with a critic in adaptive threshold systems, IEEE Transactions on Systems, Man, and Cybernetics SMC-3 (5) (1973) 455-465.
141. X. Xu, H.G. He, D.W. Hu, Efficient reinforcement learning using recursive least-squares methods, Journal of Artificial Intelligence Research 16 (2002) 259-292.
142. X. Xu, T. Xie, D.W. Hu, X.C. Lu, Kernel least-squares temporal difference learning, International Journal of Information Technology 11 (9) (2005) 54-63.
143. X. Xu, D.W. Hu, X.C. Lu, Kernel-based least-squares policy iteration for reinforcement learning, IEEE Transactions on Neural Networks 18 (4) (2007) 973-992.
144. X. Xu, Sequential anomaly detection based on temporal-difference learning: principles, models and case studies, Applied Soft Computing 10 (3) (2010) 859-867.
145. X. Xu, C. Liu, S. Yang, D. Hu, Hierarchical approximate policy iteration with binary-tree state space decomposition, IEEE Transactions on Neural Networks 22 (12) (2011) 1863-1877.
146. X. Xu, C. Liu, D. Hu, Continuous-action reinforcement learning with fast policy search and adaptive basis function selection, Soft Computing - A Fusion of Foundations, Methodologies and Applications 15 (6) (2011) 1055-1070.
147. X. Xu, Z. Hou, C. Lian, H. He, Online learning control using adaptive critic designs with sparse kernel machines, IEEE Transactions on Neural Networks and Learning Systems 24 (5) (2013) 762-775.
149. P. Yin, B. Bhanu, et al., Integrating relevance feedback techniques for image retrieval using reinforcement learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (10) (2005) 1536-1551.
150. T. Yu, B. Zhou, et al., Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step Q(λ) learning, IEEE Transactions on Power Systems 26 (3) (2011) 1272-1282.
151. H. Zhang, L. Cui, X. Zhang, Y. Luo, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Transactions on Neural Networks 22 (12) (2011) 2226-2236.
153. P. Zhou, Y. Chang, et al., Reinforcement learning for repeated power control game in cognitive radio networks, IEEE Journal on Selected Areas in Communications 30 (1) (2012) 54-69.
154. C. Zhou, Robot learning with GA-based fuzzy reinforcement learning agents, Information Sciences 145 (2002) 45-68.