-
7
-
-
77955814101
-
-
Taylor & Francis CRC Press
-
L. Buşoniu, R. Babuška, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators, ser. Automation and Control Engineering. Taylor & Francis CRC Press, 2010.
-
(2010)
Reinforcement Learning and Dynamic Programming Using Function Approximators, Ser. Automation and Control Engineering
-
-
Buşoniu, L.1
Babuška, R.2
De Schutter, B.3
Ernst, D.4
-
8
-
-
44949241322
-
Reinforcement learning of motor skills with policy gradients
-
J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Networks, vol. 21, pp. 682-697, 2008.
-
(2008)
Neural Networks
, vol.21
, pp. 682-697
-
-
Peters, J.1
Schaal, S.2
-
9
-
-
84864030941
-
An application of reinforcement learning to aerobatic helicopter flight
-
B. Schölkopf, J. C. Platt, and T. Hoffman, Eds. MIT Press
-
P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng, "An application of reinforcement learning to aerobatic helicopter flight," in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. C. Platt, and T. Hoffman, Eds. MIT Press, 2007, pp. 1-8.
-
(2007)
Advances in Neural Information Processing Systems
, vol.19
, pp. 1-8
-
-
Abbeel, P.1
Coates, A.2
Quigley, M.3
Ng, A.Y.4
-
10
-
-
60549097572
-
Coadaptive brain-machine interface via reinforcement learning
-
J. DiGiovanna, B. Mahmoudi, J. Fortes, J. C. Principe, and J. C. Sanchez, "Coadaptive brain-machine interface via reinforcement learning," IEEE Transactions on Biomedical Engineering, vol. 56, no. 1, pp. 54-64, 2009.
-
(2009)
IEEE Transactions on Biomedical Engineering
, vol.56
, Issue.1
, pp. 54-64
-
-
Digiovanna, J.1
Mahmoudi, B.2
Fortes, J.3
Principe, J.C.4
Sanchez, J.C.5
-
11
-
-
39649096058
-
Clinical data based optimal STI strategies for HIV: A reinforcement learning approach
-
4177178, Proceedings of the 45th IEEE Conference on Decision and Control 2006, CDC
-
D. Ernst, G.-B. Stan, J. Gonçalves, and L. Wehenkel, "Clinical data based optimal STI strategies for HIV: A reinforcement learning approach," in Proceedings 45th IEEE Conference on Decision & Control, San Diego, US, 13-15 December 2006, pp. 667-672. (Pubitemid 351283311)
-
(2006)
Proceedings of the IEEE Conference on Decision and Control
, pp. 667-672
-
-
Ernst, D.1
Stan, G.-B.2
Goncalves, J.3
Wehenkel, L.4
-
13
-
-
0029679044
-
Reinforcement learning: A survey
-
L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996. (Pubitemid 126646155)
-
(1996)
Journal of Artificial Intelligence Research
, vol.4
, pp. 237-285
-
-
Kaelbling, L.P.1
Littman, M.L.2
Moore, A.W.3
-
15
-
-
0003636089
-
-
Engineering Department, Cambridge University, UK, Tech. Rep. CUED/F-INFENG/TR166, September
-
G. A. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Engineering Department, Cambridge University, UK, Tech. Rep. CUED/F-INFENG/TR166, September 1994, available at http://mi.eng.cam.ac.uk/reports/svr-ftp/rummery tr166.ps.Z.
-
(1994)
On-line Q-learning Using Connectionist Systems
-
-
Rummery, G.A.1
Niranjan, M.2
-
16
-
-
21844465127
-
Tree-based batch mode reinforcement learning
-
D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.
-
(2005)
Journal of Machine Learning Research
, vol.6
, pp. 503-556
-
-
Ernst, D.1
Geurts, P.2
Wehenkel, L.3
-
17
-
-
33646398129
-
Neural fitted Q-iteration - First experiences with a data efficient neural reinforcement learning method
-
ser. Lecture Notes in Computer Science, Porto, Portugal, 3-7 October
-
M. Riedmiller, "Neural fitted Q-iteration - first experiences with a data efficient neural reinforcement learning method," in Proceedings 16th European Conference on Machine Learning (ECML-05), ser. Lecture Notes in Computer Science, vol. 3720, Porto, Portugal, 3-7 October 2005, pp. 317-328.
-
(2005)
Proceedings 16th European Conference on Machine Learning (ECML-05)
, vol.3720
, pp. 317-328
-
-
Riedmiller, M.1
-
18
-
-
85161978146
-
Fitted Q-iteration in continuous action-space MDPs
-
J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. MIT Press
-
A. Antos, R. Munos, and Cs. Szepesvári, "Fitted Q-iteration in continuous action-space MDPs," in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. MIT Press, 2008, pp. 9-16.
-
(2008)
Advances in Neural Information Processing Systems
, vol.20
, pp. 9-16
-
-
Antos, A.1
Munos, R.2
Szepesvári, C.S.3
-
19
-
-
85153965130
-
Reinforcement learning with soft state aggregation
-
G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. MIT Press
-
S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Advances in Neural Information Processing Systems 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. MIT Press, 1995, pp. 361-368.
-
(1995)
Advances in Neural Information Processing Systems
, vol.7
, pp. 361-368
-
-
Singh, S.P.1
Jaakkola, T.2
Jordan, M.I.3
-
20
-
-
14344263882
-
Interpolation-based Q-learning
-
Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004
-
Cs. Szepesvári and W. D. Smart, "Interpolation-based Q-learning," in Proceedings 21st International Conference on Machine Learning (ICML-04), Banff, Canada, 4-8 July 2004, pp. 791-798. (Pubitemid 40290882)
-
(2004)
Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004
, pp. 791-798
-
-
Szepesvari, C.1
Smart, W.D.2
-
21
-
-
33845529505
-
Reinforcement learning: An overview
-
Aachen, Germany, 14-15 September
-
P. Y. Glorennec, "Reinforcement learning: An overview," in Proceedings European Symposium on Intelligent Techniques (ESIT-00), Aachen, Germany, 14-15 September 2000, pp. 17-35.
-
(2000)
Proceedings European Symposium on Intelligent Techniques (ESIT-00)
, pp. 17-35
-
-
Glorennec, P.Y.1
-
22
-
-
0000123778
-
Self-improving reactive agents based on reinforcement learning, planning and teaching
-
Aug. , special issue on reinforcement learning
-
L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Machine Learning, vol. 8, no. 3-4, pp. 293-321, Aug. 1992, special issue on reinforcement learning.
-
(1992)
Machine Learning
, vol.8
, Issue.3-4
, pp. 293-321
-
-
Lin, L.-J.1
-
23
-
-
70049104729
-
Fitted Q-iteration by advantage weighted regression
-
D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press
-
G. Neumann and J. Peters, "Fitted Q-iteration by advantage weighted regression," in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press, 2009, pp. 1177-1184.
-
(2009)
Advances in Neural Information Processing Systems
, vol.21
, pp. 1177-1184
-
-
Neumann, G.1
Peters, J.2
-
24
-
-
0036832956
-
Kernel-based reinforcement learning
-
DOI 10.1023/A:1017928328829
-
D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, vol. 49, no. 2-3, pp. 161-178, 2002. (Pubitemid 34325684)
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 161-178
-
-
Ormoneit, D.1
Sen, S.2
-
26
-
-
70449644892
-
Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems
-
St. Louis, US, 10-12 June
-
A. M. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor, "Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems," in Proceedings 2009 American Control Conference (ACC-09), St. Louis, US, 10-12 June 2009, pp. 725-730.
-
(2009)
Proceedings 2009 American Control Conference (ACC-09)
, pp. 725-730
-
-
Farahmand, A.M.1
Ghavamzadeh, M.2
Szepesvári, C.S.3
Mannor, S.4
-
27
-
-
56449091120
-
An analysis of reinforcement learning with function approximation
-
Helsinki, Finland, 5-9 July
-
F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proceedings 25th International Conference on Machine Learning (ICML-08), Helsinki, Finland, 5-9 July 2008, pp. 664-671.
-
(2008)
Proceedings 25th International Conference on Machine Learning (ICML-08)
, pp. 664-671
-
-
Melo, F.S.1
Meyn, S.P.2
Ribeiro, M.I.3
-
28
-
-
33750356021
-
Approximate policy iteration for closed-loop learning of visual tasks
-
Machine Learning: ECML 2006 - 17th European Conference on Machine Learning, Proceedings
-
S. Jodogne, C. Briquet, and J. H. Piater, "Approximate policy iteration for closed-loop learning of visual tasks," in Proceedings 17th European Conference on Machine Learning (ECML-06), ser. Lecture Notes in Computer Science, vol. 4212, Berlin, Germany, 18-22 September 2006, pp. 210-221. (Pubitemid 44618833)
-
(2006)
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
, vol.4212
, pp. 210-221
-
-
Jodogne, S.1
Briquet, C.2
Piater, J.H.3
-
29
-
-
40849145988
-
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
-
A. Antos, Cs. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Machine Learning, vol. 71, no. 1, pp. 89-129, 2008.
-
(2008)
Machine Learning
, vol.71
, Issue.1
, pp. 89-129
-
-
Antos, A.1
Szepesvári, C.S.2
Munos, R.3
-
30
-
-
70049096468
-
Regularized policy iteration
-
D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press
-
A. M. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor, "Regularized policy iteration," in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press, 2009, pp. 441-448.
-
(2009)
Advances in Neural Information Processing Systems
, vol.21
, pp. 441-448
-
-
Farahmand, A.M.1
Ghavamzadeh, M.2
Szepesvári, C.S.3
Mannor, S.4
-
32
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 1-3, pp. 33-57, 1996. (Pubitemid 126724362)
-
(1996)
Machine Learning
, vol.22
, Issue.1-3
, pp. 33-57
-
-
Bradtke, S.J.1
Barto, A.G.2
-
33
-
-
77956525931
-
Least-squares λ policy iteration: Bias-variance trade-off in control problems
-
Haifa, Israel, 21-24 June
-
C. Thiery and B. Scherrer, "Least-squares λ policy iteration: Bias-variance trade-off in control problems," in Proceedings 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21-24 June 2010, pp. 1071-1078.
-
(2010)
Proceedings 27th International Conference on Machine Learning (ICML-10)
, pp. 1071-1078
-
-
Thiery, C.1
Scherrer, B.2
-
34
-
-
4243567726
-
-
Cambridge, US, Tech. Rep. LIDS-P-2349
-
D. P. Bertsekas and S. Ioffe, "Temporal differences-based policy iteration and applications in neuro-dynamic programming," Massachusetts Institute of Technology, Cambridge, US, Tech. Rep. LIDS-P-2349, 1996, available at http://web.mit.edu/dimitrib/www/Tempdif.pdf.
-
(1996)
Temporal Differences-based Policy Iteration and Applications in Neuro-dynamic Programming
-
-
Bertsekas, D.P.1
Ioffe, S.2
-
35
-
-
79953155554
-
-
Massachusetts Institute of Technology, Cambridge, US, Tech. Rep. LIDS 2833, July
-
D. P. Bertsekas, "Approximate policy iteration: A survey and some new methods," Massachusetts Institute of Technology, Cambridge, US, Tech. Rep. LIDS 2833, July 2010.
-
(2010)
Approximate Policy Iteration: A Survey and Some New Methods
-
-
Bertsekas, D.P.1
-
36
-
-
1942420814
-
Reinforcement learning as classification: Leveraging modern classifiers
-
Washington, US, 21-24 August
-
M. G. Lagoudakis and R. Parr, "Reinforcement learning as classification: Leveraging modern classifiers," in Proceedings 20th International Conference on Machine Learning (ICML-03), Washington, US, 21-24 August 2003, pp. 424-431.
-
(2003)
Proceedings 20th International Conference on Machine Learning (ICML-03)
, pp. 424-431
-
-
Lagoudakis, M.G.1
Parr, R.2
-
37
-
-
48349140736
-
Rollout sampling approximate policy iteration
-
C. Dimitrakakis and M. Lagoudakis, "Rollout sampling approximate policy iteration," Machine Learning, vol. 72, no. 3, pp. 157-171, 2008.
-
(2008)
Machine Learning
, vol.72
, Issue.3
, pp. 157-171
-
-
Dimitrakakis, C.1
Lagoudakis, M.2
-
38
-
-
70350680870
-
Learning RoboCup-keepaway with kernels
-
T. Jung and D. Polani, "Learning RoboCup-keepaway with kernels," in Gaussian Processes in Practice, ser. JMLR Workshop and Conference Proceedings, vol. 1, 2007, pp. 33-57.
-
(2007)
Gaussian Processes in Practice, Ser. JMLR Workshop and Conference Proceedings
, vol.1
, pp. 33-57
-
-
Jung, T.1
Polani, D.2
-
39
-
-
77957782880
-
Online least-squares policy iteration for reinforcement learning control
-
Baltimore, US, 30 June - 2 July
-
L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, "Online least-squares policy iteration for reinforcement learning control," in Proceedings 2010 American Control Conference (ACC-10), Baltimore, US, 30 June - 2 July 2010, pp. 486-491.
-
(2010)
Proceedings 2010 American Control Conference (ACC-10)
, pp. 486-491
-
-
Buşoniu, L.1
Ernst, D.2
De Schutter, B.3
Babuška, R.4
-
40
-
-
85156221438
-
Generalization in reinforcement learning: Successful examples using sparse coarse coding
-
D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. MIT Press
-
R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. MIT Press, 1996, pp. 1038-1044.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
, pp. 1038-1044
-
-
Sutton, R.S.1
-
41
-
-
84899834143
-
Online exploration in least-squares policy iteration
-
Budapest, Hungary, 10-15 May
-
L. Li, M. L. Littman, and C. R. Mansley, "Online exploration in least-squares policy iteration," in Proceedings 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-09), vol. 2, Budapest, Hungary, 10-15 May 2009, pp. 733-739.
-
(2009)
Proceedings 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-09)
, vol.2
, pp. 733-739
-
-
Li, L.1
Littman, M.L.2
Mansley, C.R.3
-
42
-
-
71149099079
-
Fast gradient-descent methods for temporal-difference learning with linear function approximation
-
Montreal, Canada, 14-18 June
-
R. Sutton, H. Maei, D. Precup, S. Bhatnagar, D. Silver, Cs. Szepesvari, and E. Wiewiora, "Fast gradient-descent methods for temporal-difference learning with linear function approximation," in Proceedings 26th International Conference on Machine Learning (ICML-09), Montreal, Canada, 14-18 June 2009, pp. 993-1000.
-
(2009)
Proceedings 26th International Conference on Machine Learning (ICML-09)
, pp. 993-1000
-
-
Sutton, R.1
Maei, H.2
Precup, D.3
Bhatnagar, S.4
Silver, D.5
Szepesvari, C.S.6
Wiewiora, E.7
-
43
-
-
77956541799
-
Toward off-policy learning control with function approximation
-
Haifa, Israel, 21-24 June
-
H. R. Maei, C. Szepesvári, S. Bhatnagar, and R. S. Sutton, "Toward off-policy learning control with function approximation," in Proceedings 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21-24 June 2010, pp. 719-726.
-
(2010)
Proceedings 27th International Conference on Machine Learning (ICML-10)
, pp. 719-726
-
-
Maei, H.R.1
Szepesvári, C.2
Bhatnagar, S.3
Sutton, R.S.4
-
44
-
-
77956523230
-
Analysis of a classification-based policy iteration algorithm
-
Haifa, Israel, 21-24 June
-
A. Lazaric, M. Ghavamzadeh, and R. Munos, "Analysis of a classification-based policy iteration algorithm," in Proceedings 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21-24 June 2010, pp. 607-614.
-
(2010)
Proceedings 27th International Conference on Machine Learning (ICML-10)
, pp. 607-614
-
-
Lazaric, A.1
Ghavamzadeh, M.2
Munos, R.3
-
45
-
-
77956517288
-
Convergence of least squares temporal difference methods under general conditions
-
Haifa, Israel, 21-24 June
-
H. Yu, "Convergence of least squares temporal difference methods under general conditions," in Proceedings 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21-24 June 2010, pp. 1207-1214.
-
(2010)
Proceedings 27th International Conference on Machine Learning (ICML-10)
, pp. 1207-1214
-
-
Yu, H.1
-
46
-
-
77956551905
-
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? the unified oblique projection view
-
Haifa, Israel, 21-24 June
-
B. Scherrer, "Should one compute the Temporal Difference fix point or minimize the Bellman Residual? the unified oblique projection view," in Proceedings 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21-24 June 2010, pp. 959-966.
-
(2010)
Proceedings 27th International Conference on Machine Learning (ICML-10)
, pp. 959-966
-
-
Scherrer, B.1
-
47
-
-
77956549349
-
Finite-sample analysis of LSTD
-
Haifa, Israel, 21-24 June
-
A. Lazaric, M. Ghavamzadeh, and R. Munos, "Finite-sample analysis of LSTD," in Proceedings 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21-24 June 2010, pp. 615-622.
-
(2010)
Proceedings 27th International Conference on Machine Learning (ICML-10)
, pp. 615-622
-
-
Lazaric, A.1
Ghavamzadeh, M.2
Munos, R.3
-
48
-
-
0037288469
-
Approximate gradient methods in policy-space optimization of Markov reward processes
-
P. Marbach and J. N. Tsitsiklis, "Approximate gradient methods in policy-space optimization of Markov reward processes," Discrete Event Dynamic Systems: Theory and Applications, vol. 13, no. 1-2, pp. 111-148, 2003.
-
(2003)
Discrete Event Dynamic Systems: Theory and Applications
, vol.13
, Issue.1-2
, pp. 111-148
-
-
Marbach, P.1
Tsitsiklis, J.N.2
-
49
-
-
0020970738
-
Neuronlike adaptive elements that can solve difficult learning control problems
-
A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, no. 5, pp. 833-846, 1983.
-
(1983)
IEEE Transactions on Systems, Man, and Cybernetics
, vol.13
, Issue.5
, pp. 833-846
-
-
Barto, A.G.1
Sutton, R.S.2
Anderson, C.W.3
-
50
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. MIT Press
-
R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. MIT Press, 2000, pp. 1057-1063.
-
(2000)
Advances in Neural Information Processing Systems
, vol.12
, pp. 1057-1063
-
-
Sutton, R.S.1
McAllester, D.A.2
Singh, S.P.3
Mansour, Y.4
-
51
-
-
4043069840
-
On actor-critic algorithms
-
V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1143-1166, 2003.
-
(2003)
SIAM Journal on Control and Optimization
, vol.42
, Issue.4
, pp. 1143-1166
-
-
Konda, V.R.1
Tsitsiklis, J.N.2
-
52
-
-
33646243319
-
A natural policy gradient
-
T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. MIT Press
-
S. Kakade, "A natural policy gradient," in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. MIT Press, 2001, pp. 1531-1538.
-
(2001)
Advances in Neural Information Processing Systems
, vol.14
, pp. 1531-1538
-
-
Kakade, S.1
-
53
-
-
40649106649
-
Natural actor-critic
-
J. Peters and S. Schaal, "Natural actor-critic," Neurocomputing, vol. 71, no. 7-9, pp. 1180-1190, 2008.
-
(2008)
Neurocomputing
, vol.71
, Issue.7-9
, pp. 1180-1190
-
-
Peters, J.1
Schaal, S.2
-
54
-
-
70349984547
-
Natural actor-critic algorithms
-
S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee, "Natural actor-critic algorithms," Automatica, vol. 45, no. 11, pp. 2471-2482, 2009.
-
(2009)
Automatica
, vol.45
, Issue.11
, pp. 2471-2482
-
-
Bhatnagar, S.1
Sutton, R.2
Ghavamzadeh, M.3
Lee, M.4
-
55
-
-
33750374195
-
Efficient non-linear control through neuroevolution
-
Machine Learning: ECML 2006 - 17th European Conference on Machine Learning, Proceedings
-
F. J. Gomez, J. Schmidhuber, and R. Miikkulainen, "Efficient non-linear control through neuroevolution," in Proceedings 17th European Conference on Machine Learning (ECML-06), ser. Lecture Notes in Computer Science, vol. 4212, Berlin, Germany, 18-22 September 2006, pp. 654-662. (Pubitemid 44618874)
-
(2006)
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
, vol.4212
, pp. 654-662
-
-
Gomez, F.1
Schmidhuber, J.2
Miikkulainen, R.3
-
57
-
-
1942516890
-
The cross-entropy method for fast policy search
-
Washington, US, 21-24 August
-
S. Mannor, R. Y. Rubinstein, and Y. Gat, "The cross-entropy method for fast policy search," in Proceedings 20th International Conference on Machine Learning (ICML-03), Washington, US, 21-24 August 2003, pp. 512-519.
-
(2003)
Proceedings 20th International Conference on Machine Learning (ICML-03)
, pp. 512-519
-
-
Mannor, S.1
Rubinstein, R.Y.2
Gat, Y.3
-
58
-
-
79551686776
-
Cross-entropy optimization of control policies with adaptive basis functions
-
accepted for publication, available online
-
L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, "Cross-entropy optimization of control policies with adaptive basis functions," IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, vol. 41, no. 1, 2011, accepted for publication, available online.
-
(2011)
IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics
, vol.41
, Issue.1
-
-
Buşoniu, L.1
Ernst, D.2
De Schutter, B.3
Babuška, R.4
-
59
-
-
0029752470
-
Feature-based methods for large scale dynamic programming
-
J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large scale dynamic programming," Machine Learning, vol. 22, no. 1-3, pp. 59-94, 1996. (Pubitemid 126724363)
-
(1996)
Machine Learning
, vol.22
, Issue.1-3
, pp. 59-94
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
60
-
-
33646714634
-
Evolutionary function approximation for reinforcement learning
-
S. Whiteson and P. Stone, "Evolutionary function approximation for reinforcement learning," Journal of Machine Learning Research, vol. 7, pp. 877-917, 2006. (Pubitemid 43736560)
-
(2006)
Journal of Machine Learning Research
, vol.7
, pp. 877-917
-
-
Whiteson, S.1
Stone, P.2
-
61
-
-
33749263205
-
Automatic basis function construction for approximate dynamic programming and reinforcement learning
-
Pittsburgh, US, 25-29 June
-
P. W. Keller, S. Mannor, and D. Precup, "Automatic basis function construction for approximate dynamic programming and reinforcement learning," in Proceedings 23rd International Conference on Machine Learning (ICML-06), Pittsburgh, US, 25-29 June 2006, pp. 449-456.
-
(2006)
Proceedings 23rd International Conference on Machine Learning (ICML-06)
, pp. 449-456
-
-
Keller, P.W.1
Mannor, S.2
Precup, D.3
-
62
-
-
67650469386
-
Feature discovery in approximate dynamic programming
-
30 March - 2 April
-
P. Preux, S. Girgin, and M. Loth, "Feature discovery in approximate dynamic programming," in Proceedings IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09), 30 March - 2 April 2009, pp. 109-116.
-
(2009)
Proceedings IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09)
, pp. 109-116
-
-
Preux, P.1
Girgin, S.2
Loth, M.3
-
63
-
-
31844451013
-
Reinforcement learning with Gaussian processes
-
DOI 10.1145/1102351.1102377, ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
-
Y. Engel, S. Mannor, and R. Meir, "Reinforcement learning with Gaussian processes," in Proceedings 22nd International Conference on Machine Learning (ICML-05), Bonn, Germany, 7-11 August 2005, pp. 201-208. (Pubitemid 43183334)
-
(2005)
ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
, pp. 201-208
-
-
Engel, Y.1
Mannor, S.2
Meir, R.3
-
64
-
-
71149100225
-
Kernelized value function approximation for reinforcement learning
-
Montreal, Canada, 14-18 June
-
G. Taylor and R. Parr, "Kernelized value function approximation for reinforcement learning," in Proceedings 26th International Conference on Machine Learning (ICML-09), Montreal, Canada, 14-18 June 2009, pp. 1017-1024.
-
(2009)
Proceedings 26th International Conference on Machine Learning (ICML-09)
, pp. 1017-1024
-
-
Taylor, G.1
Parr, R.2
-
65
-
-
17444414191
-
Basis function adaptation in temporal difference reinforcement learning
-
DOI 10.1007/s10479-005-5732-z
-
I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Annals of Operations Research, vol. 134, no. 1, pp. 215-238, 2005. (Pubitemid 40550047)
-
(2005)
Annals of Operations Research
, vol.134
, Issue.1
, pp. 215-238
-
-
Menache, I.1
Mannor, S.2
Shimkin, N.3
-
66
-
-
35748957806
-
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
-
S. Mahadevan and M. Maggioni, "Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes," Journal of Machine Learning Research, vol. 8, pp. 2169-2231, 2007. (Pubitemid 350046199)
-
(2007)
Journal of Machine Learning Research
, vol.8
, pp. 2169-2231
-
-
Mahadevan, S.1
Maggioni, M.2
-
67
-
-
71149121683
-
Regularization and feature selection in least-squares temporal difference learning
-
Montreal, Canada, 14-18 June
-
J. Z. Kolter and A. Ng, "Regularization and feature selection in least-squares temporal difference learning," in Proceedings 26th International Conference on Machine Learning (ICML-09), Montreal, Canada, 14-18 June 2009, pp. 521-528.
-
(2009)
Proceedings 26th International Conference on Machine Learning (ICML-09)
, pp. 521-528
-
-
Kolter, J.Z.1
Ng, A.2
-
68
-
-
56449092660
-
An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
-
Helsinki, Finland, 5-9 July
-
R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. Littman, "An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning," in Proceedings 25th Annual International Conference on Machine Learning (ICML-08), Helsinki, Finland, 5-9 July 2008, pp. 752-759.
-
(2008)
Proceedings 25th Annual International Conference on Machine Learning (ICML-08)
, pp. 752-759
-
-
Parr, R.1
Li, L.2
Taylor, G.3
Painter-Wakefield, C.4
Littman, M.5
-
69
-
-
34548807200
-
Reinforcement learning in continuous action spaces
-
DOI 10.1109/ADPRL.2007.368199, 4220844, Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
-
H. van Hasselt and M. Wiering, "Reinforcement learning in continuous action spaces," in Proceedings IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL-07), Honolulu, US, 1-5 April 2007, pp. 272-279. (Pubitemid 47431396)
-
(2007)
Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
, pp. 272-279
-
-
Van Hasselt, H.1
Wiering, M.A.2
-
70
-
-
71149094455
-
Binary action search for learning continuous-action control policies
-
Montreal, Canada, 14-18 June
-
J. Pazis and M. Lagoudakis, "Binary action search for learning continuous-action control policies," in Proceedings of the 26th International Conference on Machine Learning (ICML-09), Montreal, Canada, 14-18 June 2009, pp. 793-800.
-
(2009)
Proceedings of the 26th International Conference on Machine Learning (ICML-09)
, pp. 793-800
-
-
Pazis, J.1
Lagoudakis, M.2
-
71
-
-
85132026293
-
Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
-
Austin, US, 21-23 June
-
R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proceedings 7th International Conference on Machine Learning (ICML-90), Austin, US, 21-23 June 1990, pp. 216-224.
-
(1990)
Proceedings 7th International Conference on Machine Learning (ICML-90)
, pp. 216-224
-
-
Sutton, R.S.1
-
73
-
-
38149013086
-
Tuning bandit algorithms in stochastic environments
-
Sendai, Japan, 1-4 October
-
J.-Y. Audibert, R. Munos, and Cs. Szepesvári, "Tuning bandit algorithms in stochastic environments," in Proceedings 18th International Conference on Algorithmic Learning Theory (ALT-07), Sendai, Japan, 1-4 October 2007, pp. 150-165.
-
(2007)
Proceedings 18th International Conference on Algorithmic Learning Theory (ALT-07)
, pp. 150-165
-
-
Audibert, J.-Y.1
Munos, R.2
Szepesvári, C.S.3
-
74
-
-
77952027689
-
Online optimization in X-armed bandits
-
D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press
-
S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, "Online optimization in X-armed bandits," in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press, 2009, pp. 201-208.
-
(2009)
Advances in Neural Information Processing Systems
, vol.21
, pp. 201-208
-
-
Bubeck, S.1
Munos, R.2
Stoltz, G.3
Szepesvári, C.4
-
75
-
-
33750586671
-
Solving factored MDPs with hybrid state and action variables
-
B. Kveton, M. Hauskrecht, and C. Guestrin, "Solving factored MDPs with hybrid state and action variables," Journal of Artificial Intelligence Research, vol. 27, pp. 153-201, 2006. (Pubitemid 44681376)
-
(2006)
Journal of Artificial Intelligence Research
, vol.27
, pp. 153-201
-
-
Kveton, B.1
Hauskrecht, M.2
Guestrin, C.3
-
76
-
-
49049110053
-
Guest editorial - Special issue on adaptive dynamic programming and reinforcement learning in feedback control
-
F. Lewis, D. Liu, and G. Lendaris, "Guest editorial - special issue on adaptive dynamic programming and reinforcement learning in feedback control," IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, vol. 38, no. 4, pp. 896-897, 2008.
-
(2008)
IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics
, vol.38
, Issue.4
, pp. 896-897
-
-
Lewis, F.1
Liu, D.2
Lendaris, G.3
-
79
-
-
0003565783
-
-
20 November, update of Chapter 6 in volume 2 of the book Dynamic Programming and Optimal Control
-
D. P. Bertsekas, "Approximate dynamic programming," 20 November 2010, update of Chapter 6 in volume 2 of the book Dynamic Programming and Optimal Control. Available at http://web.mit.edu/dimitrib/www/dpchapter.html.
-
(2010)
Approximate Dynamic Programming
-
-
Bertsekas, D.P.1
-
80
-
-
84860528287
-
Reinforcement learning in a nutshell
-
Bruges, Belgium, 25-27 April
-
V. Heidrich-Meisner, M. Lauer, C. Igel, and M. Riedmiller, "Reinforcement learning in a nutshell," in Proceedings 15th European Symposium on Artificial Neural Networks (ESANN-07), Bruges, Belgium, 25-27 April 2007, pp. 277-288.
-
(2007)
Proceedings 15th European Symposium on Artificial Neural Networks (ESANN-07)
, pp. 277-288
-
-
Heidrich-Meisner, V.1
Lauer, M.2
Igel, C.3
Riedmiller, M.4
-
81
-
-
33645410501
-
Dynamic programming and suboptimal control: A survey from ADP to MPC
-
special issue for the CDC-ECC-05 in Seville, Spain
-
D. P. Bertsekas, "Dynamic programming and suboptimal control: A survey from ADP to MPC," European Journal of Control, vol. 11, no. 4-5, pp. 310-334, 2005, special issue for the CDC-ECC-05 in Seville, Spain.
-
(2005)
European Journal of Control
, vol.11
, Issue.4-5
, pp. 310-334
-
-
Bertsekas, D.P.1
-
82
-
-
70350192140
-
Numerical dynamic programming in economics
-
H. M. Amman, D. A. Kendrick, and J. Rust, Eds. Elsevier, ch. 14
-
J. Rust, "Numerical dynamic programming in economics," in Handbook of Computational Economics, H. M. Amman, D. A. Kendrick, and J. Rust, Eds. Elsevier, 1996, vol. 1, ch. 14, pp. 619-729.
-
(1996)
Handbook of Computational Economics
, vol.1
, pp. 619-729
-
-
Rust, J.1