[1] RL Competition. http://www.rl-competition.org/, 2012. Accessed: 20/08/2012.
[2] M. Ahmadi, M. E. Taylor, and P. Stone. IFSA: Incremental feature-set augmentation for reinforcement learning tasks. In International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1-8, New York, NY, USA, 2007.
[4] A. Antos, C. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1):89-129, 2008.
[5] J. Asmuth, L. Li, M. Littman, A. Nouri, and D. Wingate. A Bayesian sampling approach to exploration in reinforcement learning. In International Conference on Uncertainty in Artificial Intelligence (UAI), pages 19-26, Arlington, Virginia, United States, 2009.
[6] L. C. Baird. Residual algorithms: Reinforcement learning with function approximation. In International Conference on Machine Learning (ICML), pages 30-37, 1995.
[7] A. d. M. S. Barreto and C. W. Anderson. Restricted gradient-descent algorithm for value-function approximation in reinforcement learning. Artificial Intelligence, 172:454-482, 2008.
[8] A. Barto and M. Duff. Monte Carlo matrix inversion and reinforcement learning. In Neural Information Processing Systems (NIPS), pages 687-694. Morgan Kaufmann, 1994.
[9] A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.
[12] R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
[15] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3). Athena Scientific, May 1996.
[17] S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee. Incremental natural actor-critic algorithms. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems (NIPS), pages 105-112. MIT Press, 2007.
[18] M. Bowling, A. Geramifard, and D. Wingate. Sigma point policy iteration. In International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), volume 1, pages 379-386, Richland, SC, 2008.
[19] J. Boyan and A. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. Touretzky, and T. Lee, editors, Neural Information Processing Systems (NIPS), pages 369-376, Cambridge, MA, 1995.
[20] J. A. Boyan. Least-squares temporal difference learning. In International Conference on Machine Learning (ICML), pages 49-56. Morgan Kaufmann, San Francisco, CA, 1999.
[21] S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57, 1996.
[23] L. Buşoniu, R. Babuška, B. De Schutter, and D. Ernst. Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, Boca Raton, Florida, 2010.
[24] T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research (JAIR), 13:227-303, November 2000.
[25] A. Dutech, T. Edmunds, J. Kok, M. Lagoudakis, M. Littman, M. Riedmiller, B. Russell, B. Scherrer, R. Sutton, S. Timmer, et al. Reinforcement learning benchmarks and bake-offs II. In Advances in Neural Information Processing Systems (NIPS) 17 Workshop, 2005.
[28] A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor. Regularized policy iteration. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems (NIPS), pages 441-448. MIT Press, 2008.
[30] A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. How. Online discovery of feature dependencies. In L. Getoor and T. Scheffer, editors, International Conference on Machine Learning (ICML), pages 881-888. ACM, June 2011.
[31] A. Geramifard, J. Redding, J. Joseph, N. Roy, and J. P. How. Model estimation within planning and learning. In American Control Conference (ACC), June 2012.
[33] A. Geramifard, T. J. Walsh, N. Roy, and J. How. Batch iFDD: A scalable matching pursuit algorithm for solving MDPs. In Proceedings of the 29th Annual Conference on Uncertainty in Artificial Intelligence (UAI), Bellevue, Washington, USA, 2013b. AUAI Press.
[34] S. Girgin and P. Preux. Feature discovery in reinforcement learning using genetic programming. Research Report RR-6358, INRIA, 2007.
[36] G. Gordon. Stable function approximation in dynamic programming. In International Conference on Machine Learning (ICML), page 261, Tahoe City, California, July 9-12, 1995.
[37] A. Gosavi. Reinforcement learning: A tutorial survey and recent advances. INFORMS Journal on Computing, 21(2):178-192, April 2009.
[38] H. Hachiya, T. Akiyama, M. Sugiyama, and J. Peters. Adaptive importance sampling with automatic model selection in value function approximation. In Association for the Advancement of Artificial Intelligence (AAAI), pages 1351-1356, 2008.
[41] T. Jaakkola, M. Jordan, and S. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Technical report, Massachusetts Institute of Technology, Cambridge, MA, August 1993.
[44] T. Jung and P. Stone. Gaussian processes for sample efficient reinforcement learning with RMAX-like exploration. In European Conference on Machine Learning (ECML), September 2010.
[46] S. Kalyanakrishnan and P. Stone. Characterizing reinforcement learning methods through parameterized learning problems. Machine Learning, 2011.
[47] J. Z. Kolter and A. Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In International Conference on Machine Learning (ICML), pages 521-528, New York, NY, USA, 2009.
[48] R. Kretchmar and C. Anderson. Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning. In International Conference on Neural Networks, volume 2, pages 834-837, 1997.
[52] L. Li. Sample complexity bounds of exploration. In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State of the Art. Springer Verlag, 2012.
[53] L. Li, M. L. Littman, and C. R. Mansley. Online exploration in least-squares policy iteration. In International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 733-739, Richland, SC, 2009a.
[55] W. Liu, P. Pokharel, and J. Principe. The kernel least-mean-square algorithm. IEEE Transactions on Signal Processing, 56(2):543-554, 2008.
[56] W. Liu, J. C. Principe, and S. Haykin. Kernel Adaptive Filtering: A Comprehensive Introduction. Wiley, Hoboken, New Jersey, 2010.
[58] H. R. Maei, C. Szepesvári, S. Bhatnagar, and R. S. Sutton. Toward off-policy learning control with function approximation. In J. Fürnkranz and T. Joachims, editors, International Conference on Machine Learning (ICML), pages 719-726. Omnipress, 2010.
[62] O. Mihatsch and R. Neuneier. Risk-sensitive reinforcement learning. Machine Learning, 49(2-3):267-290, 2002.
[63] J. Moody and C. J. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2):281-294, June 1989.
[64] A. W. Moore and C. G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103-130, 1993.
[65] A. Nouri and M. L. Littman. Multi-resolution exploration in continuous spaces. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems (NIPS), pages 1209-1216. MIT Press, 2009.
[66] R. Parr, C. Painter-Wakefield, L. Li, and M. Littman. Analyzing feature generation for value-function approximation. In International Conference on Machine Learning (ICML), pages 737-744, New York, NY, USA, 2007.
[67] R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. L. Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In International Conference on Machine Learning (ICML), pages 752-759, New York, NY, USA, 2008.
[68] J. Peters and S. Schaal. Policy gradient methods for robotics. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2219-2225. IEEE, October 2006.
[69] J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71:1180-1190, March 2008.
[74] M. Riedmiller, J. Peters, and S. Schaal. Evaluation of policy gradient methods and variants on the Cart-Pole benchmark. In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), pages 254-261, April 2007.
[77] B. Scherrer. Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. In International Conference on Machine Learning (ICML), 2010.
[78] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.
[79] B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2002.
[81] D. Silver, R. S. Sutton, and M. Müller. Sample-based learning and search with permanent and transient memories. In International Conference on Machine Learning (ICML), pages 968-975, New York, NY, USA, 2008.
[82] D. Silver, R. S. Sutton, and M. Müller. Temporal-difference search in computer Go. Machine Learning, 87(2):183-219, 2012.
[84] S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
[85] P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior, 13(3):165-188, 2005a.
[86] P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior, 13(3):165-188, September 2005b.
[88] R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Neural Information Processing Systems (NIPS), pages 1038-1044. The MIT Press, 1996.
[90] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), volume 12, pages 1057-1063, 2000.
[91] R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In International Conference on Machine Learning (ICML), pages 993-1000, New York, NY, USA, 2009.
[93] I. Szita and C. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In International Conference on Machine Learning (ICML), pages 1031-1038, 2010.
[94] G. Taylor and R. Parr. Kernelized value function approximation for reinforcement learning. In International Conference on Machine Learning (ICML), pages 1017-1024, New York, NY, USA, 2009.
[95] J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, May 1997.
[96] J. N. Tsitsiklis and B. Van Roy. Average cost temporal-difference learning. Automatica, 35(11):1799-1808, 1999.
[97] N. K. Ure, A. Geramifard, G. Chowdhary, and J. P. How. Adaptive planning for Markov decision processes with uncertain transition models via incremental feature dependency discovery. In European Conference on Machine Learning (ECML), 2012.
[99] C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3):279-292, 1992.
[102] S. Whiteson and M. Littman. Introduction to the special issue on empirical evaluations in reinforcement learning. Machine Learning, 84(1-2):1-6, 2011.
[103] B. Widrow and F. Smith. Pattern-recognizing control systems. In Computer and Information Sciences: Collected Papers on Learning, Adaptation and Control in Information Systems (COINS Symposium Proceedings), volume 12, pages 288-317, Washington, DC, 1964.
[104] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
[105] T. Winograd. Procedures as a representation for data in a computer program for understanding natural language. Technical Report 235, Massachusetts Institute of Technology, 1971.
[106] Y. Ye. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4):593-603, 2011.
[107] H. Yu and D. P. Bertsekas. Error bounds for approximations from projected linear equations. Mathematics of Operations Research, 35(2):306-329, 2010.