[2] M. Gheshlaghi Azar, R. Munos, M. Ghavamzadeh, and H. J. Kappen. Speedy Q-learning. In Advances in Neural Information Processing Systems 24, pages 2411-2419. MIT Press, 2012.
[5] A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5):834-846, 1983.
[9] S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee. Natural actor-critic algorithms. Automatica, 45(11):2471-2482, 2009.
[12] D. P. de Farias and B. Van Roy. On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3):589-608, 2000.
[15] A. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor. Regularized fitted Q-iteration: Application to planning. In European Workshop on Reinforcement Learning, Lecture Notes in Computer Science, pages 55-68. Springer, 2008.
[16] A. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor. Regularized policy iteration. In Advances in Neural Information Processing Systems 21, pages 441-448. Curran Associates, Inc., 2009.
[18] J. Hoffmann-Jørgensen and G. Pisier. The law of large numbers and the central limit theorem in Banach spaces. The Annals of Probability, 4(4):587-599, 1976.
[19] T. Jaakkola, M. I. Jordan, and S. Singh. On the convergence of stochastic iterative dynamic programming. Neural Computation, 6(6):1185-1201, 1994.
[22] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 2005(11):P11011, 2005.
[23] M. Kearns and S. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 12, pages 996-1002. MIT Press, 1999.
[29] H. Maei, Cs. Szepesvári, S. Bhatnagar, and R. S. Sutton. Toward off-policy learning control with function approximation. In Proceedings of the 27th International Conference on Machine Learning, pages 719-726. Omnipress, 2010.
[31] R. Munos. Error bounds for approximate value iteration. In Proceedings of the 20th National Conference on Artificial Intelligence, volume 2, pages 1006-1011. AAAI Press, 2005.
[34] J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71(7-9):1180-1190, 2008.
[36] S. Singh, T. Jaakkola, M. L. Littman, and Cs. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
[37] S. Still and D. Precup. An information-theoretic approach to curiosity-driven reinforcement learning. Theory in Biosciences, 131(3):139-148, 2012.
[40] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057-1063. MIT Press, 2000.
[46] E. Todorov. Linearly-solvable Markov decision problems. In Advances in Neural Information Processing Systems 19, pages 1369-1376. MIT Press, 2007.
[48] T. Wang, M. Bowling, and D. Schuurmans. Dual representations for dynamic programming and reinforcement learning. In Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), pages 44-51. IEEE Press, 2007.
[49] T. Wang, D. Lizotte, M. Bowling, and D. Schuurmans. Stable dual dynamic programming. In Advances in Neural Information Processing Systems 20, pages 1569-1576. MIT Press, 2008.