-
1
-
-
-
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
-
Learning Theory - 19th Annual Conference on Learning Theory, COLT 2006, Proceedings
-
A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. In COLT-19, pages 574-588, 2006.
-
(2006)
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
, vol.4005
, pp. 574-588
-
-
Antos, A.
Szepesvári, Cs.
Munos, R.
-
2
-
-
-
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
-
accepted
-
A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 2007. (accepted).
-
(2007)
Machine Learning
-
-
Antos, A.
Szepesvári, Cs.
Munos, R.
-
3
-
-
-
Value-iteration based fitted policy iteration: Learning with a single trajectory
-
DOI 10.1109/ADPRL.2007.368207, Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
-
A. Antos, Cs. Szepesvári, and R. Munos. Value-iteration based fitted policy iteration: learning with a single trajectory. In IEEE ADPRL, pages 330-337, 2007.
-
(2007)
Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
, pp. 330-337
-
-
Antos, A.
Szepesvári, Cs.
Munos, R.
-
8
-
-
-
Generalization in reinforcement learning: Safely approximating the value function
-
J.A. Boyan and A.W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In NIPS-7, pages 369-376, 1995.
-
(1995)
NIPS-7
, pp. 369-376
-
-
Boyan, J.A.
Moore, A.W.
-
9
-
-
-
Fat-shattering and the learnability of real-valued functions
-
DOI 10.1006/jcss.1996.0033
-
P.L. Bartlett, P.M. Long, and R.C. Williamson. Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 52:434-452, 1996.
-
(1996)
Journal of Computer and System Sciences
, vol.52
, Issue.3
, pp. 434-452
-
-
Bartlett, P.L.
Long, P.M.
Williamson, R.C.
-
11
-
-
-
Finite time bounds for sampling based fitted value iteration
-
Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, Budapest 1111, Hungary
-
R. Munos and Cs. Szepesvári. Finite time bounds for sampling based fitted value iteration. Technical report, Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, Budapest 1111, Hungary, 2006.
-
(2006)
Technical Report, Computer and Automation Research Institute of the Hungarian Academy of Sciences
-
-
Munos, R.
Szepesvári, Cs.
-
13
-
-
-
Sample complexity of policy search with known dynamics
-
MIT Press
-
P.L. Bartlett and A. Tewari. Sample complexity of policy search with known dynamics. In NIPS-19. MIT Press, 2007.
-
(2007)
NIPS-19
-
-
Bartlett, P.L.
Tewari, A.
-
15
-
-
-
Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method
-
M. Riedmiller. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In 16th European Conference on Machine Learning, pages 317-328, 2005.
-
(2005)
16th European Conference on Machine Learning
, pp. 317-328
-
-
Riedmiller, M.
-
16
-
-
-
Batch reinforcement learning in a complex domain
-
S. Kalyanakrishnan and P. Stone. Batch reinforcement learning in a complex domain. In AAMAS-07, 2007.
-
(2007)
AAMAS-07
-
-
Kalyanakrishnan, S.
Stone, P.