-
1
-
-
40849145988
-
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
-
A. Antos, Cs. Szepesvari, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning Journal, 71:89-129, 2008.
-
(2008)
Machine Learning Journal
, vol.71
, pp. 89-129
-
-
Antos, A.1
Szepesvari, Cs.2
Munos, R.3
-
3
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
S. Bradtke and A. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
-
(1996)
Machine Learning
, vol.22
, pp. 33-57
-
-
Bradtke, S.1
Barto, A.2
-
4
-
-
70049096468
-
Regularized policy iteration
-
MIT Press
-
A. M. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor. Regularized policy iteration. In Proceedings of Advances in Neural Information Processing Systems 21, pages 441-448. MIT Press, 2008.
-
(2008)
Proceedings of Advances in Neural Information Processing Systems
, vol.21
, pp. 441-448
-
-
Farahmand, A.M.1
Ghavamzadeh, M.2
Szepesvári, Cs.3
Mannor, S.4
-
5
-
-
70449644892
-
Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems
-
A. M. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor. Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. In Proceedings of the American Control Conference, pages 725-730, 2009.
-
(2009)
Proceedings of the American Control Conference
, pp. 725-730
-
-
Farahmand, A.M.1
Ghavamzadeh, M.2
Szepesvári, Cs.3
Mannor, S.4
-
6
-
-
85161982279
-
-
Technical Report inria-00530762, INRIA
-
M. Ghavamzadeh, A. Lazaric, O. Maillard, and R. Munos. LSPI with random projections. Technical Report inria-00530762, INRIA, 2010.
-
(2010)
LSPI with Random Projections
-
-
Ghavamzadeh, M.1
Lazaric, A.2
Maillard, O.3
Munos, R.4
-
16
-
-
17444414191
-
Basis function adaptation in temporal difference reinforcement learning
-
I. Menache, S. Mannor, and N. Shimkin. Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134:215-238, 2005.
-
(2005)
Annals of Operations Research
, vol.134
, pp. 215-238
-
-
Menache, I.1
Mannor, S.2
Shimkin, N.3
-
17
-
-
34547982545
-
Analyzing feature generation for value-function approximation
-
R. Parr, C. Painter-Wakefield, L. Li, and M. Littman. Analyzing feature generation for value-function approximation. In Proceedings of the Twenty-Fourth International Conference on Machine Learning, pages 737-744, 2007.
-
(2007)
Proceedings of the Twenty-Fourth International Conference on Machine Learning
, pp. 737-744
-
-
Parr, R.1
Painter-Wakefield, C.2
Li, L.3
Littman, M.4
-
18
-
-
77956538796
-
Feature selection using regularization in approximate linear programs for Markov decision processes
-
M. Petrik, G. Taylor, R. Parr, and S. Zilberstein. Feature selection using regularization in approximate linear programs for Markov decision processes. In Proceedings of the Twenty-Seventh International Conference on Machine Learning, pages 871-878, 2010.
-
(2010)
Proceedings of the Twenty-Seventh International Conference on Machine Learning
, pp. 871-878
-
-
Petrik, M.1
Taylor, G.2
Parr, R.3
Zilberstein, S.4