-
1
-
-
84897041261
-
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
-
A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. In Conference on Learning Theory (COLT), 2006.
-
(2006)
Conference on Learning Theory (COLT)
-
-
Antos, A.1
Szepesvári, C.2
Munos, R.3
-
4
-
-
4243567726
-
Temporal differences-based policy iteration and applications in neuro-dynamic programming
-
D.P. Bertsekas and S. Ioffe. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Technical report, MIT, 1996.
-
(1996)
Technical Report MIT
-
-
Bertsekas, D.P.1
Ioffe, S.2
-
6
-
-
61849106433
-
Projected equation methods for approximate solution of large linear systems
-
D.P. Bertsekas and H. Yu. Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics, 227:27-50, 2009.
-
(2009)
Journal of Computational and Applied Mathematics
, vol.227
, pp. 27-50
-
-
Bertsekas, D.P.1
Yu, H.2
-
7
-
-
84882309474
-
The tradeoffs of large scale learning
-
S. Sra, S. Nowozin, and S.J. Wright, editors, MIT Press
-
L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In S. Sra, S. Nowozin, and S.J. Wright, editors, Optimization for Machine Learning, pages 351-368. MIT Press, 2011.
-
(2011)
Optimization for Machine Learning
, pp. 351-368
-
-
Bottou, L.1
Bousquet, O.2
-
8
-
-
0036832950
-
Technical update: Least-squares temporal difference learning
-
DOI 10.1023/A:1017936530646
-
J.A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, 49 (2-3):233-246, 2002. (Pubitemid 34325688)
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 233-246
-
-
Boyan, J.A.1
-
9
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
S.J. Bradtke and A.G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22 (1-3):33-57, 1996. (Pubitemid 126724362)
-
(1996)
Machine Learning
, vol.22
, Issue.1-3
, pp. 33-57
-
-
Bradtke, S.J.1
-
10
-
-
33646435300
-
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
-
D. Choi and B. Van Roy. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems, 16:207-239, 2006.
-
(2006)
Discrete Event Dynamic Systems
, vol.16
, pp. 207-239
-
-
Choi, D.1
Van Roy, B.2
-
18
-
-
77954101982
-
GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
-
H.R. Maei and R.S. Sutton. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Conference on Artificial General Intelligence (AGI), 2010.
-
(2010)
Conference on Artificial General Intelligence (AGI)
-
-
Maei, H.R.1
Sutton, R.S.2
-
20
-
-
0037288398
-
Least squares policy evaluation algorithms with linear function approximation
-
A. Nedic and D.P. Bertsekas. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 13:79-110, 2003.
-
(2003)
Discrete Event Dynamic Systems
, vol.13
, pp. 79-110
-
-
Nedic, A.1
Bertsekas, D.P.2
-
23
-
-
3042683131
-
Combining importance sampling and temporal difference control variates to simulate Markov chains
-
R.S. Randhawa and S. Juneja. Combining importance sampling and temporal difference control variates to simulate Markov chains. ACM Transactions on Modeling and Computer Simulation, 14 (1):1-30, 2004.
-
(2004)
ACM Transactions on Modeling and Computer Simulation
, vol.14
, Issue.1
, pp. 1-30
-
-
Randhawa, R.S.1
Juneja, S.2
-
25
-
-
77956551905
-
Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view
-
B. Scherrer. Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. In International Conference on Machine Learning (ICML), 2010.
-
(2010)
International Conference on Machine Learning (ICML)
-
-
Scherrer, B.1
-
29
-
-
71149099079
-
Fast gradient-descent methods for temporal-difference learning with linear function approximation
-
R.S. Sutton, H.R. Maei, D. Precup, S. Bhatnagar, D. Silver, Cs. Szepesvári, and E. Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In International Conference on Machine Learning (ICML), 2009.
-
(2009)
International Conference on Machine Learning (ICML)
-
-
Sutton, R.S.1
Maei, H.R.2
Precup, D.3
Bhatnagar, S.4
Silver, D.5
Szepesvári, C.6
Wiewiora, E.7
-
30
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
PII S0018928697034375
-
J.N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42 (5):674-690, 1997. (Pubitemid 127760263)
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, Issue.5
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
31
-
-
77956517288
-
Convergence of least-squares temporal difference methods under general conditions
-
H. Yu. Convergence of least-squares temporal difference methods under general conditions. In International Conference on Machine Learning (ICML), 2010a.
-
(2010)
International Conference on Machine Learning (ICML)
-
-
Yu, H.1
-
32
-
-
79960454066
-
Least squares temporal difference methods: An analysis under general conditions
-
University of Helsinki, September
-
H. Yu. Least squares temporal difference methods: An analysis under general conditions. Technical Report C-2010-39, University of Helsinki, September 2010b.
-
(2010)
Technical Report C-2010-39
-
-
Yu, H.1
-
33
-
-
58449131194
-
New error bounds for approximations from projected linear equations
-
Dept. Computer Science, Univ. of Helsinki, July
-
H. Yu and D.P. Bertsekas. New error bounds for approximations from projected linear equations. Technical Report C-2008-43, Dept. Computer Science, Univ. of Helsinki, July 2008.
-
(2008)
Technical Report C-2008-43
-
-
Yu, H.1
Bertsekas, D.P.2