[3] J. Boyan, "Technical update: Least-squares temporal difference learning," Machine Learning, vol. 49, pp. 233-246, 2002.
[4] A. Nedić and D. P. Bertsekas, "Least-squares policy evaluation algorithms with linear function approximation," Discrete Event Dynamic Systems: Theory and Applications, vol. 13, no. 1-2, pp. 79-110, 2003.
[6] D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.
[7] V. Konda, "Actor-critic algorithms," Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, US, 2002.
[8] H. Yu and D. P. Bertsekas, "Convergence results for some temporal difference methods based on least squares," IEEE Transactions on Automatic Control, vol. 54, no. 7, pp. 1515-1531, 2009.
[9] V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1143-1166, 2003.
[10] T. Jung and D. Polani, "Kernelizing LSPE(λ)," in Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL-07), Honolulu, US, 1-5 April 2007, pp. 338-345.
[11] L. Li, M. L. Littman, and C. R. Mansley, "Online exploration in least-squares policy iteration," in Proceedings 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-09), vol. 2, Budapest, Hungary, 10-15 May 2009, pp. 733-739.
[12] D. P. Bertsekas and S. Ioffe, "Temporal differences-based policy iteration and applications in neuro-dynamic programming," Massachusetts Institute of Technology, Cambridge, US, Tech. Rep. LIDS-P-2349, 1996, available at http://web.mit.edu/dimitrib/www/Tempdif.pdf.
[13] R. S. Sutton, "Learning to predict by the method of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[14] S. Singh, T. Jaakkola, M. L. Littman, and Cs. Szepesvári, "Convergence results for single-step on-policy reinforcement-learning algorithms," Machine Learning, vol. 38, no. 3, pp. 287-308, 2000.