[1] I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Ann. Oper. Res., vol. 134, no. 1, pp. 215-238, 2005. DOI 10.1007/s10479-005-5732-z
[2] R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[6] V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143-1166, 2003.
[9] V. S. Borkar, "Stochastic approximation with 'controlled Markov' noise," Systems Control Lett., vol. 55, pp. 139-145, 2006.
[11] D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Sci. Appl. Math., 2008, to appear.
[12] J. N. Tsitsiklis and B. Van Roy, "Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives," IEEE Trans. Automat. Contr., vol. 44, no. 10, pp. 1840-1851, 1999. DOI 10.1109/9.793723
[13] D. S. Choi and B. Van Roy, "A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning," Discrete Event Dyn. Syst., vol. 16, no. 2, pp. 207-239, 2006.
[14] H. Yu and D. P. Bertsekas, "A least squares Q-learning algorithm for optimal stopping problems," MIT, LIDS Tech. Report 2731, 2006.
[16] S. M. Robinson, "An implicit-function theorem for a class of nonsmooth functions," Math. Oper. Res., vol. 16, no. 2, pp. 292-309, 1991.
[17] A. L. Dontchev and R. T. Rockafellar, "Robinson's implicit function theorem and its extensions," Math. Program. Ser. B, vol. 117, no. 1, pp. 129-147, 2008.