-
1
-
-
0003778897
-
-
Springer-Verlag, New York
-
A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer-Verlag, New York, 1990.
-
(1990)
Adaptive Algorithms and Stochastic Approximations
-
-
Benveniste, A.1
Metivier, M.2
Priouret, P.3
-
2
-
-
4243567726
-
Temporal differences-based policy iteration and applications in neuro-dynamic programming
-
MIT, Cambridge, MA
-
D. P. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, Lab. for Info. and Decision Systems Report LIDS-P-2349, MIT, Cambridge, MA, 1996.
-
(1996)
Lab. for Info. and Decision Systems Report LIDS-P-2349
-
-
Bertsekas, D.P.1
Ioffe, S.2
-
3
-
-
84980552700
-
-
2nd Edition, Athena Scientific, Belmont, MA
-
D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd Edition, Athena Scientific, Belmont, MA, 2001.
-
(2001)
Dynamic Programming and Optimal Control
-
-
Bertsekas, D.P.1
-
5
-
-
0034389611
-
Gradient convergence in gradient methods with errors
-
D. P. Bertsekas and J. N. Tsitsiklis, Gradient convergence in gradient methods with errors, SIAM Journal on Optimization, vol. 10, pp. 627-642, 2000.
-
(2000)
SIAM Journal on Optimization
, vol.10
, pp. 627-642
-
-
Bertsekas, D.P.1
Tsitsiklis, J.N.2
-
6
-
-
0036832950
-
Technical update: Least-squares temporal difference learning
-
J. A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, vol. 49, pp. 1-15, 2002.
-
(2002)
Machine Learning
, vol.49
, pp. 1-15
-
-
Boyan, J.A.1
-
7
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol. 22, pp. 33-57, 1996.
-
(1996)
Machine Learning
, vol.22
, pp. 33-57
-
-
Bradtke, S.J.1
Barto, A.G.2
-
8
-
-
0000430514
-
The convergence of TD(λ) for general λ
-
P. D. Dayan, The convergence of TD(λ) for general λ, Machine Learning, vol. 8, pp. 341-362, 1992.
-
(1992)
Machine Learning
, vol.8
, pp. 341-362
-
-
Dayan, P.D.1
-
9
-
-
0034342516
-
On the existence of fixed points for approximate value iteration and temporal-difference learning
-
D. P. de Farias and B. Van Roy, On the existence of fixed points for approximate value iteration and temporal-difference learning, Journal of Optimization Theory and Applications, vol. 105, 2000.
-
(2000)
Journal of Optimization Theory and Applications
, vol.105
-
-
De Farias, D.P.1
Van Roy, B.2
-
11
-
-
85036579695
-
The asymptotic mean squared error of temporal difference learning, Unpublished Report
-
MIT, Cambridge, MA
-
V. R. Konda and J. N. Tsitsiklis, The asymptotic mean squared error of temporal difference learning, Unpublished Report, Lab. for Information and Decision Systems, MIT, Cambridge, MA, 2003.
-
(2003)
Lab. for Information and Decision Systems
-
-
Konda, V.R.1
Tsitsiklis, J.N.2
-
12
-
-
0042758707
-
-
Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT, Cambridge, MA
-
V. R. Konda, Actor-Critic Algorithms, Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 2002.
-
(2002)
Actor-Critic Algorithms
-
-
Konda, V.R.1
-
14
-
-
0003276733
-
Mean-field analysis for batched TD(λ)
-
F. Pineda, Mean-field analysis for batched TD(λ), Neural Computation, pp. 1403-1419, 1997.
-
(1997)
Neural Computation
, pp. 1403-1419
-
-
Pineda, F.1
-
17
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol. 3, pp. 9-44, 1988.
-
(1988)
Machine Learning
, vol.3
, pp. 9-44
-
-
Sutton, R.S.1
-
18
-
-
0003787427
-
-
Ph.D. Thesis, MIT, Cambridge, MA
-
B. Van Roy, Learning and Value Function Approximation in Complex Decision Processes, Ph.D. Thesis, MIT, Cambridge, MA, 1998.
-
(1998)
Learning and Value Function Approximation in Complex Decision Processes
-
-
Van Roy, B.1
-
19
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Trans. on Automatic Control, vol. 42, pp. 674-690, 1997.
-
(1997)
IEEE Trans. on Automatic Control
, vol.42
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2