-
1
-
-
4043069840
-
Actor-critic algorithms
-
V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143-1166, 2003.
-
(2003)
SIAM J. Control Optim
, vol.42
, Issue.4
, pp. 1143-1166
-
-
Konda, V.R.1
Tsitsiklis, J.N.2
-
2
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
-
(1988)
Machine Learning
, vol.3
, pp. 9-44
-
-
Sutton, R.S.1
-
3
-
-
0000430514
-
The convergence of TD(λ) for general λ
-
P. D. Dayan, "The convergence of TD(λ) for general λ," Machine Learning, vol. 8, pp. 341-362, 1992.
-
(1992)
Machine Learning
, vol.8
, pp. 341-362
-
-
Dayan, P.D.1
-
4
-
-
0003786198
-
-
Princeton, NJ
-
L. Gurvits, L. J. Lin, and S. J. Hanson, Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems, Siemens Corporate Research, Princeton, NJ, 1994.
-
(1994)
Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems, Siemens Corporate Research
-
-
Gurvits, L.1
Lin, L.J.2
Hanson, S.J.3
-
5
-
-
0003276733
-
Mean-field analysis for batched TD(λ)
-
F. Pineda, "Mean-field analysis for batched TD(λ)," Neural Computation, pp. 1403-1419, 1997.
-
(1997)
Neural Computation
, pp. 1403-1419
-
-
Pineda, F.1
-
6
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
May
-
J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Control, vol. 42, no. 5, pp. 674-690, May 1997.
-
(1997)
IEEE Trans. Automat. Control
, vol.42
, Issue.5
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
7
-
-
0033221519
-
Average cost temporal-difference learning
-
J. N. Tsitsiklis and B. Van Roy, "Average cost temporal-difference learning," Automatica, vol. 35, no. 11, pp. 1799-1808, 1999.
-
(1999)
Automatica
, vol.35
, Issue.11
, pp. 1799-1808
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
8
-
-
0036832957
-
On average versus discounted reward temporal-difference learning
-
J. N. Tsitsiklis and B. Van Roy, "On average versus discounted reward temporal-difference learning," Machine Learning, vol. 49, pp. 179-191, 2002.
-
(2002)
Machine Learning
, vol.49
, pp. 179-191
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
9
-
-
0035249254
-
Simulation-based optimization of Markov reward processes
-
Feb
-
P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. Automat. Control, vol. 46, no. 2, pp. 191-209, Feb. 2001.
-
(2001)
IEEE Trans. Automat. Control
, vol.46
, Issue.2
, pp. 191-209
-
-
Marbach, P.1
Tsitsiklis, J.N.2
-
10
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 2, pp. 33-57, 1996.
-
(1996)
Machine Learning
, vol.22
, Issue.2
, pp. 33-57
-
-
Bradtke, S.J.1
Barto, A.G.2
-
12
-
-
67949102334
-
-
D. P. Bertsekas and S. Ioffe, Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, MIT, Cambridge, MA, LIDS Tech. Rep. LIDS-P-2349, 1996.
-
-
-
-
13
-
-
0042758707
-
-
Ph.D. dissertation, Dept. Comput. Sci. Elect. Eng, MIT, Cambridge, MA
-
V. R. Konda, "Actor-critic algorithms," Ph.D. dissertation, Dept. Comput. Sci. Elect. Eng., MIT, Cambridge, MA, 2002.
-
(2002)
Actor-critic algorithms
-
-
Konda, V.R.1
-
17
-
-
85036496976
-
-
D. P. Bertsekas, V. S. Borkar, and A. Nedić, "Improved Temporal Difference Methods With Linear Function Approximation," in Learning and Approximate Dynamic Programming, A. Barto, W. Powell, and J. Si, Eds. New York: IEEE Press, 2004, LIDS Tech. Rep. 2573, 2003.
-
-
-
-
18
-
-
80053276028
-
A function approximation approach to estimation of policy gradient for POMDP with structured policies
-
H. Yu, "A function approximation approach to estimation of policy gradient for POMDP with structured policies," in Proc. 21st Conf. Uncertainty Artif. Intell., 2005, pp. 642-657.
-
(2005)
Proc. 21st Conf. Uncertainty Artif. Intell
, pp. 642-657
-
-
Yu, H.1
-
20
-
-
58449131194
-
New Error Bounds for Approximations From Projected Linear Equations, Univ. Helsinki, Helsinki, Finland
-
Tech. Rep. C-2008-43
-
H. Yu and D. P. Bertsekas, New Error Bounds for Approximations From Projected Linear Equations, Univ. Helsinki, Helsinki, Finland, Tech. Rep. C-2008-43, 2008.
-
(2008)
-
-
Yu, H.1
Bertsekas, D.P.2
-
24
-
-
0034342516
-
On the existence of fixed points for approximate value iteration and temporal-difference learning
-
D. P. de Farias and B. Van Roy, "On the existence of fixed points for approximate value iteration and temporal-difference learning," J. Optim. Theory Appl., vol. 105, no. 3, pp. 589-608, 2000.
-
(2000)
J. Optim. Theory Appl
, vol.105
, Issue.3
, pp. 589-608
-
-
de Farias, D.P.1
Van Roy, B.2
-
25
-
-
0033351917
-
Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives
-
Oct
-
J. N. Tsitsiklis and B. Van Roy, "Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives," IEEE Trans. Automat. Control, vol. 44, no. 10, pp. 1840-1851, Oct. 1999.
-
(1999)
IEEE Trans. Automat. Control
, vol.44
, Issue.10
, pp. 1840-1851
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
26
-
-
67949097658
-
-
H. Yu and D. P. Bertsekas, A Least Squares Q-Learning Algorithm for Optimal Stopping Problems, MIT, Cambridge, MA, LIDS Tech. Rep. 2731, 2006.
-
-
-
-
27
-
-
84927748655
-
Q-learning algorithms for optimal stopping based on least squares
-
H. Yu and D. P. Bertsekas, "Q-learning algorithms for optimal stopping based on least squares," in Proc. Eur. Control Conf., 2007, pp. 2368-2375.
-
(2007)
Proc. Eur. Control Conf
, pp. 2368-2375
-
-
Yu, H.1
Bertsekas, D.P.2
-
28
-
-
28544451799
-
Stochastic approximation with 'controlled Markov' noise
-
V. S. Borkar, "Stochastic approximation with 'controlled Markov' noise," Syst. Control Lett., vol. 55, pp. 139-145, 2006.
-
(2006)
Syst. Control Lett
, vol.55
, pp. 139-145
-
-
Borkar, V.S.1
-
30
-
-
61849106433
-
Projected equation methods for approximate solution of large linear systems
-
May
-
D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Appl. Math., vol. 227, no. 1, pp. 27-50, May 2009.
-
(2009)
J. Comput. Appl. Math
, vol.227
, Issue.1
, pp. 27-50
-
-
Bertsekas, D.P.1
Yu, H.2