-
2
-
-
0029210635
-
Learning to act using real-time dynamic programming
-
A. G. Barto, S. J. Bradtke, and S. P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1):81-138, 1995.
-
(1995)
Artificial Intelligence
, vol.72
, Issue.1
, pp. 81-138
-
-
Barto, A. G.1
Bradtke, S. J.2
Singh, S. P.3
-
5
-
-
0034342516
-
On the existence of fixed points for approximate value iteration and temporal-difference learning
-
D. P. De Farias and B. Van Roy. On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3), 2000.
-
(2000)
Journal of Optimization Theory and Applications
, vol.105
, Issue.3
-
-
De Farias, D. P.1
Van Roy, B.2
-
8
-
-
84898995808
-
Reinforcement learning with function approximation converges to a region
-
MIT Press
-
G. J. Gordon. Reinforcement learning with function approximation converges to a region. Advances in Neural Information Processing Systems 13, pages 1040-1046. MIT Press, 2001.
-
(2001)
Advances in Neural Information Processing Systems
, vol.13
, pp. 1040-1046
-
-
Gordon, G. J.1
-
12
-
-
0037886159
-
Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains
-
W. J. Stewart, editor, Dekker, NY
-
E. Seneta. Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains. In W. J. Stewart, editor, Numerical Solutions of Markov Chains. Dekker, NY, 1991.
-
(1991)
Numerical Solutions of Markov Chains
-
-
Seneta, E.1
-
13
-
-
0033901602
-
Convergence results for single-step on-policy reinforcement-learning algorithms
-
S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvari. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
-
(2000)
Machine Learning
, vol.38
, Issue.3
, pp. 287-308
-
-
Singh, S.1
Jaakkola, T.2
Littman, M. L.3
Szepesvari, C.4
-
15
-
-
0000985504
-
TD-Gammon, a self-teaching backgammon program, achieves master-level play
-
G. J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.
-
(1994)
Neural Computation
, vol.6
, Issue.2
, pp. 215-219
-
-
Tesauro, G. J.1
-
16
-
-
0033351917
-
Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
-
J. N. Tsitsiklis and B. Van Roy. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Transactions on Automatic Control, 44(10):1840-1851, 1999.
-
(1999)
IEEE Transactions on Automatic Control
, vol.44
, Issue.10
, pp. 1840-1851
-
-
Tsitsiklis, J. N.1
Van Roy, B.2
-
17
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, Issue.5
, pp. 674-690
-
-
Tsitsiklis, J. N.1
Van Roy, B.2