1. Antos, A., Szepesvári, Cs., Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71:89-129.
2. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th Int. Conf. on Machine Learning, pp. 30-37.
5. Borkar, V. S. (1997). Stochastic approximation with two timescales. Systems and Control Letters 29:291-294.
6. Borkar, V. S., Meyn, S. P. (2000). The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization 38(2):447-469.
7. Boyan, J. (2002). Technical update: Least-squares temporal difference learning. Machine Learning 49:233-246.
8. Bradtke, S., Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning 22:33-57.
9. Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning 8:341-362.
10. Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-squares temporal difference learning. In Proceedings of AAAI, pp. 356-361.
11. Hirsch, M. W. (1989). Convergent activation dynamics in continuous time networks. Neural Networks 2:331-349.
13. Precup, D., Sutton, R. S., Paduraru, C., Koop, A., Singh, S. (2006). Off-policy learning with recognizers. Advances in Neural Information Processing Systems 18.
14. Silver, D., Sutton, R. S., Müller, M. (2007). Reinforcement learning of local shape in the game of Go. In Proceedings of the 20th IJCAI, pp. 1053-1058.
16. Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning 3:9-44.
17. Sutton, R. S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112:181-211.