-
2
-
-
0004870198
-
Experiments in parameter learning using temporal differences
-
Baxter, J., Tridgell, A., Weaver, L. (1998). Experiments in parameter learning using temporal differences. International Computer Chess Association Journal, 21, 84-99.
-
(1998)
International Computer Chess Association Journal
, vol.21
, pp. 84-99
-
-
Baxter, J.1
Tridgell, A.2
Weaver, L.3
-
4
-
-
0033876515
-
O.D.E. method for convergence of stochastic approximation and reinforcement learning
-
Borkar, V. S. and Meyn, S. P. (2000). The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization, 38(2):447-469.
-
(2000)
SIAM Journal on Control and Optimization
, vol.38
, Issue.2
, pp. 447-469
-
-
Borkar, V.S.1
Meyn, S.P.2
-
5
-
-
0036832950
-
Technical update: Least-squares temporal difference learning
-
DOI 10.1023/A:1017936530646
-
Boyan, J. (2002). Technical update: Least-squares temporal difference learning. Machine Learning, 49:233-246.
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 233-246
-
-
Boyan, J.A.1
-
6
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
Bradtke, S., Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57.
-
(1996)
Machine Learning
, vol.22
, Issue.1-3
, pp. 33-57
-
-
Bradtke, S.J.1
Barto, A.G.2
-
7
-
-
0000430514
-
The convergence of TD(λ) for general λ
-
Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8:341-362.
-
(1992)
Machine Learning
, vol.8
, pp. 341-362
-
-
Dayan, P.1
-
8
-
-
33750737011
-
Incremental least-squares temporal difference learning
-
Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
-
Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-squares temporal difference learning. Proceedings of the National Conference on Artificial Intelligence, pp. 356-361.
-
(2006)
Proceedings of the National Conference on Artificial Intelligence
, vol.1
, pp. 356-361
-
-
Geramifard, A.1
Bowling, M.2
Sutton, R.S.3
-
13
-
-
70049112133
-
Off-policy learning with recognizers
-
Precup, D., Sutton, R. S., Paduraru, C., Koop, A., Singh, S. (2006). Off-policy learning with recognizers. Advances in Neural Information Processing Systems 18.
-
(2006)
Advances in Neural Information Processing Systems
, vol.18
-
-
Precup, D.1
Sutton, R.S.2
Paduraru, C.3
Koop, A.4
Singh, S.5
-
14
-
-
0242393653
-
Eligibility traces for off-policy policy evaluation
-
Morgan Kaufmann
-
Precup, D., Sutton, R. S., Singh, S. (2000). Eligibility traces for off-policy policy evaluation. Proceedings of the 17th International Conference on Machine Learning, pp. 759-766. Morgan Kaufmann.
-
(2000)
Proceedings of the 17th International Conference on Machine Learning
, pp. 759-766
-
-
Precup, D.1
Sutton, R.S.2
Singh, S.3
-
15
-
-
0038145011
-
Temporal difference learning applied to a high-performance game-playing program
-
Schaeffer, J., Hlynka, M., Jussila, V. (2001). Temporal difference learning applied to a high-performance game-playing program. Proceedings of the International Joint Conference on Artificial Intelligence, pp. 529-534.
-
(2001)
Proceedings of the International Joint Conference on Artificial Intelligence
, pp. 529-534
-
-
Schaeffer, J.1
Hlynka, M.2
Jussila, V.3
-
16
-
-
84880900542
-
Reinforcement learning of local shape in the game of Go
-
Silver, D., Sutton, R. S., Müller, M. (2007). Reinforcement learning of local shape in the game of Go. Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 1053-1058.
-
(2007)
Proceedings of the 20th International Joint Conference on Artificial Intelligence
, pp. 1053-1058
-
-
Silver, D.1
Sutton, R.S.2
Müller, M.3
-
18
-
-
33847202724
-
Learning to predict by the method of temporal differences
-
Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3:9-44.
-
(1988)
Machine Learning
, vol.3
, pp. 9-44
-
-
Sutton, R.S.1
-
20
-
-
0033170372
-
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
-
DOI 10.1016/S0004-3702(99)00052-1
-
Sutton, R. S., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211.
-
(1999)
Artificial Intelligence
, vol.112
, Issue.1
, pp. 181-211
-
-
Sutton, R.S.1
Precup, D.2
Singh, S.3
-
22
-
-
0035283402
-
On the convergence of temporal-difference learning with linear function approximation
-
DOI 10.1023/A:1007609817671
-
Tadic, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42:241-267.
-
(2001)
Machine Learning
, vol.42
, Issue.3
, pp. 241-267
-
-
Tadic, V.1
-
23
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
PII S0018928697034375
-
Tsitsiklis, J. N., and Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42:674-690.
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, Issue.5
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2