1. Barto A, Crites R 1996. Improving elevator performance using reinforcement learning. Adv Neural Inf Process Syst, 8:1017-1023.
2. Bellman R, Dreyfus S 1959. Functional approximations and dynamic programming. Math Tables Other Aids Comput, 13:247-251.
6. Bertsekas DP, Singh S 1997. Reinforcement learning for dynamic channel allocation in cellular telephone systems. Adv Neural Inf Process Syst, 9:974. MIT Press.
10. Boyan J 2002. Technical update: Least-squares temporal difference learning. Mach Learn, 49(2):233-246.
11. Bradtke SJ, Barto AG 1996. Linear least-squares algorithms for temporal-difference learning. Mach Learn, 22:33-57.
13. Dayan PD 1992. The convergence of TD(λ) for general λ. Mach Learn, 8:341-362.
14. de Farias DP, Van Roy B 2000. On the existence of fixed points for approximate value iteration and temporal-difference learning. J Optim Theory Appl, 105(3).
15. Gurvits L, Lin LJ, Hanson SJ 1994. Incremental learning of evaluation functions for absorbing Markov chains: New methods and theorems. Preprint.
18. Nedic A, Bertsekas DP 2001. Policy evaluation algorithms with linear function approximation. Tech. Rep. LIDS-P-2537, MIT Laboratory for Information and Decision Systems, December 2001.
19. Pineda F 1997. Mean-field analysis for batched TD(λ). Neural Comput, pp. 1403-1419.
20. Sutton RS 1988. Learning to predict by the method of temporal differences. Mach Learn, 3:9-44.
21. Tadić V 2001. On the convergence of temporal-difference learning with linear function approximation. Mach Learn, 42:241-267.
22. Tesauro G 1995. Temporal difference learning and TD-Gammon. Commun ACM, 38(3).
23. Tsitsiklis JN, Van Roy B 1997. An analysis of temporal-difference learning with function approximation. IEEE Trans Automat Contr, 42:674-690.
24. Tsitsiklis JN, Van Roy B 1999. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans Automat Contr, 44(10):1840-1851.
27. Varaiya P, Walrand J, Buyukkoc C 1985. Extensions of the multiarmed bandit problem: The discounted case. IEEE Trans Automat Contr, 30(5).
29. Warmuth M, Schapire R 1997. On the worst-case analysis of temporal-difference learning algorithms. Mach Learn, 22(1-3):95-121.