-
1
-
-
0004030716
-
Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms
-
Laboratory for Information and Decision Systems, MIT, Cambridge, MA
-
Abounadi, J., Bertsekas, D., & Borkar, V.S. (1998). Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms. Technical Report LIDS-P-2433, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
-
(1998)
Technical Report LIDS-P-2433
-
-
Abounadi, J.1
Bertsekas, D.2
Borkar, V.S.3
-
2
-
-
0003874616
-
Learning algorithms for Markov decision processes with average cost
-
Laboratory for Information and Decision Systems, MIT, Cambridge, MA
-
Abounadi, J., Bertsekas, D., & Borkar, V.S. (1998). Learning algorithms for Markov decision processes with average cost. Technical Report LIDS-P-2434, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
-
(1998)
Technical Report LIDS-P-2434
-
-
Abounadi, J.1
Bertsekas, D.2
Borkar, V.S.3
-
3
-
-
0003796630
-
-
New York: Academic Press
-
Adams, R.A. (1975). Sobolev spaces. New York: Academic Press.
-
(1975)
Sobolev Spaces
-
-
Adams, R.A.1
-
9
-
-
0031076413
-
Stochastic approximation with two time scales
-
Borkar, V.S. (1997). Stochastic approximation with two time scales. Systems & Control Letters 29: 291-294.
-
(1997)
Systems & Control Letters
, vol.29
, pp. 291-294
-
-
Borkar, V.S.1
-
10
-
-
0347967095
-
The O.D.E. method for convergence of stochastic approximation and reinforcement learning
-
to appear
-
Borkar, V.S. & Meyn, S.P. (1998). The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization (to appear).
-
(1998)
SIAM Journal on Control and Optimization
-
-
Borkar, V.S.1
Meyn, S.P.2
-
13
-
-
0024909476
-
Convergent activation dynamics in continuous time networks
-
Hirsch, M.W. (1989). Convergent activation dynamics in continuous time networks. Neural Networks 2: 331-349.
-
(1989)
Neural Networks
, vol.2
, pp. 331-349
-
-
Hirsch, M.W.1
-
14
-
-
0000439891
-
On the convergence of stochastic iterative dynamic programming algorithms
-
Jaakkola, T., Jordan, M., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6: 1185-1201.
-
(1994)
Neural Computation
, vol.6
, pp. 1185-1201
-
-
Jaakkola, T.1
Jordan, M.2
Singh, S.3
-
16
-
-
0017526570
-
Analysis of recursive stochastic algorithms
-
Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control AC-22: 551-575.
-
(1977)
IEEE Transactions on Automatic Control
, vol.AC-22
, pp. 551-575
-
-
Ljung, L.1
-
20
-
-
0009656873
-
Between MDPs and semi-MDPs: Learning, planning and representing knowledge at multiple temporal scales
-
Sutton, R.S., Precup, D., & Singh, S.P. (1998). Between MDPs and semi-MDPs: Learning, planning and representing knowledge at multiple temporal scales. Journal of A.I. Research 1: 1-39.
-
(1998)
Journal of A.I. Research
, vol.1
, pp. 1-39
-
-
Sutton, R.S.1
Precup, D.2
Singh, S.P.3
-
21
-
-
0028497630
-
Asynchronous stochastic approximation and Q-learning
-
Tsitsiklis, J.N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning 16: 185-202.
-
(1994)
Machine Learning
, vol.16
, pp. 185-202
-
-
Tsitsiklis, J.N.1
-
22
-
-
0004049893
-
-
[Unpublished] Ph.D. thesis, Cambridge University, Cambridge, UK
-
Watkins, C. (1989). Learning from delayed rewards. [Unpublished] Ph.D. thesis, Cambridge University, Cambridge, UK.
-
(1989)
Learning from Delayed Rewards
-
-
Watkins, C.1