Hordijk, A. and Tijms, H. (1975). A modified form of the iterative method of dynamic programming. Annals of Statistics, 3:203-208.
Jaakkola, T., Jordan, M., and Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185-1201.
Littman, M. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proc. of the Eleventh International Conference on Machine Learning, pages 157-163, San Francisco, CA. Morgan Kaufmann.
Littman, M. and Szepesvári, C. (1996). A Generalized Reinforcement Learning Model: Convergence and applications. In Int. Conf. on Machine Learning. http://iserv.iki.kfki.hu/asl-publs.html.
Mahadevan, S. (1994). To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 164-172, San Francisco, CA. Morgan Kaufmann.
Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1-3):124-158.
Major, P. (1993). A law of the iterated logarithm for the Robbins-Monro method. Studia Scientiarum Mathematicarum Hungarica, 8:95-102.
Poljak, B. and Tsypkin, Y. (1983). Pseudogradient adaption and training algorithms. Automation and Remote Control, 12:83-94.
Singh, S., Jaakkola, T., Littman, M., and Szepesvári, C. (1997). On the convergence of single-step on-policy reinforcement-learning algorithms. Machine Learning, in preparation.
Szepesvári, C. and Littman, M. (1996). Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms. Machine Learning, in preparation; available as TR CS96-10, Brown Univ.
Tsitsiklis, J. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3):185-202.
Watkins, C. (1990). Learning from Delayed Rewards. PhD thesis, King's College, Cambridge.