1. Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1), 81-138.
2. Barto, A. G., Sutton, R. S., & Watkins, C. J. C. H. (1989). Learning and sequential decision making (Tech. Rep. No. 89-95). Amherst, MA: Department of Computer and Information Science, University of Massachusetts.
4. Bertsekas, D. P., & Castañon, D. A. (1989). Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control, 34(6), 589-598.
7. Gordon, G. J. (1995). Stable function approximation in dynamic programming. In A. Prieditis & S. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 261-268). San Mateo, CA: Morgan Kaufmann.
8. Grossberg, S. (1969). Embedding fields: A theory of learning with physiological implications. Journal of Mathematical Psychology, 6, 209-239.
9. Gullapalli, V., & Barto, A. G. (1994). Convergence of indirect adaptive asynchronous value iteration algorithms. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 695-702). San Mateo, CA: Morgan Kaufmann.
12. Hu, J., & Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In J. Shavlik (Ed.), Proceedings of the Fifteenth International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
13. Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185-1201.
15. Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
17. Korf, R. E. (1990). Real-time heuristic search. Artificial Intelligence, 42, 189-211.
20. Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning (pp. 157-163). San Mateo, CA: Morgan Kaufmann.
23. Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, 22, 551-575.
24. Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1-3), 159-196.
25. Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103-130.
28. Ribeiro, C. (1995). Attentional mechanisms as a strategy for generalisation in the Q-learning algorithm. In Proceedings of ICANN'95 (Vol. 1, pp. 455-460).
31. Robbins, H., & Siegmund, D. (1971). A convergence theorem for non-negative almost supermartingales and some applications. In J. Rustagi (Ed.), Optimizing methods in statistics (pp. 235-257). New York: Academic Press.
32. Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems (Tech. Rep. No. CUED/F-INFENG/TR 166). Cambridge: Cambridge University, Engineering Department.
33. Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the Tenth International Conference on Machine Learning (pp. 298-305). San Mateo, CA: Morgan Kaufmann.
34. Schweitzer, P. J. (1984). Aggregation methods for large Markov chains. In G. Iazeolla, P. J. Courtois, & A. Hordijk (Eds.), Mathematical computer performance and reliability (pp. 275-302). Amsterdam: Elsevier.
35. Singh, S., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 361-368). Cambridge, MA: MIT Press.
36. Singh, S., Jaakkola, T., Littman, M. L., & Szepesvári, C. (in press). Convergence results for single-step on-policy reinforcement-learning algorithms.
37. Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1-3), 123-158.
39. Szepesvári, C. (1998a). The asymptotic convergence rate of Q-learning. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10. Cambridge, MA: MIT Press.
40. Szepesvári, C. (1998b). Static and dynamic aspects of optimal sequential decision making. Unpublished Ph.D. dissertation, Bolyai Institute of Mathematics, "József Attila" University, Szeged, Hungary.
42. Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3), 185-202.
43. Vrieze, O. J., & Tijs, S. H. (1982). Fictitious play applied to sequences of games and discounted stochastic games. International Journal of Game Theory, 11(2), 71-85.
44. Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished Ph.D. dissertation, King's College, Cambridge.
45. Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3), 279-292.