[2] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," J. Artif. Intell. Res., vol. 4, pp. 237-285, May 1996.
[3] C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, King's College, Cambridge, U.K., 1989.
[4] G. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Cambridge Univ., Cambridge, U.K., Tech. Rep. CUED/F-INFENG-TR 166, 1994.
[5] R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, pp. 1038-1045.
[7] R. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Advances in Neural Information Processing Systems, vol. 12. Cambridge, MA: MIT Press, 2000, pp. 1057-1063.
[8] J. Baxter and P. Bartlett, "Infinite-horizon policy-gradient estimation," J. Artif. Intell. Res., vol. 15, pp. 319-350, 2001.
[9] A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Mach. Learn., vol. 13, no. 1, pp. 103-130, Oct. 1993.
[10] M. Riedmiller, "Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method," in Proc. 16th ECML, 2005, pp. 317-328.
[11] L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123-140, Aug. 1996.
[13] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive mixtures of local experts," Neural Comput., vol. 3, no. 1, pp. 79-87, 1991.
[14] S. P. Singh, "The efficient learning of multiple task sequences," in Advances in Neural Information Processing Systems, vol. 4, J. Moody, S. Hanson, and R. Lippmann, Eds. San Mateo, CA: Morgan Kaufmann, 1992, pp. 251-258.
[15] C. Tham, "Reinforcement learning of multiple tasks using a hierarchical CMAC architecture," Robot. Auton. Syst., vol. 15, no. 4, pp. 247-274, 1995.
[16] R. Sun and T. Peterson, "Multi-agent reinforcement learning: Weighting and partitioning," Neural Netw., vol. 12, no. 4/5, pp. 727-753, Jun. 1999.
[17] D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learn. Res., vol. 6, pp. 503-556, Dec. 2005.
[18] C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3/4, pp. 279-292, 1992.
[19] R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, Aug. 1988.
[20] K. S. Narendra and M. A. L. Thathachar, "Learning automata - A survey," IEEE Trans. Syst., Man, Cybern., vol. SMC-4, no. 4, pp. 323-334, Jul. 1974.
[21] J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems, vol. 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 369-376.
[22] G. Gordon, "Stable function approximation in dynamic programming," Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-CS-95-103, 1995.
[23] P. Werbos and X. Pang, "Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot," in Proc. IEEE Int. Conf. Syst., Man, Cybern., 1996, vol. 3, pp. 1764-1769.