[1] R. S. Sutton, D. Precup, and S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning," Artific. Intell., vol. 112, pp. 181-211, 1999.
[2] M. Z. Guo, B. Chen, X. L. Wang, and J. R. Hong, "A summary on reinforcement learning" (in Chinese), Comput. Sci., vol. 25, no. 3, pp. 13-15, 1998.
[3] R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
[4] C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Psychol. Dept., Cambridge Univ., Cambridge, U.K., 1989.
[5] C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3, pp. 279-292, 1992.
[6] L. Yang, J. R. Hong, and T. Y. Huang, "Combining the methods of temporal differences with neural network for real-time modeling and prediction of time series" (in Chinese), Chinese J. Comput., vol. 19, no. 9, pp. 695-700, 1996.
[7] M. Asada, E. Uchibe, and K. Hosoda, "Cooperative behavior acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development," Artific. Intell., vol. 110, pp. 275-292, 1999.
[9] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," J. AI Res., vol. 4, pp. 237-285, 1996.
[10] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of state calculations by fast computing machines," J. Chem. Phys., vol. 21, pp. 1087-1092, 1953.
[11] S. Boettcher and A. Percus, "Nature's way of optimizing," Artific. Intell., vol. 119, pp. 275-286, 2000.
[12] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
[14] T. G. Dietterich and N. S. Flann, "Explanation-based learning and reinforcement learning: A unified view," Mach. Learn., vol. 28, pp. 169-210, 1997.