1. L. C. Baird and A. W. Moore. Gradient descent for general reinforcement learning. In M. J. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 968-974. The MIT Press, 1999.
2. R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1017-1023, Cambridge, MA, 1996. MIT Press.
4. T. Edmunds. An optimum agent for the discrete triathlon. In Reinforcement Learning Benchmarks and Bake-offs II, a workshop at the 2005 NIPS conference, 2005.
8. R. Maclin and J. W. Shavlik. Creating advice-taking reinforcement learners. Machine Learning, 22:251-282, 1996.
10. G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Engineering Department, Cambridge University, 1994.
11. S. P. Singh and R. S. Sutton. Reinforcement learning with replacing eligibility traces. Machine Learning, 22:123-158, 1996.
13. P. Stone, G. Kuhlmann, M. E. Taylor, and Y. Liu. Keepaway soccer: From machine learning testbed to benchmark. In I. Noda, A. Jacoff, A. Bredenfeld, and Y. Takahashi, editors, RoboCup 2005: Robot Soccer World Cup IX, volume 4020, pages 93-105. Springer Verlag, Berlin, 2006.
14. P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup-soccer keepaway. Adaptive Behavior, 13(3):165-188, 2005.
15. R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211, 1999.
17. G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.