3. Richard Barrett, Michael Berry, Tony F. Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo, Charles Romine, and Henk Van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.
4. Andrew G. Barto, S. J. Bradtke, and Satinder P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1):81-138, 1995.
6. Dimitri P. Bertsekas. Distributed asynchronous computation of fixed points. Mathematical Programming, 27:107-120, 1983.
11. Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19:399-468, 2003.
14. Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49:209-232, 2002.
19. Andrew W. Moore and Christopher G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103-130, 1993.
20. Andrew W. Moore and Christopher G. Atkeson. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state space. Machine Learning, 21:199-233, 1995.
21. Remi Munos and Andrew W. Moore. Variable resolution discretization in optimal control. Machine Learning, 49:291-323, 2002.
26. Martin L. Puterman and Moon C. Shin. Modified policy iteration algorithms for discounted Markov Decision Problems. Management Science, 24:1127-1137, 1978.
27. Stuart I. Reynolds. Reinforcement Learning with Exploration. PhD thesis, University of Birmingham, Birmingham, United Kingdom, 2002.
28. Gavin A. Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge, United Kingdom, 1994.
31. Satinder P. Singh and Richard S. Sutton. Reinforcement learning with replacing eligibility traces. Machine Learning, 22:123-158, 1996.
32. Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
33. Richard S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems, 8:1038-1044, 1996.
34. Ray S. Tuminaro, Mike Heroux, S. A. Hutchinson, and John N. Shadid. Official Aztec User's Guide: Version 2.1. Sandia National Laboratory, Albuquerque, NM, 1999.
35. Ronald J. Williams and Leemon C. Baird. Tight performance bounds on greedy policies based on imperfect value functions. Technical Report NU-CCS-93-14, Northeastern University, Boston, MA, 1993.
39. Nevin L. Zhang and Weihong Zhang. Speeding up the convergence of value iteration in partially observable Markov Decision Processes. Journal of Artificial Intelligence Research, 14:29-51, 2001.