-
1
-
-
33746032553
-
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
-
Springer-Verlag, New York
-
A. ANTOS, CS. SZEPESVARI, AND R. MUNOS, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, in Proceedings of the Conference on Learning Theory, Springer-Verlag, New York, 2006, pp. 574-588.
-
(2006)
Proceedings of the Conference on Learning Theory
, pp. 574-588
-
-
ANTOS, A.1
SZEPESVARI, C.2
MUNOS, R.3
-
2
-
-
0031074521
-
-
AI Rev, 11
-
C. G. ATKESON, A. W. MOORE, AND S. A. SCHAAL, Locally weighted learning, AI Rev., 11 (1997), pp. 11-73.
-
(1997)
Locally weighted learning
, pp. 11-73
-
-
ATKESON, C.G.1
MOORE, A.W.2
SCHAAL, S.A.3
-
3
-
-
0031073475
-
-
AI Rev, 11
-
C. G. ATKESON, A. W. MOORE, AND S. A. SCHAAL, Locally weighted learning for control, AI Rev., 11 (1997), pp. 75-113.
-
(1997)
Locally weighted learning for control
, pp. 75-113
-
-
ATKESON, C.G.1
MOORE, A.W.2
SCHAAL, S.A.3
-
4
-
-
0003787146
-
-
Princeton University Press, Princeton, NJ
-
R. BELLMAN, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
-
(1957)
Dynamic Programming
-
-
BELLMAN, R.1
-
5
-
-
84968519017
-
Functional approximation and dynamic programming
-
R. E. BELLMAN AND S. E. DREYFUS, Functional approximation and dynamic programming, Math. Tables Aids Comput., 13 (1959), pp. 247-251.
-
(1959)
Math. Tables Aids Comput
, vol.13
, pp. 247-251
-
-
BELLMAN, R.E.1
DREYFUS, S.E.2
-
8
-
-
0001523794
-
Strict stationarity of generalized autoregressive processes
-
P. BOUGEROL AND N. PICARD, Strict stationarity of generalized autoregressive processes, Ann. Probab., 20 (1992), pp. 1714-1730.
-
(1992)
Ann. Probab
, vol.20
, pp. 1714-1730
-
-
BOUGEROL, P.1
PICARD, N.2
-
9
-
-
0031541839
-
Adaptive greedy approximations
-
G. M. DAVIES, S. MALLAT, AND M. AVELLANEDA, Adaptive greedy approximations, J. Constr. Approx., 13 (1997), pp. 57-98.
-
(1997)
J. Constr. Approx
, vol.13
, pp. 57-98
-
-
DAVIES, G.M.1
MALLAT, S.2
AVELLANEDA, M.3
-
10
-
-
0348090400
-
The linear programming approach to approximate dynamic programming
-
D. P. DE FARIAS AND B. VAN ROY, The linear programming approach to approximate dynamic programming, Oper. Res., 51 (2003), pp. 850-865.
-
(2003)
Oper. Res
, vol.51
, pp. 850-865
-
-
DE FARIAS, D.P.1
VAN ROY, B.2
-
11
-
-
85009724776
-
Nonlinear approximation
-
R. DEVORE, Nonlinear approximation, Acta Numer., 7 (1998), pp. 51-150.
-
(1998)
Acta Numer
, vol.7
, pp. 51-150
-
-
DEVORE, R.1
-
13
-
-
0003989207
-
-
Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA
-
G. J. GORDON, Approximate Solutions to Markov Decision Processes, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, 1999.
-
(1999)
Approximate Solutions to Markov Decision Processes
-
-
GORDON, G.J.1
-
14
-
-
84880898477
-
Max-norm projections for factored MDPs
-
Lawrence Erlbaum
-
C. GUESTRIN, D. KOLLER, AND R. PARR, Max-norm projections for factored MDPs, in Proceedings of the International Joint Conference on Artificial Intelligence, Lawrence Erlbaum, 2001, pp. 673-682.
-
(2001)
Proceedings of the International Joint Conference on Artificial Intelligence
, pp. 673-682
-
-
GUESTRIN, C.1
KOLLER, D.2
PARR, R.3
-
15
-
-
0003684449
-
The Elements of Statistical Learning
-
Springer-Verlag, New York
-
T. HASTIE, R. TIBSHIRANI, AND J. FRIEDMAN, The Elements of Statistical Learning, Springer Ser. Statist., Springer-Verlag, New York, 2001.
-
(2001)
Springer Ser. Statist
-
-
HASTIE, T.1
TIBSHIRANI, R.2
FRIEDMAN, J.3
-
18
-
-
0006238280
-
Recurrence conditions for Markov decision processes with Borel state space: A survey
-
O. HERNÁNDEZ-LERMA, R. MONTES-DE-OCA, AND R. CAVAZOS-CANEDA, Recurrence conditions for Markov decision processes with Borel state space: A survey, Ann. Oper. Res., 28 (1991), pp. 29-46.
-
(1991)
Ann. Oper. Res
, vol.28
, pp. 29-46
-
-
HERNÁNDEZ-LERMA, O.1
MONTES-DE-OCA, R.2
CAVAZOS-CANEDA, R.3
-
19
-
-
0001144425
-
On ergodicity and recurrence properties of a Markov chain with an application to an open Jackson network
-
A. HORDIJK AND F. SPIEKSMA, On ergodicity and recurrence properties of a Markov chain with an application to an open Jackson network, Adv. Appl. Probab., 24 (1992), pp. 343-376.
-
(1992)
Adv. Appl. Probab
, vol.24
, pp. 343-376
-
-
HORDIJK, A.1
SPIEKSMA, F.2
-
23
-
-
4644323293
-
Least-squares policy iteration
-
M. LAGOUDAKIS AND R. PARR, Least-squares policy iteration, J. Mach. Learn. Res., 4 (2003), pp. 1107-1149.
-
(2003)
J. Mach. Learn. Res
, vol.4
, pp. 1107-1149
-
-
LAGOUDAKIS, M.1
PARR, R.2
-
25
-
-
28244499470
-
Stability, performance evaluation, and optimization
-
Kluwer Academic, Boston, MA
-
S. P. MEYN, Stability, performance evaluation, and optimization, in Handbook of Markov Decision Processes: Methods and Applications, Kluwer Academic, Boston, MA, 2002, pp. 305-346.
-
(2002)
Handbook of Markov Decision Processes: Methods and Applications
, pp. 305-346
-
-
MEYN, S.P.1
-
27
-
-
40849114100
-
Finite-Time Bounds for Sampling-Based Fitted Value Iteration
-
Technical report, INRIA, available online from
-
R. MUNOS AND CS. SZEPESVÁRI, Finite-Time Bounds for Sampling-Based Fitted Value Iteration, Technical report, INRIA, 2006; available online from http://hal.inria.fr/inria-00120882.
-
(2006)
-
-
MUNOS, R.1
SZEPESVÁRI, C.2
-
31
-
-
70350192140
-
Numerical dynamic programming in economics
-
Elsevier/North-Holland, Amsterdam
-
J. RUST, Numerical dynamic programming in economics, in Handbook of Computational Economics, Elsevier/North-Holland, Amsterdam, 1996, pp. 619-729.
-
(1996)
Handbook of Computational Economics
, pp. 619-729
-
-
RUST, J.1
-
32
-
-
41449084934
-
-
A. L. SAMUEL, Some studies in machine learning using the game of checkers, IBM J. Res. Develop., 3 (1959), pp. 210-229; reprinted in Computers and Thought, E. A. Feigenbaum and J. Feldman, eds., McGraw-Hill, New York, 1963.
-
-
-
-
-
33
-
-
0004102479
-
-
Bradford Book, MIT Press, Cambridge, MA
-
R. S. SUTTON AND A. G. BARTO, Reinforcement Learning: An Introduction, Bradford Book, MIT Press, Cambridge, MA, 1998.
-
(1998)
Reinforcement Learning: An Introduction
-
-
SUTTON, R.S.1
BARTO, A.G.2
-
34
-
-
31844456754
-
-
CS. SZEPESVARI AND R. MUNOS, Finite time bounds for sampling based fitted value iteration, in Proceedings of the International Conference on Machine Learning, ACM, New York, 2005, pp. 881-886.
-
-
-
-
-
35
-
-
0031143730
-
An analysis of temporal difference learning with function approximation
-
J. N. TSITSIKLIS AND B. VAN ROY, An analysis of temporal difference learning with function approximation, IEEE Trans. Automat. Control, 42 (1997), pp. 674-690.
-
(1997)
IEEE Trans. Automat. Control
, vol.42
, pp. 674-690
-
-
TSITSIKLIS, J.N.1
VAN ROY, B.2
-
37
-
-
84887252594
-
Support vector method for function approximation, regression estimation and signal processing
-
V. VAPNIK, S. E. GOLOWICH, AND A. SMOLA, Support vector method for function approximation, regression estimation and signal processing, in Advances in Neural Information Processing Systems, 1997, pp. 281-287.
-
(1997)
Advances in Neural Information Processing Systems
, pp. 281-287
-
-
VAPNIK, V.1
GOLOWICH, S.E.2
SMOLA, A.3
-
38
-
-
0012252296
-
Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions
-
Technical report NU-CCS-93-14, Northeastern University, Boston, MA
-
R. J. WILLIAMS AND L. C. BAIRD, Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, Technical report NU-CCS-93-14, Northeastern University, Boston, MA, 1993.
-
(1993)
-
-
WILLIAMS, R.J.1
BAIRD, L.C.2