-
2
-
-
33746032553
-
-
A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. In G. Lugosi and H.U. Simon, editors, The Nineteenth Annual Conference on Learning Theory, COLT 2006, Proceedings, volume 4005 of LNCS/LNAI, pages 574-588, Berlin, Heidelberg, June 2006. Springer-Verlag. (Pittsburgh, PA, USA, June 22-25, 2006.).
-
-
-
-
3
-
-
34548752490
-
Value-iteration based fitted policy iteration: Learning with a single trajectory
-
IEEE, April, Honolulu, Hawaii, Apr 1-5
-
A. Antos, Cs. Szepesvári, and R. Munos. Value-iteration based fitted policy iteration: learning with a single trajectory. In 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), pages 330-337. IEEE, April 2007. (Honolulu, Hawaii, Apr 1-5, 2007.).
-
(2007)
2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007)
, pp. 330-337
-
-
Antos, A.1
Szepesvári, C.2
Munos, R.3
-
4
-
-
40849145988
-
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
-
A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71:89-129, 2008.
-
(2008)
Machine Learning
, vol.71
, pp. 89-129
-
-
Antos, A.1
Szepesvári, C.2
Munos, R.3
-
5
-
-
85151728371
-
Residual algorithms: Reinforcement learning with function approximation
-
Armand Prieditis and Stuart Russell, editors, San Francisco, CA, Morgan Kaufmann
-
Leemon C. Baird. Residual algorithms: Reinforcement learning with function approximation. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 30-37, San Francisco, CA, 1995. Morgan Kaufmann.
-
(1995)
Proceedings of the Twelfth International Conference on Machine Learning
, pp. 30-37
-
-
Baird, L.C.1
-
9
-
-
0001523794
-
Strict stationarity of generalized autoregressive processes
-
P. Bougerol and N. Picard. Strict stationarity of generalized autoregressive processes. Annals of Probability, 20:1714-1730, 1992.
-
(1992)
Annals of Probability
, vol.20
, pp. 1714-1730
-
-
Bougerol, P.1
Picard, N.2
-
12
-
-
0026206780
-
An optimal multigrid algorithm for continuous state discrete time stochastic control
-
C.S. Chow and J.N. Tsitsiklis. An optimal multigrid algorithm for continuous state discrete time stochastic control. IEEE Transactions on Automatic Control, 36(8):898-914, 1991.
-
(1991)
IEEE Transactions on Automatic Control
, vol.36
, Issue.8
, pp. 898-914
-
-
Chow, C.S.1
Tsitsiklis, J.N.2
-
15
-
-
0002319896
-
Nonlinear Approximation
-
R. DeVore. Nonlinear Approximation. Acta Numerica, 1997.
-
(1997)
Acta Numerica
-
-
DeVore, R.1
-
16
-
-
84899029004
-
Batch value function approximation via support vectors
-
T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Cambridge, MA, MIT Press
-
T. G. Dietterich and X. Wang. Batch value function approximation via support vectors. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
-
(2002)
Advances in Neural Information Processing Systems 14
-
-
Dietterich, T.G.1
Wang, X.2
-
19
-
-
84880694195
-
Stable function approximation in dynamic programming
-
Armand Prieditis and Stuart Russell, editors, San Francisco, CA, Morgan Kaufmann
-
G.J. Gordon. Stable function approximation in dynamic programming. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 261-268, San Francisco, CA, 1995. Morgan Kaufmann.
-
(1995)
Proceedings of the Twelfth International Conference on Machine Learning
, pp. 261-268
-
-
Gordon, G.J.1
-
20
-
-
2342446663
-
A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis
-
A. Gosavi. A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis. Machine Learning, 55:5-29, 2004.
-
(2004)
Machine Learning
, vol.55
, pp. 5-29
-
-
Gosavi, A.1
-
22
-
-
0003624357
-
-
Springer-Verlag, New York
-
L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A distribution-free theory of nonparametric regression. Springer-Verlag, New York, 2002.
-
(2002)
A distribution-free theory of nonparametric regression
-
-
Györfi, L.1
Kohler, M.2
Krzyżak, A.3
Walk, H.4
-
24
-
-
0000996139
-
Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension
-
D. Haussler. Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory, Series A, 69(2):217-232, 1995.
-
(1995)
Journal of Combinatorial Theory, Series A
, vol.69
, Issue.2
, pp. 217-232
-
-
Haussler, D.1
-
25
-
-
84947403595
-
Probability inequalities for sums of bounded random variables
-
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963.
-
(1963)
Journal of the American Statistical Association
, vol.58
, pp. 13-30
-
-
Hoeffding, W.1
-
26
-
-
22944487667
-
Experiments in value function approximation with sparse support vector regression
-
T. Jung and T. Uthmann. Experiments in value function approximation with sparse support vector regression. In ECML, pages 180-191, 2004.
-
(2004)
ECML
, pp. 180-191
-
-
Jung, T.1
Uthmann, T.2
-
27
-
-
1942514728
-
Approximately optimal approximate reinforcement learning
-
San Francisco, CA, USA, Morgan Kaufmann Publishers Inc
-
S. Kakade and J. Langford. Approximately optimal approximate reinforcement learning. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 267-274, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.
-
(2002)
Proceedings of the Nineteenth International Conference on Machine Learning
, pp. 267-274
-
-
Kakade, S.1
Langford, J.2
-
29
-
-
84880649215
-
A sparse sampling algorithm for near-optimal planning in large Markovian decision processes
-
M. Kearns, Y. Mansour, and A.Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markovian decision processes. In Proceedings of IJCAI'99, pages 1324-1331, 1999.
-
(1999)
Proceedings of IJCAI'99
, pp. 1324-1331
-
-
Kearns, M.1
Mansour, Y.2
Ng, A.Y.3
-
30
-
-
0015000439
-
Some results on Tchebycheffian spline functions
-
G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82-95, 1971.
-
(1971)
J. Math. Anal. Applic
, vol.33
, pp. 82-95
-
-
Kimeldorf, G.1
Wahba, G.2
-
32
-
-
0001556720
-
Efficient agnostic learning of neural networks with bounded fan-in
-
W.S. Lee, P.L. Bartlett, and R.C. Williamson. Efficient agnostic learning of neural networks with bounded fan-in. IEEE Transactions on Information Theory, 42(6):2118-2132, 1996.
-
(1996)
IEEE Transactions on Information Theory
, vol.42
, Issue.6
, pp. 2118-2132
-
-
Lee, W.S.1
Bartlett, P.L.2
Williamson, R.C.3
-
33
-
-
0035578679
-
Valuing American options by simulation: A simple least-squares approach
-
F. A. Longstaff and E. S. Schwartz. Valuing American options by simulation: A simple least-squares approach. Rev. Financial Studies, 14(1):113-147, 2001.
-
(2001)
Rev. Financial Studies
, vol.14
, Issue.1
, pp. 113-147
-
-
Longstaff, F.A.1
Schwartz, E.S.2
-
35
-
-
0345184460
-
Computational advances in dynamic programming
-
Academic Press
-
T.L. Morin. Computational advances in dynamic programming. In Dynamic Programming and its Applications, pages 53-90. Academic Press, 1978.
-
(1978)
Dynamic Programming and its Applications
, pp. 53-90
-
-
Morin, T.L.1
-
40
-
-
0033480745
-
Generalization bounds for function approximation from scattered noisy data
-
P. Niyogi and F. Girosi. Generalization bounds for function approximation from scattered noisy data. Advances in Computational Mathematics, 10:51-80, 1999.
-
(1999)
Advances in Computational Mathematics
, vol.10
, pp. 51-80
-
-
Niyogi, P.1
Girosi, F.2
-
41
-
-
0036832956
-
Kernel-based reinforcement learning
-
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49:161-178, 2002.
-
(2002)
Machine Learning
, vol.49
, pp. 161-178
-
-
Ormoneit, D.1
Sen, S.2
-
43
-
-
27144457662
-
Approximate solutions of a discounted Markovian decision problem
-
98: Dynamische Optimierungen:77-92
-
D. Reetz. Approximate solutions of a discounted Markovian decision problem. Bonner Mathematische Schriften, 98: Dynamische Optimierungen:77-92, 1977.
-
(1977)
Bonner Mathematische Schriften
-
-
Reetz, D.1
-
44
-
-
33646398129
-
Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method
-
M. Riedmiller. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In 16th European Conference on Machine Learning, pages 317-328, 2005.
-
(2005)
16th European Conference on Machine Learning
, pp. 317-328
-
-
Riedmiller, M.1
-
45
-
-
0002317013
-
Numerical dynamic programming in economics
-
H. Amman, D. Kendrick, and J. Rust, editors, Elsevier, North Holland
-
J. Rust. Numerical dynamic programming in economics. In H. Amman, D. Kendrick, and J. Rust, editors, Handbook of Computational Economics. Elsevier, North Holland, 1996a.
-
(1996)
Handbook of Computational Economics
-
-
Rust, J.1
-
46
-
-
0001509947
-
Using randomization to break the curse of dimensionality
-
J. Rust. Using randomization to break the curse of dimensionality. Econometrica, 65:487-516, 1996b.
-
(1996)
Econometrica
, vol.65
, pp. 487-516
-
-
Rust, J.1
-
47
-
-
0001201756
-
Some studies in machine learning using the game of checkers
-
A.L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, pages 210-229, 1959.
-
(1959)
IBM Journal on Research and Development
, pp. 210-229
-
-
Samuel, A.L.1
-
48
-
-
0004242550
-
-
E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York
-
Reprinted in Computers and Thought, E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York, 1963.
-
(1963)
Computers and Thought
-
-
-
49
-
-
0001201757
-
Some studies in machine learning using the game of checkers, II - recent progress
-
A.L. Samuel. Some studies in machine learning using the game of checkers, II - recent progress. IBM Journal on Research and Development, pages 601-617, 1967.
-
(1967)
IBM Journal on Research and Development
, pp. 601-617
-
-
Samuel, A.L.1
-
54
-
-
0000439527
-
Optimal rates of convergence for nonparametric estimators
-
C.J. Stone. Optimal rates of convergence for nonparametric estimators. Annals of Statistics, 8: 1348-1360, 1980.
-
(1980)
Annals of Statistics
, vol.8
, pp. 1348-1360
-
-
Stone, C.J.1
-
55
-
-
0000439527
-
Optimal global rates of convergence for nonparametric regression
-
C.J. Stone. Optimal global rates of convergence for nonparametric regression. Annals of Statistics, 10:1040-1053, 1982.
-
(1982)
Annals of Statistics
, vol.10
, pp. 1040-1053
-
-
Stone, C.J.1
-
56
-
-
0034759906
-
Efficient approximate planning in continuous space Markovian decision problems
-
Cs. Szepesvári. Efficient approximate planning in continuous space Markovian decision problems. AI Communications, 13:163-176, 2001.
-
(2001)
AI Communications
, vol.13
, pp. 163-176
-
-
Szepesvári, C.1
-
57
-
-
44649150245
-
Efficient approximate planning in continuous space Markovian decision problems
-
accepted
-
Cs. Szepesvári. Efficient approximate planning in continuous space Markovian decision problems. Journal of European Artificial Intelligence Research, 2000. accepted.
-
(2000)
Journal of European Artificial Intelligence Research
-
-
Szepesvári, C.1
-
58
-
-
31844456754
-
Finite time bounds for sampling based fitted value iteration
-
Cs. Szepesvári and R. Munos. Finite time bounds for sampling based fitted value iteration. In ICML'2005, pages 881-886, 2005.
-
(2005)
ICML'2005
, pp. 881-886
-
-
Szepesvári, C.1
Munos, R.2
-
60
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
March
-
G.J. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38: 58-67, March 1995.
-
(1995)
Communications of the ACM
, vol.38
, pp. 58-67
-
-
Tesauro, G.J.1
-
61
-
-
0035391083
-
Regression methods for pricing complex American-style options
-
J. N. Tsitsiklis and B. Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12:694-703, 2001.
-
(2001)
IEEE Transactions on Neural Networks
, vol.12
, pp. 694-703
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
62
-
-
0029752470
-
Feature-based methods for large scale dynamic programming
-
J. N. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94, 1996.
-
(1996)
Machine Learning
, vol.22
, pp. 59-94
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
63
-
-
0001024505
-
On the uniform convergence of relative frequencies of events to their probabilities
-
V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264-280, 1971.
-
(1971)
Theory of Probability and its Applications
, vol.16
, pp. 264-280
-
-
Vapnik, V.N.1
Chervonenkis, A.Y.2
-
65
-
-
0347067948
-
Covering number bounds of certain regularized linear function classes
-
T. Zhang. Covering number bounds of certain regularized linear function classes. Journal of Machine Learning Research, 2:527-550, 2002.
-
(2002)
Journal of Machine Learning Research
, vol.2
, pp. 527-550
-
-
Zhang, T.1