



Volume 9, 2008, Pages 815-857

Finite-time bounds for fitted value iteration

Author keywords

Discounted Markovian decision processes; Fitted value iteration; Generative model; Optimal control; Pollard's inequality; Regression; Reinforcement learning; Statistical learning theory; Supervised learning

Indexed keywords

CONVERGENCE OF NUMERICAL METHODS; ELECTRIC NETWORK ANALYSIS; MODAL ANALYSIS; RISK ASSESSMENT; ROBOT LEARNING; STATE SPACE METHODS

EID: 44649189852     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited : (575)

References (66)
  • 2
    • 33746032553
    • A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. In G. Lugosi and H.U. Simon, editors, The Nineteenth Annual Conference on Learning Theory, COLT 2006, Proceedings, volume 4005 of LNCS/LNAI, pages 574-588, Berlin, Heidelberg, June 2006. Springer-Verlag. (Pittsburgh, PA, USA, June 22-25, 2006.).
  • 4
    • 40849145988
    • Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
    • A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71:89-129, 2008.
    • (2008) Machine Learning , vol.71 , pp. 89-129
    • Antos, A.1    Szepesvári, C.2    Munos, R.3
  • 5
    • 85151728371
    • Residual algorithms: Reinforcement learning with function approximation
    • Armand Prieditis and Stuart Russell, editors, San Francisco, CA, Morgan Kaufmann
    • Leemon C. Baird. Residual algorithms: Reinforcement learning with function approximation. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 30-37, San Francisco, CA, 1995. Morgan Kaufmann.
    • (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 30-37
    • Baird, L.C.1
  • 9
    • 0001523794
    • Strict stationarity of generalized autoregressive processes
    • P. Bougerol and N. Picard. Strict stationarity of generalized autoregressive processes. Annals of Probability, 20:1714-1730, 1992.
    • (1992) Annals of Probability , vol.20 , pp. 1714-1730
    • Bougerol, P.1    Picard, N.2
  • 11
  • 12
    • 0026206780
    • An optimal multigrid algorithm for continuous state discrete time stochastic control
    • C.S. Chow and J.N. Tsitsiklis. An optimal multigrid algorithm for continuous state discrete time stochastic control. IEEE Transactions on Automatic Control, 36(8):898-914, 1991.
    • (1991) IEEE Transactions on Automatic Control , vol.36 , Issue.8 , pp. 898-914
    • Chow, C.S.1    Tsitsiklis, J.N.2
  • 15
    • 0002319896
    • Nonlinear Approximation
    • R. DeVore. Nonlinear Approximation. Acta Numerica, 1997.
    • (1997) Acta Numerica
    • DeVore, R.1
  • 16
    • 84899029004
    • Batch value function approximation via support vectors
    • T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Cambridge, MA, MIT Press
    • T. G. Dietterich and X. Wang. Batch value function approximation via support vectors. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
    • (2002) Advances in Neural Information Processing Systems 14
    • Dietterich, T.G.1    Wang, X.2
  • 19
    • 84880694195
    • Stable function approximation in dynamic programming
    • Armand Prieditis and Stuart Russell, editors, San Francisco, CA, Morgan Kaufmann
    • G.J. Gordon. Stable function approximation in dynamic programming. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 261-268, San Francisco, CA, 1995. Morgan Kaufmann.
    • (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 261-268
    • Gordon, G.J.1
  • 20
    • 2342446663
    • A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis
    • A. Gosavi. A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis. Machine Learning, 55:5-29, 2004.
    • (2004) Machine Learning , vol.55 , pp. 5-29
    • Gosavi, A.1
  • 23
  • 24
    • 0000996139
    • Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension
    • D. Haussler. Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory, Series A, 69(2):217-232, 1995.
    • (1995) Journal of Combinatorial Theory, Series A , vol.69 , Issue.2 , pp. 217-232
    • Haussler, D.1
  • 25
  • 26
    • 22944487667
    • Experiments in value function approximation with sparse support vector regression
    • T. Jung and T. Uthmann. Experiments in value function approximation with sparse support vector regression. In ECML, pages 180-191, 2004.
    • (2004) ECML , pp. 180-191
    • Jung, T.1    Uthmann, T.2
  • 27
  • 29
    • 84880649215
    • A sparse sampling algorithm for near-optimal planning in large Markovian decision processes
    • M. Kearns, Y. Mansour, and A.Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markovian decision processes. In Proceedings of IJCAI'99, pages 1324-1331, 1999.
    • (1999) Proceedings of IJCAI'99 , pp. 1324-1331
    • Kearns, M.1    Mansour, Y.2    Ng, A.Y.3
  • 30
    • 0015000439
    • Some results on Tchebycheffian spline functions
    • G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82-95, 1971.
    • (1971) J. Math. Anal. Applic , vol.33 , pp. 82-95
    • Kimeldorf, G.1    Wahba, G.2
  • 33
    • 0035578679
    • Valuing American options by simulation: A simple least-squares approach
    • F. A. Longstaff and E. S. Schwartz. Valuing American options by simulation: A simple least-squares approach. Rev. Financial Studies, 14(1):113-147, 2001.
    • (2001) Rev. Financial Studies , vol.14 , Issue.1 , pp. 113-147
    • Longstaff, F.A.1    Schwartz, E.S.2
  • 35
    • 0345184460
    • Computational advances in dynamic programming
    • Academic Press
    • T.L. Morin. Computational advances in dynamic programming. In Dynamic Programming and its Applications, pages 53-90. Academic Press, 1978.
    • (1978) Dynamic Programming and its Applications , pp. 53-90
    • Morin, T.L.1
  • 40
    • 0033480745
    • Generalization bounds for function approximation from scattered noisy data
    • P. Niyogi and F. Girosi. Generalization bounds for function approximation from scattered noisy data. Advances in Computational Mathematics, 10:51-80, 1999.
    • (1999) Advances in Computational Mathematics , vol.10 , pp. 51-80
    • Niyogi, P.1    Girosi, F.2
  • 41
    • 0036832956
    • Kernel-based reinforcement learning
    • D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49:161-178, 2002.
    • (2002) Machine Learning , vol.49 , pp. 161-178
    • Ormoneit, D.1    Sen, S.2
  • 43
    • 27144457662
    • Approximate solutions of a discounted Markovian decision problem
    • 98: Dynamische Optimierungen:77-92
    • D. Reetz. Approximate solutions of a discounted Markovian decision problem. Bonner Mathematischer Schriften, 98: Dynamische Optimierungen:77-92, 1977.
    • (1977) Bonner Mathematischer Schriften
    • Reetz, D.1
  • 44
    • 33646398129
    • Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method
    • M. Riedmiller. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In 16th European Conference on Machine Learning, pages 317-328, 2005.
    • (2005) 16th European Conference on Machine Learning , pp. 317-328
    • Riedmiller, M.1
  • 45
    • 0002317013
    • Numerical dynamic programming in economics
    • H. Amman, D. Kendrick, and J. Rust, editors, Elsevier, North Holland
    • J. Rust. Numerical dynamic programming in economics. In H. Amman, D. Kendrick, and J. Rust, editors, Handbook of Computational Economics. Elsevier, North Holland, 1996a.
    • (1996) Handbook of Computational Economics
    • Rust, J.1
  • 46
    • 0001509947
    • Using randomization to break the curse of dimensionality
    • J. Rust. Using randomization to break the curse of dimensionality. Econometrica, 65:487-516, 1996b.
    • (1996) Econometrica , vol.65 , pp. 487-516
    • Rust, J.1
  • 47
    • 0001201756
    • Some studies in machine learning using the game of checkers
    • A.L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, pages 210-229, 1959.
    • (1959) IBM Journal on Research and Development , pp. 210-229
    • Samuel, A.L.1
  • 48
    • 0004242550
    • E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York
    • Reprinted in Computers and Thought, E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York, 1963.
    • (1963) Computers and Thought
  • 49
    • 0001201757
    • Some studies in machine learning using the game of checkers, II - recent progress
    • A.L. Samuel. Some studies in machine learning using the game of checkers, II - recent progress. IBM Journal on Research and Development, pages 601-617, 1967.
    • (1967) IBM Journal on Research and Development , pp. 601-617
    • Samuel, A.L.1
  • 54
    • 0000439527
    • Optimal rates of convergence for nonparametric estimators
    • C.J. Stone. Optimal rates of convergence for nonparametric estimators. Annals of Statistics, 8: 1348-1360, 1980.
    • (1980) Annals of Statistics , vol.8 , pp. 1348-1360
    • Stone, C.J.1
  • 55
    • 0000439527
    • Optimal global rates of convergence for nonparametric regression
    • C.J. Stone. Optimal global rates of convergence for nonparametric regression. Annals of Statistics, 10:1040-1053, 1982.
    • (1982) Annals of Statistics , vol.10 , pp. 1040-1053
    • Stone, C.J.1
  • 56
    • 0034759906
    • Efficient approximate planning in continuous space Markovian decision problems
    • Cs. Szepesvári. Efficient approximate planning in continuous space Markovian decision problems. AI Communications, 13:163-176, 2001.
    • (2001) AI Communications , vol.13 , pp. 163-176
    • Szepesvári, C.1
  • 57
    • 44649150245
    • Efficient approximate planning in continuous space Markovian decision problems
    • accepted
    • Cs. Szepesvári. Efficient approximate planning in continuous space Markovian decision problems. Journal of European Artificial Intelligence Research, 2000. accepted.
    • (2000) Journal of European Artificial Intelligence Research
    • Szepesvári, C.1
  • 58
    • 31844456754
    • Finite time bounds for sampling based fitted value iteration
    • Cs. Szepesvári and R. Munos. Finite time bounds for sampling based fitted value iteration. In ICML'2005, pages 881-886, 2005.
    • (2005) ICML'2005 , pp. 881-886
    • Szepesvári, C.1    Munos, R.2
  • 60
    • 0029276036
    • Temporal difference learning and TD-Gammon
    • March
    • G.J. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38: 58-67, March 1995.
    • (1995) Communications of the ACM , vol.38 , pp. 58-67
    • Tesauro, G.J.1
  • 61
    • 0035391083
    • Regression methods for pricing complex American-style options
    • J. N. Tsitsiklis and B. Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12:694-703, 2001.
    • (2001) IEEE Transactions on Neural Networks , vol.12 , pp. 694-703
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 62
    • 0029752470
    • Feature-based methods for large scale dynamic programming
    • J. N. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94, 1996.
    • (1996) Machine Learning , vol.22 , pp. 59-94
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 63
    • 0001024505
    • On the uniform convergence of relative frequencies of events to their probabilities
    • V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264-280, 1971.
    • (1971) Theory of Probability and its Applications , vol.16 , pp. 264-280
    • Vapnik, V.N.1    Chervonenkis, A.Y.2
  • 65
    • 0347067948
    • Covering number bounds of certain regularized linear function classes
    • T. Zhang. Covering number bounds of certain regularized linear function classes. Journal of Machine Learning Research, 2:527-550, 2002.
    • (2002) Journal of Machine Learning Research , vol.2 , pp. 527-550
    • Zhang, T.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.