Volume 46, Issue 2, 2007, Pages 541-561

Performance bounds in Lp-norm for approximate value iteration

Author keywords

Dynamic programming; Error analysis; Function approximation; Markov decision processes; Optimal control; Reinforcement learning; Statistical learning

Indexed keywords

DYNAMIC PROGRAMMING; ERROR ANALYSIS; ITERATIVE METHODS; OPTIMAL CONTROL SYSTEMS; REINFORCEMENT LEARNING

EID: 40949107944     PISSN: 03630129     EISSN: None     Source Type: Journal    
DOI: 10.1137/040614384     Document Type: Article
Times cited: 151

References (38)
  • 1
    • 33746032553
    • Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
    • Springer-Verlag, New York
    • A. ANTOS, CS. SZEPESVARI, AND R. MUNOS, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, in Proceedings of the Conference on Learning Theory, Springer-Verlag, New York, 2006, pp. 574-588.
    • (2006) Proceedings of the Conference on Learning Theory , pp. 574-588
    • ANTOS, A.1    SZEPESVARI, C.2    MUNOS, R.3
  • 4
    • 0003787146
    • Princeton University Press, Princeton, NJ
    • R. BELLMAN, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
    • (1957) Dynamic Programming
    • BELLMAN, R.1
  • 5
    • 84968519017
    • Functional approximation and dynamic programming
    • R. E. BELLMAN AND S. E. DREYFUS, Functional approximation and dynamic programming, Math. Tables Aids Comput., 13 (1959), pp. 247-251.
    • (1959) Math. Tables Aids Comput , vol.13 , pp. 247-251
    • BELLMAN, R.E.1    DREYFUS, S.E.2
  • 8
    • 0001523794
    • Strict stationarity of generalized autoregressive processes
    • P. BOUGEROL AND N. PICARD, Strict stationarity of generalized autoregressive processes, Ann. Probab., 20 (1992), pp. 1714-1730.
    • (1992) Ann. Probab , vol.20 , pp. 1714-1730
    • BOUGEROL, P.1    PICARD, N.2
  • 10
    • 0348090400
    • The linear programming approach to approximate dynamic programming
    • D. P. DE FARIAS AND B. VAN ROY, The linear programming approach to approximate dynamic programming, Oper. Res., 51 (2003), pp. 850-865.
    • (2003) Oper. Res , vol.51 , pp. 850-865
    • DE FARIAS, D.P.1    VAN ROY, B.2
  • 11
    • 85009724776
    • Nonlinear approximation
    • R. DEVORE, Nonlinear approximation, Acta Numer., 7 (1998), pp. 51-150.
    • (1998) Acta Numer , vol.7 , pp. 51-150
    • DEVORE, R.1
  • 18
    • 0006238280
    • Recurrence conditions for Markov decision processes with Borel state space: A survey
    • O. HERNÁNDEZ-LERMA, R. MONTES-DE-OCA, AND R. CAVAZOS-CANEDA, Recurrence conditions for Markov decision processes with Borel state space: A survey, Ann. Oper. Res., 28 (1991), pp. 29-46.
    • (1991) Ann. Oper. Res , vol.28 , pp. 29-46
    • HERNÁNDEZ-LERMA, O.1    MONTES-DE-OCA, R.2    CAVAZOS-CANEDA, R.3
  • 19
    • 0001144425
    • On ergodicity and recurrence properties of a Markov chain with an application to an open Jackson network
    • A. HORDIJK AND F. SPIEKSMA, On ergodicity and recurrence properties of a Markov chain with an application to an open Jackson network, Adv. Appl. Probab., 24 (1992), pp. 343-376.
    • (1992) Adv. Appl. Probab , vol.24 , pp. 343-376
    • HORDIJK, A.1    SPIEKSMA, F.2
  • 23
    • 4644323293
    • Least-squares policy iteration
    • M. LAGOUDAKIS AND R. PARR, Least-squares policy iteration, J. Mach. Learn. Res., 4 (2003), pp. 1107-1149.
    • (2003) J. Mach. Learn. Res , vol.4 , pp. 1107-1149
    • LAGOUDAKIS, M.1    PARR, R.2
  • 25
  • 27
    • 40849114100
    • Finite-Time Bounds for Sampling-Based Fitted Value Iteration
    • Technical report, INRIA, available online from
    • R. MUNOS AND CS. SZEPESVÁRI, Finite-Time Bounds for Sampling-Based Fitted Value Iteration, Technical report, INRIA, 2006; available online from http://hal.inria.fr/inria-00120882.
    • (2006)
    • MUNOS, R.1    SZEPESVÁRI, C.2
  • 31
    • 70350192140
    • Numerical dynamic programming in economics
    • Elsevier/North-Holland, Amsterdam
    • J. RUST, Numerical dynamic programming in economics, in Handbook of Computational Economics, Elsevier/North-Holland, Amsterdam, 1996, pp. 619-729.
    • (1996) Handbook of Computational Economics , pp. 619-729
    • RUST, J.1
  • 32
    • 41449084934
    • A. L. SAMUEL, Some studies in machine learning using the game of checkers, IBM J. Res. Develop., 3 (1959), pp. 210-229; reprinted in Computers and Thought, E. A. Feigenbaum and J. Feldman, eds., McGraw-Hill, New York, 1963.
  • 34
    • 31844456754
    • CS. SZEPESVARI AND R. MUNOS, Finite time bounds for sampling based fitted value iteration, in Proceedings of the International Conference on Machine Learning, ACM, New York, 2005, pp. 881-886.
  • 35
    • 0031143730
    • An analysis of temporal difference learning with function approximation
    • J. N. TSITSIKLIS AND B. VAN ROY, An analysis of temporal difference learning with function approximation, IEEE Trans. Automat. Control, 42 (1997), pp. 674-690.
    • (1997) IEEE Trans. Automat. Control , vol.42 , pp. 674-690
    • TSITSIKLIS, J.N.1    VAN ROY, B.2
  • 38
    • 0012252296
    • Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions
    • Technical report NU-CCS-93-14, Northeastern University, Boston, MA
    • R. J. WILLIAMS AND L. C. BAIRD, Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, Technical report NU-CCS-93-14, Northeastern University, Boston, MA, 1993.
    • (1993)
    • WILLIAMS, R.J.1    BAIRD, L.C.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.