SCOPUS Information Retrieval Platform

SIAM Journal on Control and Optimization

Volume 46, Issue 2, 2007, Pages 541-561

Performance bounds in Lp-norm for approximate value iteration

Author keywords

Dynamic programming; Error analysis; Function approximation; Markov decision processes; Optimal control; Reinforcement learning; Statistical learning

Indexed keywords

DYNAMIC PROGRAMMING; ERROR ANALYSIS; ITERATIVE METHODS; OPTIMAL CONTROL SYSTEMS; REINFORCEMENT LEARNING; FUNCTION APPROXIMATION; STATISTICAL LEARNING; MARKOV PROCESSES

EID: 40949107944 PISSN: 03630129 EISSN: None Source Type: Journal
DOI: 10.1137/040614384 Document Type: Article

Times cited: 151

References (38)

1
- 33746032553
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Springer-Verlag, New York
- A. ANTOS, CS. SZEPESVÁRI, AND R. MUNOS, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, in Proceedings of the Conference on Learning Theory, Springer-Verlag, New York, 2006, pp. 574-588.
- (2006) Proceedings of the Conference on Learning Theory , pp. 574-588
- ANTOS, A.¹ SZEPESVÁRI, C.² MUNOS, R.³

2
- 0031074521
- AI Rev, 11
- C. G. ATKESON, A. W. MOORE, AND S. A. SCHAAL, Locally weighted learning, AI Rev., 11 (1997), pp. 11-73.
- (1997) Locally weighted learning , pp. 11-73
- ATKESON, C.G.¹ MOORE, A.W.² SCHAAL, S.A.³

3
- 0031073475
- AI Rev, 11
- C. G. ATKESON, A. W. MOORE, AND S. A. SCHAAL, Locally weighted learning for control, AI Rev., 11 (1997), pp. 75-113.
- (1997) Locally weighted learning for control , pp. 75-113
- ATKESON, C.G.¹ MOORE, A.W.² SCHAAL, S.A.³

4
- 0003787146
- Princeton University Press, Princeton, NJ
- R. BELLMAN, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
- (1957) Dynamic Programming
- BELLMAN, R.¹

5
- 84968519017
- Functional approximation and dynamic programming
- R. E. BELLMAN AND S. E. DREYFUS, Functional approximation and dynamic programming, Math. Tables Aids Comput., 13 (1959), pp. 247-251.
- (1959) Math. Tables Aids Comput , vol.13 , pp. 247-251
- BELLMAN, R.E.¹ DREYFUS, S.E.²

6
- 0003565779
- Prentice-Hall, Englewood Cliffs, NJ
- D. P. BERTSEKAS, Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall, Englewood Cliffs, NJ, 1987.
- (1987) Dynamic Programming: Deterministic and Stochastic Models
- BERTSEKAS, D.P.¹

7
- 0003487482
- Athena Scientific, Nashua, NH
- D. P. BERTSEKAS AND J. TSITSIKLIS, Neuro-Dynamic Programming, Athena Scientific, Nashua, NH, 1996.
- (1996) Neuro-Dynamic Programming
- BERTSEKAS, D.P.¹ TSITSIKLIS, J.²

8
- 0001523794
- Strict stationarity of generalized autoregressive processes
- P. BOUGEROL AND N. PICARD, Strict stationarity of generalized autoregressive processes, Ann. Probab., 20 (1992), pp. 1714-1730.
- (1992) Ann. Probab , vol.20 , pp. 1714-1730
- BOUGEROL, P.¹ PICARD, N.²

9
- 0031541839
- Adaptive greedy approximations
- G. M. DAVIES, S. MALLAT, AND M. AVELLANEDA, Adaptive greedy approximations, J. Constr. Approx., 13 (1997), pp. 57-98.
- (1997) J. Constr. Approx , vol.13 , pp. 57-98
- DAVIES, G.M.¹ MALLAT, S.² AVELLANEDA, M.³

10
- 0348090400
- The linear programming approach to approximate dynamic programming
- D. P. DE FARIAS AND B. VAN ROY, The linear programming approach to approximate dynamic programming, Oper. Res., 51 (2003), pp. 850-865.
- (2003) Oper. Res , vol.51 , pp. 850-865
- DE FARIAS, D.P.¹ VAN ROY, B.²

11
- 85009724776
- Nonlinear approximation
- R. DEVORE, Nonlinear approximation, Acta Numer., 7 (1998), pp. 51-150.
- (1998) Acta Numer , vol.7 , pp. 51-150
- DEVORE, R.¹

12
- 84880694195
- Stable function approximation in dynamic programming
- Morgan Kaufmann
- G. GORDON, Stable function approximation in dynamic programming, in Proceedings of the International Conference on Machine Learning, Morgan Kaufmann, 1995, pp. 261-268.
- (1995) Proceedings of the International Conference on Machine Learning , pp. 261-268
- GORDON, G.¹

13
- 0003989207
- Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA
- G. J. GORDON, Approximate Solutions to Markov Decision Processes, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, 1999.
- (1999) Approximate Solutions to Markov Decision Processes
- GORDON, G.J.¹

14
- 84880898477
- Max-norm projections for factored MDPs
- Lawrence Erlbaum
- C. GUESTRIN, D. KOLLER, AND R. PARR, Max-norm projections for factored MDPs, in Proceedings of the International Joint Conference on Artificial Intelligence, Lawrence Erlbaum, 2001, pp. 673-682.
- (2001) Proceedings of the International Joint Conference on Artificial Intelligence , pp. 673-682
- GUESTRIN, C.¹ KOLLER, D.² PARR, R.³

15
- 0003684449
- The Elements of Statistical Learning
- Springer-Verlag, New York
- T. HASTIE, R. TIBSHIRANI, AND J. FRIEDMAN, The Elements of Statistical Learning, Springer Ser. Statist., Springer-Verlag, New York, 2001.
- (2001) Springer Ser. Statist
- HASTIE, T.¹ TIBSHIRANI, R.² FRIEDMAN, J.³

16
- 85075781529
- Springer-Verlag, New York
- O. HERNÁNDEZ-LERMA AND J. B. LASSERRE, Discrete-Time Markov Control Processes, Basic Optimality Criteria, Springer-Verlag, New York, 1996.
- (1996) Discrete-Time Markov Control Processes, Basic Optimality Criteria
- HERNÁNDEZ-LERMA, O.¹ LASSERRE, J.B.²

17
- 0003952176
- Springer-Verlag, New York
- O. HERNÁNDEZ-LERMA AND J. B. LASSERRE, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999.
- (1999) Further Topics on Discrete-Time Markov Control Processes
- HERNÁNDEZ-LERMA, O.¹ LASSERRE, J.B.²

18
- 0006238280
- Recurrence conditions for Markov decision processes with Borel state space: A survey
- O. HERNÁNDEZ-LERMA, R. MONTES-DE-OCA, AND R. CAVAZOS-CANEDA, Recurrence conditions for Markov decision processes with Borel state space: A survey, Ann. Oper. Res., 28 (1991), pp. 29-46.
- (1991) Ann. Oper. Res , vol.28 , pp. 29-46
- HERNÁNDEZ-LERMA, O.¹ MONTES-DE-OCA, R.² CAVAZOS-CANEDA, R.³

19
- 0001144425
- On ergodicity and recurrence properties of a Markov chain with an application to an open Jackson network
- A. HORDIJK AND F. SPIEKSMA, On ergodicity and recurrence properties of a Markov chain with an application to an open Jackson network, Adv. Appl. Probab., 24 (1992), pp. 343-376.
- (1992) Adv. Appl. Probab , vol.24 , pp. 343-376
- HORDIJK, A.¹ SPIEKSMA, F.²

20
- 23244466805
- Ph.D. thesis, University College London
- S. M. KAKADE, On the Sample Complexity of Reinforcement Learning, Ph.D. thesis, University College London, 2003.
- (2003) On the Sample Complexity of Reinforcement Learning
- KAKADE, S.M.¹

21
- 1942514728
- Approximately optimal approximate reinforcement learning
- Morgan Kaufmann
- S. KAKADE AND J. LANGFORD, Approximately optimal approximate reinforcement learning, in Proceedings of the 19th International Conference on Machine Learning, Morgan Kaufmann, 2002, pp. 267-274.
- (2002) Proceedings of the 19th International Conference on Machine Learning , pp. 267-274
- KAKADE, S.¹ LANGFORD, J.²

22
- 0010359703
- Policy iteration for factored MDPs
- Morgan Kaufmann
- D. KOLLER AND R. PARR, Policy iteration for factored MDPs, in Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 2000, pp. 326-334.
- (2000) Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence , pp. 326-334
- KOLLER, D.¹ PARR, R.²

23
- 4644323293
- Least-squares policy iteration
- M. LAGOUDAKIS AND R. PARR, Least-squares policy iteration, J. Mach. Learn. Res., 4 (2003), pp. 1107-1149.
- (2003) J. Mach. Learn. Res , vol.4 , pp. 1107-1149
- LAGOUDAKIS, M.¹ PARR, R.²

24
- 0003456805
- Academic Press, San Diego, CA
- S. MALLAT, A Wavelet Tour of Signal Processing, Academic Press, San Diego, CA, 1997.
- (1997) A Wavelet Tour of Signal Processing
- MALLAT, S.¹

25
- 28244499470
- Stability, performance evaluation, and optimization
- Kluwer Academic, Boston, MA
- S. P. MEYN, Stability, performance evaluation, and optimization, in Handbook of Markov Decision Processes: Methods and Applications, Kluwer Academic, Boston, MA, 2002, pp. 305-346.
- (2002) Handbook of Markov Decision Processes: Methods and Applications , pp. 305-346
- MEYN, S.P.¹

26
- 1942516880
- Error bounds for approximate policy iteration
- AAAI Press
- R. MUNOS, Error bounds for approximate policy iteration, in Proceedings of the 20th International Conference on Machine Learning, AAAI Press, 2003, pp. 560-567.
- (2003) Proceedings of the 20th International Conference on Machine Learning , pp. 560-567
- MUNOS, R.¹

27
- 40849114100
- Finite-Time Bounds for Sampling-Based Fitted Value Iteration
- Technical report, INRIA, available online from
- R. MUNOS AND CS. SZEPESVÁRI, Finite-Time Bounds for Sampling-Based Fitted Value Iteration, Technical report, INRIA, 2006; available online from http://hal.inria.fr/inria-00120882.
- (2006)
- MUNOS, R.¹ SZEPESVÁRI, C.²

28
- 0003983125
- Springer-Verlag, New York
- D. POLLARD, Convergence of Stochastic Processes, Springer-Verlag, New York, 1984.
- (1984) Convergence of Stochastic Processes
- POLLARD, D.¹

29
- 85102627959
- Wiley-Interscience, New York
- M. L. PUTERMAN, Markov Decision Processes, Discrete Stochastic Dynamic Programming, Wiley-Interscience, New York, 1994.
- (1994) Markov Decision Processes, Discrete Stochastic Dynamic Programming
- PUTERMAN, M.L.¹

30
- 25444448065
- MIT Press, Cambridge, MA
- C. E. RASMUSSEN AND C. K. I. WILLIAMS, Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA, 2005.
- (2005) Gaussian Processes for Machine Learning
- RASMUSSEN, C.E.¹ WILLIAMS, C.K.I.²

31
- 70350192140
- Numerical dynamic programming in economics
- Elsevier/North-Holland, Amsterdam
- J. RUST, Numerical dynamic programming in economics, in Handbook of Computational Economics, Elsevier/North-Holland, Amsterdam, 1996, pp. 619-729.
- (1996) Handbook of Computational Economics , pp. 619-729
- RUST, J.¹

32
- 41449084934
- A. L. SAMUEL, Some studies in machine learning using the game of checkers, IBM J. Res. Develop., 3 (1959), pp. 210-229; reprinted in Computers and Thought, E. A. Feigenbaum and J. Feldman, eds., McGraw-Hill, New York, 1963.

33
- 0004102479
- Bradford Book, MIT Press, Cambridge, MA
- R. S. SUTTON AND A. G. BARTO, Reinforcement Learning: An Introduction, Bradford Book, MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction
- SUTTON, R.S.¹ BARTO, A.G.²

34
- 31844456754
- CS. SZEPESVÁRI AND R. MUNOS, Finite time bounds for sampling based fitted value iteration, in Proceedings of the International Conference on Machine Learning, ACM, New York, 2005, pp. 881-886.

35
- 0031143730
- An analysis of temporal difference learning with function approximation
- J. N. TSITSIKLIS AND B. VAN ROY, An analysis of temporal difference learning with function approximation, IEEE Trans. Automat. Control, 42 (1997), pp. 674-690.
- (1997) IEEE Trans. Automat. Control , vol.42 , pp. 674-690
- TSITSIKLIS, J.N.¹ VAN ROY, B.²

36
- 0003991806
- John Wiley & Sons, New York
- V. VAPNIK, Statistical Learning Theory, John Wiley & Sons, New York, 1998.
- (1998) Statistical Learning Theory
- VAPNIK, V.¹

37
- 84887252594
- Support vector method for function approximation, regression estimation and signal processing
- V. VAPNIK, S. E. GOLOWICH, AND A. SMOLA, Support vector method for function approximation, regression estimation and signal processing, in Advances in Neural Information Processing Systems, 1997, pp. 281-287.
- (1997) in Advances in Neural Information Processing Systems , pp. 281-287
- VAPNIK, V.¹ GOLOWICH, S.E.² SMOLA, A.³

38
- 0012252296
- Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions
- Technical report NU-CCS-93-14, Northeastern University, Boston, MA
- R. J. WILLIAMS AND L. C. BAIRD, Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, Technical report NU-CCS-93-14, Northeastern University, Boston, MA, 1993.
- (1993)
- WILLIAMS, R.J.¹ BAIRD, L.C.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.