Volume 4005 LNAI, 2006, Pages 574-588

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL COMPLEXITY; DECISION MAKING; ITERATIVE METHODS; LEARNING ALGORITHMS; MARKOV PROCESSES

EID: 33746032553     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/11776420_42     Document Type: Conference Paper
Times cited: 21

References (23)
  • 2
    • 84941157238
    • Learning near-optimal policies with fitted policy iteration and a single sample path: Approximate iterative policy evaluation
    • A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with fitted policy iteration and a single sample path: approximate iterative policy evaluation (submitted to ICML'2006), 2006.
    • (2006) ICML'2006
    • Antos, A.1    Szepesvári, Cs.2    Munos, R.3
  • 8
    • 0003161174
    • Rates of convergence for empirical processes of stationary mixing sequences
    • January
    • B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, 22(1):94-116, January 1994.
    • (1994) The Annals of Probability , vol.22 , Issue.1 , pp. 94-116
    • Yu, B.1
  • 9
    • 0030489341
    • Histogram regression estimation using data-dependent partitions
    • A. Nobel. Histogram regression estimation using data-dependent partitions. Annals of Statistics, 24(3):1084-1105, 1996.
    • (1996) Annals of Statistics , vol.24 , Issue.3 , pp. 1084-1105
    • Nobel, A.1
  • 10
    • 0000996139
    • Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension
    • D. Haussler. Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory Series A, 69:217-232, 1995.
    • (1995) Journal of Combinatorial Theory Series A , vol.69 , pp. 217-232
    • Haussler, D.1
  • 11
    • 0001201756
    • Some studies in machine learning using the game of checkers
    • A.L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, pages 210-229, 1959.
    • (1959) IBM Journal on Research and Development , pp. 210-229
    • Samuel, A.L.1
  • 12
    • 0004242550
    • Reprinted, E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York
    • Reprinted in Computers and Thought, E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York, 1963.
    • (1963) Computers and Thought
  • 15
    • 0008321896
    • Reinforcement learning: An introduction
    • Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. Bradford Book, 1998.
    • (1998) Bradford Book
    • Sutton, R.S.1    Barto, A.G.2
  • 16
    • 84880694195
    • Stable function approximation in dynamic programming
    • Armand Prieditis and Stuart Russell, editors, San Francisco, CA. Morgan Kaufmann
    • Geoffrey J. Gordon. Stable function approximation in dynamic programming. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 261-268, San Francisco, CA, 1995. Morgan Kaufmann.
    • (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 261-268
    • Gordon, G.J.1
  • 17
    • 0029752470
    • Feature-based methods for large scale dynamic programming
    • J. N. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94, 1996.
    • (1996) Machine Learning , vol.22 , pp. 59-94
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 21
    • 84899029004
    • Batch value function approximation via support vectors
    • T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Cambridge, MA. MIT Press
    • T. G. Dietterich and X. Wang. Batch value function approximation via support vectors. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
    • (2002) Advances in Neural Information Processing Systems , vol.14
    • Dietterich, T.G.1    Wang, X.2
  • 22
    • 31844456754
    • Finite time bounds for sampling based fitted value iteration
    • Cs. Szepesvári and R. Munos. Finite time bounds for sampling based fitted value iteration. In ICML'2005, 2005.
    • (2005) ICML'2005
    • Szepesvári, Cs.1    Munos, R.2
  • 23
    • 0033904367
    • Nonparametric time series prediction through adaptive model selection
    • April
    • R. Meir. Nonparametric time series prediction through adaptive model selection. Machine Learning, 39(1):5-34, April 2000.
    • (2000) Machine Learning , vol.39 , Issue.1 , pp. 5-34
    • Meir, R.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.