Volume , Issue , 2007, Pages 330-337

Value-iteration based fitted policy iteration: Learning with a single trajectory

Author keywords

[No Author keywords available]

Indexed keywords

DECISION THEORY; ITERATIVE METHODS; MARKOV PROCESSES; POLYNOMIALS; PROBLEM SOLVING; PUBLIC POLICY;

EID: 34548752490     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ADPRL.2007.368207     Document Type: Conference Paper
Times cited : (34)

References (17)
  • 3
    • 31844456754
    • Finite time bounds for sampling based fitted value iteration
    • Cs. Szepesvári and R. Munos. Finite time bounds for sampling based fitted value iteration. In ICML'2005, pages 881-886, 2005.
    • (2005) ICML'2005 , pp. 881-886
    • Szepesvári, C.1    Munos, R.2
  • 5
    • 0003161174
    • Rates of convergence for empirical processes of stationary mixing sequences
    • January
    • B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, 22(1):94-116, January 1994.
    • (1994) The Annals of Probability , vol.22 , Issue.1 , pp. 94-116
    • Yu, B.1
  • 6
    • 0033904367
    • Nonparametric time series prediction through adaptive model selection
    • April
    • R. Meir. Nonparametric time series prediction through adaptive model selection. Machine Learning, 39(1):5-34, April 2000.
    • (2000) Machine Learning , vol.39 , Issue.1 , pp. 5-34
    • Meir, R.1
  • 7
    • 33746032553
    • Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
    • A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. In COLT-19, pages 574-588, 2006.
    • (2006) COLT-19 , pp. 574-588
    • Antos, A.1    Szepesvári, C.2    Munos, R.3
  • 9
    • 34548725480
    • Approximate action-value iteration in continuous state spaces: Learning with a single trajectory
    • submitted
    • A. Antos, Cs. Szepesvári, and R. Munos. Approximate action-value iteration in continuous state spaces: learning with a single trajectory. (submitted), 2006.
    • (2006)
    • Antos, A.1    Szepesvári, C.2    Munos, R.3
  • 12
    • 0030489341
    • Histogram regression estimation using data-dependent partitions
    • A. Nobel. Histogram regression estimation using data-dependent partitions. Annals of Statistics, 24(3): 1084-1105, 1996.
    • (1996) Annals of Statistics , vol.24 , Issue.3 , pp. 1084-1105
    • Nobel, A.1
  • 13
    • 0000996139
    • Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension
    • D. Haussler. Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory Series A, 69:217-232, 1995.
    • (1995) Journal of Combinatorial Theory Series A , vol.69 , pp. 217-232
    • Haussler, D.1
  • 14
    • 1942516880
    • Error bounds for approximate policy iteration
    • R. Munos. Error bounds for approximate policy iteration. In ICML'2003, pages 560-567, 2003.
    • (2003) ICML'2003 , pp. 560-567
    • Munos, R.1
  • 16
    • 0026206780
    • An optimal multigrid algorithm for continuous state discrete time stochastic control
    • C.S. Chow and J.N. Tsitsiklis. An optimal multigrid algorithm for continuous state discrete time stochastic control. IEEE Transactions on Automatic Control, 36(8):898-914, 1991.
    • (1991) IEEE Transactions on Automatic Control , vol.36 , Issue.8 , pp. 898-914
    • Chow, C.S.1    Tsitsiklis, J.N.2
  • 17
    • 0001523794
    • Strict stationarity of generalized autoregressive processes
    • P. Bougerol and N. Picard. Strict stationarity of generalized autoregressive processes. Annals of Probability, 20:1714-1730, 1992.
    • (1992) Annals of Probability , vol.20 , pp. 1714-1730
    • Bougerol, P.1    Picard, N.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.