Volume , Issue , 2002, Pages 1595-1602

A Convergent Form of Approximate Policy Iteration

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL RESOURCES; CONVERGENCE RESULTS; FREEFORMS; LIPSCHITZ CONTINUOUS; MODEL FREE; POLICY EVALUATION; POLICY ITERATION; POLICY ITERATION ALGORITHMS; POLICY-BASED; VALUE FUNCTION APPROXIMATION;

EID: 22944468429     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (38)

References (17)
  • 5
    • D. P. De Farias and B. Van Roy. On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3), 2000.
  • 8
    • G. J. Gordon. Reinforcement learning with function approximation converges to a region. Advances in Neural Information Processing Systems 13, pages 1040-1046. MIT Press, 2001.
  • 12
    • E. Seneta. Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains. In W. J. Stewart, editor, Numerical Solutions of Markov Chains. Dekker, NY, 1991.
  • 13
    • S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvari. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
  • 15
    • G. J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.
  • 16
    • J. N. Tsitsiklis and B. Van Roy. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Transactions on Automatic Control, 44(10):1840-1851, 1999.
  • 17
    • J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.