Volume 10, 2009, Pages 1955-1988

Provably efficient learning with typed parametric models

Author keywords

Provably efficient learning; Reinforcement learning

Indexed keywords

EFFICIENT LEARNING; MARKOV DECISION PROCESSES; PARAMETRIC MODELS; PROBABLY APPROXIMATELY CORRECT; PROVABLY EFFICIENT LEARNING; REAL WORLD DOMAIN; REAL-WORLD; ROBOT NAVIGATION; SAMPLE COMPLEXITY; SAMPLE COMPLEXITY BOUNDS; SMALL ROBOTS; STATE-SPACE; TRAJECTORY DATA

EID: 70349416596     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 23

References (34)
  • 2
    • 85153940465
    • Generalization in reinforcement learning: Safely approximating the value function
    • Justin Boyan and Andrew Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems (NIPS) 7, pages 369-376, 1995.
    • (1995) Advances in Neural Information Processing Systems (NIPS) , vol.7 , pp. 369-376
    • Boyan, J.1    Moore, A.2
  • 3
    • 0041965975
    • R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
    • Ronen I. Brafman and Moshe Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
    • (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
    • Brafman, R.I.1    Tennenholtz, M.2
  • 5
    • 0003919624
    • Linear Optimal Control
    • Jeffrey B. Burl. Linear Optimal Control. Prentice Hall, 1998. ISBN 9780201808681.
    • (1998) Linear Optimal Control
    • Burl, J.B.1
  • 7
    • 38249024662
    • The complexity of dynamic programming
    • Chee-Seng Chow and John N. Tsitsiklis. The complexity of dynamic programming. Journal of Complexity, 5(4):466-488, 1989.
    • (1989) Journal of Complexity , vol.5 , Issue.4 , pp. 466-488
    • Chow, C.-S.1    Tsitsiklis, J.N.2
  • 8
    • 0026206780
    • An optimal one-way multigrid algorithm for discrete-time stochastic control
    • DOI 10.1109/9.133184
    • Chee-Seng Chow and John N. Tsitsiklis. An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8):898-914, 1991. (Pubitemid 21674882)
    • (1991) IEEE Transactions on Automatic Control , vol.36 , Issue.8 , pp. 898-914
    • Chow, C.-S.1    Tsitsiklis, J.N.2
  • 12
    • 0004236492
    • Matrix Computations
    • Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 3rd edition, 1996. ISBN 0-801-85414-8.
    • (1996) Matrix Computations
    • Golub, G.H.1    Van Loan, C.F.2
  • 13
    • 0004151494
    • Matrix Analysis
    • Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1986. ISBN 0-521-38632-2.
    • (1986) Matrix Analysis
    • Horn, R.A.1    Johnson, C.R.2
  • 17
    • 0036832954
    • Near-optimal reinforcement learning in polynomial time
    • Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 209-232
    • Kearns, M.J.1    Singh, S.P.2
  • 19
    • 84941465845
    • A lower bound for discrimination in terms of variation
    • Solomon Kullback. A lower bound for discrimination in terms of variation. IEEE Transactions on Information Theory, 13(1):126-127, January 1967.
    • (1967) IEEE Transactions on Information Theory , vol.13 , Issue.1 , pp. 126-127
    • Kullback, S.1
  • 29
    • 85162058047
    • Online linear regression and its application to model-based reinforcement learning
    • Alexander L. Strehl and Michael L. Littman. Online linear regression and its application to model-based reinforcement learning. In Advances in Neural Information Processing Systems (NIPS) 20, pages 1417-1424, 2008.
    • (2008) Advances in Neural Information Processing Systems (NIPS) , vol.20 , pp. 1417-1424
    • Strehl, A.L.1    Littman, M.L.2
  • 32
    • 0000985504
    • TD-Gammon, a self-teaching backgammon program, achieves master-level play
    • Gerald J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.
    • (1994) Neural Computation , vol.6 , Issue.2 , pp. 215-219
    • Tesauro, G.J.1
  • 33
    • 0000011340
    • Some matrix-inequalities and metrization of matrix-space
    • John von Neumann. Some matrix-inequalities and metrization of matrix-space. Tomsk University Review, 1:286-300, 1937.
    • (1937) Tomsk University Review , vol.1 , pp. 286-300
    • Von Neumann, J.1
  • 34
    • 0004049893
    • Learning from delayed rewards
    • Christopher J.C.H. Watkins. Learning from delayed rewards. PhD thesis, King's College, University of Cambridge, United Kingdom, 1989.
    • (1989) Learning from Delayed Rewards
    • Watkins, C.J.C.H.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.