Volume, Issue, 2008, Pages

Online linear regression and its application to model-based reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

E-LEARNING; LEARNING ALGORITHMS; REINFORCEMENT LEARNING

EID: 85162058047     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 61

References (14)
  • 1
    • Abbeel, P., & Ng, A. Y. (2005). Exploration and apprenticeship learning in reinforcement learning. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning (pp. 1-8). New York, NY, USA: ACM Press. DOI: 10.1145/1102351.1102352
  • 2
    • Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 397-422.
  • 3
    • Brafman, R. I., & Tennenholtz, M. (2002). R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
  • 4
    • Chow, C.-S., & Tsitsiklis, J. N. (1991). An optimal one-way multigrid algorithm for discrete time stochastic control. IEEE Transactions on Automatic Control, 36, 898-914.
  • 6
    • Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Baltimore, Maryland: The Johns Hopkins University Press.
  • 7
    • Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London.
  • 10
    • Kearns, M. J., & Singh, S. P. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 209-232.
  • 14
    • Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.