



Volume , Issue , 2013, Pages

(More) efficient reinforcement learning via posterior sampling

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; MARKOV PROCESSES;

EID: 84899019264     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (502)

References (22)
  • 1
    • 0031070051
    • Optimal adaptive policies for Markov decision processes
    • A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997.
    • (1997) Mathematics of Operations Research , vol.22 , Issue.1 , pp. 222-255
    • Burnetas, A.N.1    Katehakis, M.N.2
  • 3
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T.L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6(1):4-22, 1985.
    • (1985) Advances in Applied Mathematics , vol.6 , Issue.1 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 6
    • 0041965975
    • R-max - A general polynomial time algorithm for near-optimal reinforcement learning
    • R. I. Brafman and M. Tennenholtz. R-max - A general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213-231, 2003.
    • (2003) The Journal of Machine Learning Research , vol.3 , pp. 213-231
    • Brafman, R.I.1    Tennenholtz, M.2
  • 8
    • 0036832954
    • Near-optimal reinforcement learning in polynomial time
    • M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 209-232
    • Kearns, M.1    Singh, S.2
  • 9
    • 0001395850
    • On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
    • W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285-294, 1933.
    • (1933) Biometrika , vol.25 , Issue.3-4 , pp. 285-294
    • Thompson, W.R.1
  • 15
    • 84893254104
    • Learning to optimize via posterior sampling
    • abs/1301.2609
    • D. Russo and B. Van Roy. Learning to optimize via posterior sampling. CoRR, abs/1301.2609, 2013.
    • (2013) CoRR
    • Russo, D.1    Van Roy, B.2
  • 21
    • 55549110436
    • An analysis of model-based interval estimation for Markov decision processes
    • A. L. Strehl and M. L. Littman. An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8):1309-1331, 2008.
    • (2008) Journal of Computer and System Sciences , vol.74 , Issue.8 , pp. 1309-1331
    • Strehl, A.L.1    Littman, M.L.2
  • 22
    • 84898972955
    • Optimism in reinforcement learning based on Kullback-Leibler divergence
    • abs/1004.5229
    • S. Filippi, O. Cappé, and A. Garivier. Optimism in reinforcement learning based on Kullback-Leibler divergence. CoRR, abs/1004.5229, 2010.
    • (2010) CoRR
    • Filippi, S.1    Cappé, O.2    Garivier, A.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.