



Volume , Issue , 2008, Pages

The Epoch-Greedy algorithm for contextual multi-armed bandits

Author keywords

[No Author keywords available]

Indexed keywords

GREEDY ALGORITHMS; MULTIARMED BANDITS (MABS); PROPERTY; SAMPLE COMPLEXITY BOUNDS; SIDE INFORMATION; TIME HORIZONS;

EID: 85162018594     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (82)

References (10)
  • 1
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • DOI 10.1023/A:1013689704352, Computational Learning Theory
    • Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite time analysis of the multi-armed bandit problem. Machine Learning, 47, 235-256. (Pubitemid 34126111)
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 2
    • 0029513526
    • Gambling in a rigged casino: The adversarial multi-armed bandit problem
    • Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (1995). Gambling in a rigged casino: The adversarial multi-armed bandit problem. FOCS.
    • (1995) FOCS
    • Auer, P.1    Cesa-Bianchi, N.2    Freund, Y.3    Schapire, R.E.4
  • 3
    • 33745295134
    • Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
    • Even-Dar, E., Mannor, S., & Mansour, Y. (2006). Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. JMLR, 7, 1079-1105. (Pubitemid 43938989)
    • (2006) Journal of Machine Learning Research , vol.7 , pp. 1079-1105
    • Even-Dar, E.1    Mannor, S.2    Mansour, Y.3
  • 4
    • 0000125534
    • Sample selection bias as a specification error
    • Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153-161.
    • (1979) Econometrica , vol.47 , pp. 153-161
    • Heckman, J.1
  • 5
    • 84898967749
    • Approximate planning in large POMDPs via reusable trajectories
    • Kearns, M., Mansour, Y., & Ng, A. Y. (2000). Approximate planning in large POMDPs via reusable trajectories. NIPS.
    • (2000) NIPS
    • Kearns, M.1    Mansour, Y.2    Ng, A.Y.3
  • 6
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • Lai, T., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4-22.
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.1    Robbins, H.2
  • 7
    • 0029344133
    • Machine learning and nonparametric bandit theory
    • Lai, T., & Yakowitz, S. (1995). Machine learning and nonparametric bandit theory. IEEE TAC, 40, 1199-1209.
    • (1995) IEEE TAC , vol.40 , pp. 1199-1209
    • Lai, T.1    Yakowitz, S.2
  • 9
    • 33749242078
    • Experience-efficient learning in associative bandit problems
    • Strehl, A. L., Mesterharm, C., Littman, M. L., & Hirsh, H. (2006). Experience-efficient learning in associative bandit problems. ICML.
    • (2006) ICML
    • Strehl, A.L.1    Mesterharm, C.2    Littman, M.L.3    Hirsh, H.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.