Volume 3, Issue 3, 2003, Pages 397-422

Using confidence bounds for exploitation-exploration trade-offs

Author keywords

Bandit Problem; Exploitation Exploration; Linear Value Function; Online Learning; Reinforcement Learning

Indexed keywords

ALGORITHMS; ONLINE SYSTEMS; RANDOM PROCESSES; STATISTICS

EID: 0041966002
PISSN: 15324435
EISSN: None
Source Type: Journal
DOI: 10.1162/153244303321897663
Document Type: Conference Paper
Times cited: 1885
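The title's idea is to pick actions by an upper confidence bound, so that uncertainty about an action's value itself drives exploration. It can be illustrated with a small simulation. The sketch below is a minimal, hypothetical example that assumes independent Bernoulli arms and a UCB1-style index (empirical mean plus sqrt(2 ln t / n)). It is not the algorithm for linear value functions analysed in this paper, and the names `ucb_index` and `run_ucb1` are illustrative only.

```python
import math
import random


def ucb_index(mean: float, pulls: int, t: int) -> float:
    """UCB1-style index (an assumption for illustration, not this paper's method).

    The confidence width sqrt(2 ln t / pulls) shrinks as an arm is sampled
    more often, so rarely tried arms keep an optimistic exploration bonus.
    """
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(probs: list[float], horizon: int, seed: int = 0) -> list[int]:
    """Play Bernoulli arms with success probabilities `probs`; return pull counts."""
    rng = random.Random(seed)  # seeded for reproducible runs
    k = len(probs)
    counts = [0] * k   # number of times each arm was pulled
    sums = [0.0] * k   # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once so each mean is defined.
            arm = t - 1
        else:
            # Exploit/explore trade-off: choose the arm with the largest upper bound.
            arm = max(
                range(k),
                key=lambda i: ucb_index(sums[i] / counts[i], counts[i], t),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, `run_ucb1([0.2, 0.5, 0.8], horizon=2000)` returns per-arm pull counts; with reward gaps this large the best arm should receive most of the pulls, while the shrinking bonus still forces occasional pulls of the other arms.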

References (19)
  • 1
    • 0042996986
    • N. Abe and P. M. Long. Associative reinforcement learning using linear probabilistic concepts. In Proc. 16th International Conf. on Machine Learning, pages 3-11. Morgan Kaufmann, San Francisco, CA, 1999.
  • 2
    • 0042496195
    • N. Abe and A. Nakamura. Learning to optimally schedule internet banner advertisements. In Proc. 16th International Conf. on Machine Learning, pages 12-21. Morgan Kaufmann, San Francisco, CA, 1999.
  • 3
    • 0000616723
    • R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27:1054-1078, 1995.
  • 7
    • 0042496192
    • P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. NeuroCOLT2 Technical Report NC2-TR-1998-025, Royal Holloway, University of London, 1998. Accessible via http at www.neurocolt.org.
  • 9
    • 0032140937
    • P. Auer and M. K. Warmuth. Tracking the best disjunction. Machine Learning, 32:127-150, 1998. A preliminary version appeared in Proceedings of the 36th Annual Symposium on Foundations of Computer Science.
  • 10
    • 84972574511
    • K. Azuma. Weighted sums of certain dependent random variables. Tohoku Math. J., 19(3):357-367, 1967.
  • 13
  • 14
    • 0028442414
    • L. P. Kaelbling. Associative reinforcement learning: A generate and test algorithm. Machine Learning, 15:299-319, 1994a.
  • 15
    • 0028442413
    • L. P. Kaelbling. Associative reinforcement learning: Functions in k-DNF. Machine Learning, 15:279-298, 1994b.
  • 16
    • 0002899547
    • T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
  • 18
    • 84966203785
    • H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527-535, 1952.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.