2010, Pages 1675-1682

Online algorithms for the multi-armed bandit problem with Markovian rewards

Author keywords

[No Author keywords available]

Indexed keywords

INDEX POLICIES; LEARNING PROBLEM; MARKOVIAN; MULTI-ARMED BANDIT PROBLEM; NUMBER OF STATE; ON-LINE ALGORITHMS; OPTIMALITY; SAMPLE MEANS; STATE TRANSITION PROBABILITIES; STATE-DEPENDENT;

EID: 79952397795     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ALLERTON.2010.5707118     Document Type: Conference Paper
Times cited: 66

References (12)
  • 1
    • 84966203785
    • Some aspects of the sequential design of experiments
    • H. Robbins, Some aspects of the sequential design of experiments, Bull. Amer. Math. Soc., 55, pp. 527-535, 1952
    • (1952) Bull. Amer. Math. Soc. , vol.55 , pp. 527-535
    • Robbins, H.1
  • 3
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol. 6, pp. 4-22, 1985
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.1    Robbins, H.2
  • 4
    • 0023453059
    • Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: IID rewards
    • V. Anantharam, P. Varaiya, J. Walrand, Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: IID rewards, IEEE Trans. Automat. Contr., pp. 968-975, 1987
    • (1987) IEEE Trans. Automat. Contr. , pp. 968-975
    • Anantharam, V.1    Varaiya, P.2    Walrand, J.3
  • 5
    • 0023450663
    • Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards
    • November
    • V. Anantharam, P. Varaiya, J. Walrand, Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards, IEEE Trans. Automat. Contr., pp. 977-982, November 1987
    • (1987) IEEE Trans. Automat. Contr. , pp. 977-982
    • Anantharam, V.1    Varaiya, P.2    Walrand, J.3
  • 6
    • 0000616723
    • Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem
    • December
    • R. Agrawal, Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem, Advances in Applied Probability, Vol. 27, No. 4, pp. 1054-1078, December 1995.
    • (1995) Advances in Applied Probability , vol.27 , Issue.4 , pp. 1054-1078
    • Agrawal, R.1
  • 7
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, 47(2/3):235-256, 2002.
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 10
    • 0000169010
    • Bandit Processes and Dynamic Allocation Indices
    • Series B
    • J.C. Gittins, Bandit Processes and Dynamic Allocation Indices, Journal of the Royal Statistical Society, Series B, Vol. 41, No. 2, pp. 148-177, 1979.
    • (1979) Journal of the Royal Statistical Society , vol.41 , Issue.2 , pp. 148-177
    • Gittins, J.C.1
  • 11
    • 0000646152
    • A Chernoff bound for random walks on expander graphs
    • Proc. 34th IEEE Symp. on Foundations of Computer Science (FOCS93)
    • D. Gillman, A Chernoff bound for random walks on expander graphs, Proc. 34th IEEE Symp. on Foundations of Computer Science (FOCS93); SIAM J. Comp., Vol. 27, No. 4, 1998.
    • (1998) SIAM J. Comp. , vol.27 , Issue.4
    • Gillman, D.1
  • 12
    • 0032222170
    • Chernoff-type Bound for Finite Markov Chains
    • P. Lezaud, Chernoff-type Bound for Finite Markov Chains, Ann. Appl. Prob., vol. 8, pp. 849-867, 1998.
    • (1998) Ann. Appl. Prob. , vol.8 , pp. 849-867
    • Lezaud, P.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.