2011, Pages 983-990

Adaptive learning of uncontrolled restless bandits with logarithmic regret

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE LEARNING; AVERAGE REWARD; FINITE HORIZONS; INFINITE HORIZONS; OPTIMAL POLICIES; OPTIMAL STATIONARY POLICY; RESTLESS BANDIT; STATE TRANSITIONS; TRANSITION PROBABILITIES

EID: 84856091352     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/Allerton.2011.6120273     Document Type: Conference Paper
Times cited : (19)

References (21)
  • 1
    • 0019037868
    • Optimal infinite-horizon undiscounted control of finite probabilistic systems
    • L. K. Platzman, "Optimal infinite-horizon undiscounted control of finite probabilistic systems," SIAM J. Control Optim., vol. 18, pp. 362-380, 1980.
    • (1980) SIAM J. Control Optim. , vol.18 , pp. 362-380
    • Platzman, L.K.1
  • 2
    • 28544443262
    • On the existence of stationary optimal policies for partially observed MDPs under the long-run average cost criterion
    • DOI 10.1016/j.sysconle.2005.06.009, PII S016769110500109X
    • S. P. Hsu, D. M. Chuang, and A. Arapostathis, "On the existence of stationary optimal policies for partially observed MDPs under the long-run average cost criterion," Systems and Control Letters, vol. 55, pp. 165-173, 2006. (Pubitemid 41745435)
    • (2006) Systems and Control Letters , vol.55 , Issue.2 , pp. 165-173
    • Hsu, S.-P.1    Chuang, D.-M.2    Arapostathis, A.3
  • 3
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Advances in Applied Mathematics, vol. 6, pp. 4-22, 1985.
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.1    Robbins, H.2
  • 4
    • 0000616723
    • Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
    • R. Agrawal, "Sample mean based index policies with O(log n) regret for the multi-armed bandit problem," Advances in Applied Probability, vol. 27, no. 4, pp. 1054-1078, December 1995.
    • (1995) Advances in Applied Probability , vol.27 , Issue.4 , pp. 1054-1078
    • Agrawal, R.1
  • 5
    • 0023453059
    • Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards
    • V. Anantharam, P. Varaiya, and J. Walrand, "Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards," IEEE Trans. Automat. Contr., pp. 968-975, November 1987.
    • (1987) IEEE Trans. Automat. Contr. , pp. 968-975
    • Anantharam, V.1    Varaiya, P.2    Walrand, J.3
  • 6
    • 0023450663
    • Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards
    • V. Anantharam, P. Varaiya, and J. Walrand, "Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards," IEEE Trans. Automat. Contr., pp. 977-982, November 1987.
    • (1987) IEEE Trans. Automat. Contr. , pp. 977-982
    • Anantharam, V.1    Varaiya, P.2    Walrand, J.3
  • 7
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine Learning, vol. 47, pp. 235-256, 2002.
    • (2002) Machine Learning , vol.47 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 14
    • 0031070051
    • Optimal adaptive policies for Markov decision processes
    • A. N. Burnetas and M. N. Katehakis, "Optimal adaptive policies for Markov decision processes," Mathematics of Operations Research, vol. 22, no. 1, pp. 222-255, 1997.
    • (1997) Mathematics of Operations Research , vol.22 , Issue.1 , pp. 222-255
    • Burnetas, A.N.1    Katehakis, M.N.2
  • 15
    • 85162041468
    • Optimistic linear programming gives logarithmic regret for irreducible MDPs
    • A. Tewari and P. Bartlett, "Optimistic linear programming gives logarithmic regret for irreducible MDPs," Advances in Neural Information Processing Systems, vol. 20, pp. 1505-1512, 2008.
    • (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 1505-1512
    • Tewari, A.1    Bartlett, P.2
  • 17
    • 27944497396
    • Sensitivity and convergence of uniformly ergodic Markov chains
    • A. Y. Mitrophanov, "Sensitivity and convergence of uniformly ergodic Markov chains," J. Appl. Prob., vol. 42, pp. 1003-1014, 2005.
    • (2005) J. Appl. Prob. , vol.42 , pp. 1003-1014
    • Mitrophanov, A.Y.1
  • 18
    • 0032628612
    • The complexity of optimal queuing network control
    • C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of optimal queuing network control," Math. Oper. Res., vol. 24, no. 2, pp. 293-305, 1999.
    • (1999) Math. Oper. Res. , vol.24 , Issue.2 , pp. 293-305
    • Papadimitriou, C.H.1    Tsitsiklis, J.N.2
  • 19
    • 0001043843
    • Restless bandits
    • P. Whittle, "Restless bandits," J. Appl. Prob., pp. 301-313, 1988.
    • (1988) J. Appl. Prob. , pp. 301-313
    • Whittle, P.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.