



Volume 58, Issue 8, 2012, Pages 5588-5611

Online learning of rested and restless bandits

Author keywords

Exploration exploitation tradeoff; multiarmed bandits; online learning; opportunistic spectrum access (OSA); regret; restless bandits

Indexed keywords

EXPLORATION-EXPLOITATION TRADEOFF; MULTI ARMED BANDIT; ONLINE LEARNING; OPPORTUNISTIC SPECTRUM ACCESSES (OSA); REGRET; RESTLESS BANDITS;

EID: 84863956678     PISSN: 00189448     EISSN: None     Source Type: Journal    
DOI: 10.1109/TIT.2012.2198613     Document Type: Article
Times cited: 183

References (26)
  • 2
    • 84966203785
    • Some aspects of the sequential design of experiments
    • H. Robbins, "Some aspects of the sequential design of experiments," Bull. Amer. Math. Soc., vol. 55, pp. 527-535, 1952.
    • (1952) Bull. Amer. Math. Soc. , vol.55 , pp. 527-535
    • Robbins, H.1
  • 3
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • DOI 10.1023/A:1013689704352, Computational Learning Theory
    • P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Mach. Learn., vol. 47, pp. 235-256, 2002. (Pubitemid 34126111)
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 4
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, pp. 4-22, 1985.
    • (1985) Adv. Appl. Math. , vol.6 , pp. 4-22
    • Lai, T.1    Robbins, H.2
  • 5
    • 0023453059
    • Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - part I: I.I.D. rewards
    • V. Anantharam, P. Varaiya, and J. Walrand, "Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards," IEEE Trans. Autom. Control, vol. 32, no. 11, pp. 968-976, Nov. 1987. (Pubitemid 18521625)
    • (1987) IEEE Transactions on Automatic Control , vol.AC-32 , Issue.11 , pp. 968-976
    • Anantharam, V.1    Varaiya, P.2    Walrand, J.3
  • 6
    • 0023450663
    • Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - part II: Markovian rewards
    • V. Anantharam, P. Varaiya, and J. Walrand, "Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards," IEEE Trans. Autom. Control, vol. 32, no. 11, pp. 977-982, Nov. 1987. (Pubitemid 18521626)
    • (1987) IEEE Transactions on Automatic Control , vol.AC-32 , Issue.11 , pp. 977-982
    • Anantharam, V.1    Varaiya, P.2    Walrand, J.3
  • 7
    • 0000616723
    • Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
    • R. Agrawal, "Sample mean based index policies with O(log n) regret for the multi-armed bandit problem," Adv. Appl. Probabil., vol. 27, no. 4, pp. 1054-1078, Dec. 1995.
    • (1995) Adv. Appl. Probabil. , vol.27 , Issue.4 , pp. 1054-1078
    • Agrawal, R.1
  • 8
    • 84898437076
    • The KL-UCB algorithm for bounded stochastic bandits and beyond
    • A. Garivier and O. Cappe, "The KL-UCB algorithm for bounded stochastic bandits and beyond," in Proc. JMLR Workshop Conf., 2011, vol. 19, pp. 359-376.
    • (2011) Proc. JMLR Workshop Conf. , vol.19 , pp. 359-376
    • Garivier, A.1    Cappe, O.2
  • 10
    • 79952397795
    • Online algorithms for the multi-armed bandit problem with Markovian rewards
    • C. Tekin and M. Liu, "Online algorithms for the multi-armed bandit problem with Markovian rewards," in Proc. 48th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2010, pp. 1675-1682.
    • (2010) Proc. 48th Annu. Allerton Conf. Commun. , pp. 1675-1682
    • Tekin, C.1    Liu, M.2
  • 11
    • 79960884459
    • Online learning in opportunistic spectrum access: A restless bandit approach
    • C. Tekin and M. Liu, "Online learning in opportunistic spectrum access: A restless bandit approach," in Proc. 30th Annu. IEEE Int. Conf. Comput. Commun., Apr. 2011, pp. 2462-2470.
    • (2011) Proc. 30th Annu. IEEE Int. Conf. Comput. Commun. , pp. 2462-2470
    • Tekin, C.1    Liu, M.2
  • 13
    • 79953827701
    • Distributed learning in multi-armed bandit with multiple players
    • K. Liu and Q. Zhao, "Distributed learning in multi-armed bandit with multiple players," IEEE Trans. Signal Process., vol. 58, no. 11, pp. 5667-5681, Nov. 2010.
    • (2010) IEEE Trans. Signal Process. , vol.58 , Issue.11 , pp. 5667-5681
    • Liu, K.1    Zhao, Q.2
  • 14
    • 79953194834
    • Distributed algorithms for learning and cognitive medium access with logarithmic regret
    • A. Anandkumar, N. Michael, A. Tang, and A. Swami, "Distributed algorithms for learning and cognitive medium access with logarithmic regret," IEEE J. Sel. Areas Commun., vol. 29, no. 4, pp. 731-745, Apr. 2011.
    • (2011) IEEE J. Sel. Areas Commun. , vol.29 , Issue.4 , pp. 731-745
    • Anandkumar, A.1    Michael, N.2    Tang, A.3    Swami, A.4
  • 15
    • 77953180719
    • Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation
    • Y. Gai, B. Krishnamachari, and R. Jain, "Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation," in IEEE Symp. Dyn. Spectrum Access Netw. (DySPAN), Apr. 2010, pp. 1-9.
    • (2010) IEEE Symp. Dyn. Spectrum Access Netw. (DySPAN) , pp. 1-9
    • Gai, Y.1    Krishnamachari, B.2    Jain, R.3
  • 16
    • 0000169010
    • Bandit processes and dynamic allocation indices
    • J. Gittins, "Bandit processes and dynamic allocation indices," J. Roy. Statist. Soc., vol. 41, no. 2, pp. 148-177, 1979.
    • (1979) J. Roy. Statist. Soc. , vol.41 , Issue.2 , pp. 148-177
    • Gittins, J.1
  • 17
    • 0001043843
    • Restless bandits: Activity allocation in a changing world
    • P. Whittle, "Restless bandits: Activity allocation in a changing world," in A Celebration of Applied Probability, J. Gani, Ed. Sheffield, U.K.: Applied Probability Trust, 1988, vol. 25A, pp. 287-298.
    • (1988) A Celebration of Applied Probability , vol.25A , pp. 287-298
    • Whittle, P.1    Gani, J.2
  • 18
    • 69449100462
    • Optimality of myopic sensing in multi-channel opportunistic access
    • S. H. A. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, "Optimality of myopic sensing in multi-channel opportunistic access," IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4040-4050, Sep. 2009.
    • (2009) IEEE Trans. Inf. Theory , vol.55 , Issue.9 , pp. 4040-4050
    • Ahmad, S.H.A.1    Liu, M.2    Javidi, T.3    Zhao, Q.4    Krishnamachari, B.5
  • 19
    • 0032222170
    • Chernoff-type bound for finite Markov chains
    • P. Lezaud, "Chernoff-type bound for finite Markov chains," Ann. Appl. Probab., vol. 8, pp. 849-867, 1998.
    • (1998) Ann. Appl. Probab. , vol.8 , pp. 849-867
    • Lezaud, P.1
  • 20
    • 62949181077
    • Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
    • J.-Y. Audibert, R. Munos, and C. Szepesvári, "Exploration-exploitation tradeoff using variance estimates in multi-armed bandits," Theoretical Comput. Sci., vol. 410, no. 19, pp. 1876-1902, 2009.
    • (2009) Theoretical Comput. Sci. , vol.410 , Issue.19 , pp. 1876-1902
    • Audibert, J.Y.1    Munos, R.2    Szepesvári, C.3
  • 21
    • 84856091352
    • Adaptive learning of uncontrolled restless bandits with logarithmic regret
    • C. Tekin and M. Liu, "Adaptive learning of uncontrolled restless bandits with logarithmic regret," in Proc. 49th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2011, pp. 983-990.
    • (2011) Proc. 49th Annu. Allerton Conf. Commun. , pp. 983-990
    • Tekin, C.1    Liu, M.2
  • 22
    • 0032628612
    • The complexity of optimal queuing network control
    • C. Papadimitriou and J. Tsitsiklis, "The complexity of optimal queuing network control," Math. Oper. Res., vol. 24, no. 2, pp. 293-305, May 1999.
    • (1999) Math. Oper. Res. , vol.24 , Issue.2 , pp. 293-305
    • Papadimitriou, C.1    Tsitsiklis, J.2
  • 25
    • 84861588214
    • Approximately optimal adaptive learning in opportunistic spectrum access
    • C. Tekin and M. Liu, "Approximately optimal adaptive learning in opportunistic spectrum access," presented at the IEEE INFOCOM, Orlando, FL, Mar. 2012.
    • (2012) IEEE INFOCOM
    • Tekin, C.1    Liu, M.2
  • 26
    • 37349120464
    • On the expectation of the maximum of i.i.d. geometric random variables
    • B. Eisenberg, "On the expectation of the maximum of i.i.d. geometric random variables," Statist. Probab. Lett., vol. 78, pp. 135-143, 2008.
    • (2008) Statist. Probab. Lett. , vol.78 , pp. 135-143
    • Eisenberg, B.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.