SCOPUS Information Retrieval Platform

Volume 58, Issue 8, 2012, Pages 5588-5611

Online learning of rested and restless bandits

a UNIVERSITY OF MICHIGAN (United States)

Author keywords

Exploration exploitation tradeoff; multiarmed bandits; online learning; opportunistic spectrum access (OSA); regret; restless bandits

Indexed keywords

EXPLORATION-EXPLOITATION TRADEOFF; MULTI ARMED BANDIT; ONLINE LEARNING; OPPORTUNISTIC SPECTRUM ACCESSES (OSA); REGRET; RESTLESS BANDITS;

EVOLUTIONARY ALGORITHMS; MARKOV PROCESSES;

E-LEARNING;

EID: 84863956678 PISSN: 00189448 EISSN: None Source Type: Journal
DOI: 10.1109/TIT.2012.2198613 Document Type: Article

Times cited : (183)

References (26)

1
- 0037709910
- The nonstochastic multiarmed bandit problem
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire, "The nonstochastic multiarmed bandit problem," SIAM J. Comput., vol. 32, pp. 48-77, Jan. 2002.
- (2002) SIAM J. Comput. , vol.32 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.⁴

2
- 84966203785
- Some aspects of the sequential design of experiments
- H. Robbins, "Some aspects of the sequential design of experiments," Bull. Amer. Math. Soc., vol. 55, pp. 527-535, 1952.
- (1952) Bull. Amer. Math. Soc. , vol.55 , pp. 527-535
- Robbins, H.¹

3
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- DOI 10.1023/A:1013689704352, Computational Learning Theory
- P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Mach. Learn., vol. 47, pp. 235-256, 2002. (Pubitemid 34126111)
- (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

4
- 0002899547
- Asymptotically efficient adaptive allocation rules
- T. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, pp. 4-22, 1985.
- (1985) Adv. Appl. Math. , vol.6 , pp. 4-22
- Lai, T.¹ Robbins, H.²

5
- 0023453059
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - part I: I.I.D. rewards
- V. Anantharam, P. Varaiya, and J. Walrand, "Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards," IEEE Trans. Autom. Control, vol. 32, no. 11, pp. 968-976, Nov. 1987. (Pubitemid 18521625)
- (1987) IEEE Transactions on Automatic Control , vol.AC-32 , Issue.11 , pp. 968-976
- Anantharam, V.¹ Varaiya, P.² Walrand, J.³

6
- 0023450663
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - part II: Markovian rewards
- V. Anantharam, P. Varaiya, and J. Walrand, "Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards," IEEE Trans. Autom. Control, vol. 32, no. 11, pp. 977-982, Nov. 1987. (Pubitemid 18521626)
- (1987) IEEE Transactions on Automatic Control , vol.AC-32 , Issue.11 , pp. 977-982
- Anantharam, V.¹ Varaiya, P.² Walrand, J.³

7
- 0000616723
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- R. Agrawal, "Sample mean based index policies with O(log n) regret for the multi-armed bandit problem," Adv. Appl. Probabil., vol. 27, no. 4, pp. 1054-1078, Dec. 1995.
- (1995) Adv. Appl. Probabil. , vol.27 , Issue.4 , pp. 1054-1078
- Agrawal, R.¹

8
- 84898437076
- The KL-UCB algorithm for bounded stochastic bandits and beyond
- A. Garivier and O. Cappe, "The KL-UCB algorithm for bounded stochastic bandits and beyond," in Proc. JMLR Workshop Conf., 2011, vol. 19, pp. 359-376.
- (2011) Proc. JMLR Workshop Conf. , vol.19 , pp. 359-376
- Garivier, A.¹ Cappe, O.²

9
- 84898072179
- Stochastic linear optimization under bandit feedback
- V. Dani, T. P. Hayes, and S. M. Kakade, "Stochastic linear optimization under bandit feedback," in Proc. 21st Annu. Conf. Learn. Theory, Jul. 2008, pp. 355-366.
- (2008) Proc. 21st Annu. Conf. Learn. Theory , pp. 355-366
- Dani, V.¹ Hayes, T.P.² Kakade, S.M.³

10
- 79952397795
- Online algorithms for the multi-armed bandit problem with Markovian rewards
- C. Tekin and M. Liu, "Online algorithms for the multi-armed bandit problem with Markovian rewards," in Proc. 48th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2010, pp. 1675-1682.
- (2010) Proc. 48th Annu. Allerton Conf. Commun. , pp. 1675-1682
- Tekin, C.¹ Liu, M.²

11
- 79960884459
- Online learning in opportunistic spectrum access: A restless bandit approach
- C. Tekin and M. Liu, "Online learning in opportunistic spectrum access: A restless bandit approach," in Proc. 30th Annu. IEEE Int. Conf. Comput. Commun., Apr. 2011, pp. 2462-2470.
- (2011) Proc. 30th Annu. IEEE Int. Conf. Comput. Commun. , pp. 2462-2470
- Tekin, C.¹ Liu, M.²

12
- 80051629493
- H. Liu, K. Liu, and Q. Zhao, "Learning in a changing world: Non-Bayesian restless multi-armed bandit," Univ. California Davis, Davis, 2010.
- (2010) Learning in A Changing World: Non-Bayesian Restless Multi-armed Bandit Univ
- Liu, H.¹ Liu, K.² Zhao, Q.³

13
- 79953827701
- Distributed learning in multi-armed bandit with multiple players
- K. Liu and Q. Zhao, "Distributed learning in multi-armed bandit with multiple players," IEEE Trans. Signal Process., vol. 58, no. 11, pp. 5667-5681, Nov. 2010.
- (2010) IEEE Trans. Signal Process. , vol.58 , Issue.11 , pp. 5667-5681
- Liu, K.¹ Zhao, Q.²

14
- 79953194834
- Distributed algorithms for learning and cognitive medium access with logarithmic regret
- A. Anandkumar, N. Michael, A. Tang, and A. Swami, "Distributed algorithms for learning and cognitive medium access with logarithmic regret," IEEE J. Sel. Areas Commun., vol. 29, no. 4, pp. 731-745, Apr. 2011.
- (2011) IEEE J. Sel. Areas Commun. , vol.29 , Issue.4 , pp. 731-745
- Anandkumar, A.¹ Michael, N.² Tang, A.³ Swami, A.⁴

15
- 77953180719
- Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation
- Y. Gai, B. Krishnamachari, and R. Jain, "Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation," in IEEE Symp. Dyn. Spectrum Access Netw. (DySPAN), Apr. 2010, pp. 1-9.
- (2010) IEEE Symp. Dyn. Spectrum Access Netw. (DySPAN) , pp. 1-9
- Gai, Y.¹ Krishnamachari, B.² Jain, R.³

16
- 0000169010
- Bandit processes and dynamic allocation indices
- J. Gittins, "Bandit processes and dynamic allocation indices," J. Roy. Statist. Soc., vol. 41, no. 2, pp. 148-177, 1979.
- (1979) J. Roy. Statist. Soc. , vol.41 , Issue.2 , pp. 148-177
- Gittins, J.¹

17
- 0001043843
- Restless bandits: Activity allocation in a changing world
- P. Whittle, "Restless bandits: Activity allocation in a changing world," in A Celebration of Applied Probability, J. Gani, Ed. Sheffield, U.K.: Applied Probability Trust, 1988, vol. 25A, pp. 287-298.
- (1988) A Celebration of Applied Probability , vol.25 A , pp. 287-298
- Whittle, P.¹ Gani, J.²

18
- 69449100462
- Optimality of myopic sensing in multi-channel opportunistic access
- S. H. A. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, "Optimality of myopic sensing in multi-channel opportunistic access," IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4040-4050, Sep. 2009.
- (2009) IEEE Trans. Inf. Theory , vol.55 , Issue.9 , pp. 4040-4050
- Ahmad, S.H.A.¹ Liu, M.² Javidi, T.³ Zhao, Q.⁴ Krishnamachari, B.⁵

19
- 0032222170
- Chernoff-type bound for finite Markov chains
- P. Lezaud, "Chernoff-type bound for finite Markov chains," Ann. Appl. Probab., vol. 8, pp. 849-867, 1998.
- (1998) Ann. Appl. Probab. , vol.8 , pp. 849-867
- Lezaud, P.¹

20
- 62949181077
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- J.-Y. Audibert, R. Munos, and C. Szepesvári, "Exploration-exploitation tradeoff using variance estimates in multi-armed bandits," Theoretical Comput. Sci., vol. 410, no. 19, pp. 1876-1902, 2009.
- (2009) Theoretical Comput. Sci. , vol.410 , Issue.19 , pp. 1876-1902
- Audibert, J.Y.¹ Munos, R.² Szepesvári, C.³

21
- 84856091352
- Adaptive learning of uncontrolled restless bandits with logarithmic regret
- C. Tekin and M. Liu, "Adaptive learning of uncontrolled restless bandits with logarithmic regret," in Proc. 49th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2011, pp. 983-990.
- (2011) Proc. 49th Annu. Allerton Conf. Commun. , pp. 983-990
- Tekin, C.¹ Liu, M.²

22
- 0032628612
- The complexity of optimal queuing network control
- C. Papadimitriou and J. Tsitsiklis, "The complexity of optimal queuing network control," Math. Oper. Res., vol. 24, no. 2, pp. 293-305, May 1999.
- (1999) Math. Oper. Res. , vol.24 , Issue.2 , pp. 293-305
- Papadimitriou, C.¹ Tsitsiklis, J.²

23
- 80051623306
- The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret
- W. Dai, Y. Gai, B. Krishnamachari, and Q. Zhao, "The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret," in Proc. Int. Conf. Acoust., Speech Signal Process., May 2011, pp. 2940-2943.
- (2011) Proc. Int. Conf. Acoust., Speech Signal Process. , pp. 2940-2943
- Dai, W.¹ Gai, Y.² Krishnamachari, B.³ Zhao, Q.⁴

24
- 69449097218
- Approximation algorithms for restless bandit problems
- S. Guha, K. Munagala, and P. Shi, "Approximation algorithms for restless bandit problems," in Proc. 20th ACM-SIAM Symp. Discr. Algorithms, 2009, pp. 28-37.
- (2009) Proc. 20th ACM-SIAM Symp. Discr. Algorithms , pp. 28-37
- Guha, S.¹ Munagala, K.² Shi, P.³

25
- 84861588214
- Approximately optimal adaptive learning in opportunistic spectrum access
- C. Tekin and M. Liu, "Approximately optimal adaptive learning in opportunistic spectrum access," presented at the IEEE INFOCOM, Orlando, FL, Mar. 2012.
- (2012) IEEE INFOCOM
- Tekin, C.¹ Liu, M.²

26
- 37349120464
- On the expectation of the maximum of i.i.d. geometric random variables
- B. Eisenberg, "On the expectation of the maximum of i.i.d. geometric random variables," Statist. Probab. Lett., vol. 78, pp. 135-143, 2008.
- (2008) Statist. Probab. Lett. , vol.78 , pp. 135-143
- Eisenberg, B.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.