SCOPUS Information Search Platform

2011 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011

Volume , Issue , 2011, Pages 983-990

Adaptive learning of uncontrolled restless bandits with logarithmic regret

(2) Tekin, Cem a Liu, Mingyan a

a UNIVERSITY OF MICHIGAN (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE LEARNING; AVERAGE REWARD; FINITE HORIZONS; INFINITE HORIZONS; OPTIMAL POLICIES; OPTIMAL STATIONARY POLICY; RESTLESS BANDIT; STATE TRANSITIONS; TRANSITION PROBABILITIES;

COMMUNICATION; LEARNING ALGORITHMS;

STRUCTURAL OPTIMIZATION;

EID: 84856091352 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/Allerton.2011.6120273 Document Type: Conference Paper

Times cited : (19)

References (21)

1
- 0019037868
- Optimal infinite-horizon undiscounted control of finite probabilistic systems
- L. K. Platzman, "Optimal infinite-horizon undiscounted control of finite probabilistic systems," SIAM J. Control Optim., vol. 18, pp. 362-380, 1980.
- (1980) SIAM J. Control Optim. , vol.18 , pp. 362-380
- Platzman, L.K.¹

2
- 28544443262
- On the existence of stationary optimal policies for partially observed MDPs under the long-run average cost criterion
- DOI 10.1016/j.sysconle.2005.06.009, PII S016769110500109X
- S. P. Hsu, D. M. Chuang, and A. Arapostathis, "On the existence of stationary optimal policies for partially observed mdps under the long-run average cost criterion," Systems and Control Letters, vol. 55, pp. 165-173, 2006. (Pubitemid 41745435)
- (2006) Systems and Control Letters , vol.55 , Issue.2 , pp. 165-173
- Hsu, S.-P.¹ Chuang, D.-M.² Arapostathis, A.³

3
- 0002899547
- Asymptotically efficient adaptive allocation rules
- T. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Advances in Applied Mathematics, vol. 6, pp. 4-22, 1985.
- (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
- Lai, T.¹ Robbins, H.²

4
- 0000616723
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- December
- R. Agrawal, "Sample mean based index policies with O(log n) regret for the multi-armed bandit problem," Advances in Applied Probability, vol. 27, no. 4, pp. 1054-1078, December 1995.
- (1995) Advances in Applied Probability , vol.27 , Issue.4 , pp. 1054-1078
- Agrawal, R.¹

5
- 0023453059
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-part i: Iid rewards
- November
- V. Anantharam, P. Varaiya, and J. Walrand, "Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-part i: Iid rewards," IEEE Trans. Automat. Contr., pp. 968-975, November 1987.
- (1987) IEEE Trans. Automat. Contr. , pp. 968-975
- Anantharam, V.¹ Varaiya, P.² Walrand, J.³

6
- 0023450663
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-part ii: Markovian rewards
- -, "Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-part ii: Markovian rewards," IEEE Trans. Automat. Contr., pp. 977-982, November 1987.
- (1987) IEEE Trans. Automat. Contr. , Issue.NOVEMBER , pp. 977-982
- Anantharam, V.¹ Varaiya, P.² Walrand, J.³

7
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine Learning, vol. 47, pp. 235-256, 2002.
- (2002) Machine Learning , vol.47 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

8
- 79952397795
- Online algorithms for the multi-armed bandit problem with markovian rewards
- C. Tekin and M. Liu, "Online algorithms for the multi-armed bandit problem with markovian rewards," in Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computation, September.
- Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computation, September
- Tekin, C.¹ Liu, M.²

9
- 79960884459
- Online learning in opportunistic spectrum access: A restless bandit approach
- -, "Online learning in opportunistic spectrum access: A restless bandit approach," in 30th IEEE International Conference on Computer Communications (INFOCOM), April 2011.
- 30th IEEE International Conference on Computer Communications (INFOCOM), April 2011
- Tekin, C.¹ Liu, M.²

10
- 84856092940
- -, "Online learning of rested and restless bandits," http://arxiv.org/abs/1102.3508v1.
- Online Learning of Rested and Restless Bandits
- Tekin, C.¹ Liu, M.²

11
- 84874338536
- Performance and convergence of multi-user online learning
- -, "Performance and convergence of multi-user online learning," in 2nd International ICST Conference on Game Theory for Networks (GAMENETS), April 2011.
- 2nd International ICST Conference on Game Theory for Networks (GAMENETS), April 2011
- Tekin, C.¹ Liu, M.²

12
- 79960867716
- K. Liu and Q. Zhao, "Distributed learning in multi-armed bandit with multiple players," http://arxiv.org/abs/0910.2065.
- Distributed Learning in Multi-armed Bandit with Multiple Players
- Liu, K.¹ Zhao, Q.²

13
- 77953320021
- Opportunistic spectrum access with multiple players: Learning under competition
- A. Anandkumar, N. Michael, and A. Tang, "Opportunistic spectrum access with multiple players: Learning under competition," in Proc. of IEEE INFOCOM, March 2010.
- Proc. of IEEE INFOCOM, March 2010
- Anandkumar, A.¹ Michael, N.² Tang, A.³

14
- 0031070051
- Optimal adaptive policies for markov decision processes
- A. N. Burnetas and M. N. Katehakis, "Optimal adaptive policies for markov decision processes," Mathematics of Operations Research, vol. 22, no. 1, pp. 222-255, 1997.
- (1997) Mathematics of Operations Research , vol.22 , Issue.1 , pp. 222-255
- Burnetas, A.N.¹ Katehakis, M.N.²

15
- 85162041468
- Optimistic linear programming gives logarithmic regret for irreducible mdps
- A. Tewari and P. Bartlett, "Optimistic linear programming gives logarithmic regret for irreducible mdps," Advances in Neural Information Processing Systems, vol. 20, pp. 1505-1512, 2008.
- (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 1505-1512
- Tewari, A.¹ Bartlett, P.²

16
- 84856092939
- C. Tekin and M. Liu, "Adaptive learning of uncontrolled restless bandits with logarithmic regret," http://arxiv.org/abs/1107.4042.
- Adaptive Learning of Uncontrolled Restless Bandits with Logarithmic Regret
- Tekin, C.¹ Liu, M.²

17
- 27944497396
- Sensitivity and convergence of uniformly ergodic markov chains
- A. Y. Mitrophanov, "Sensitivity and convergence of uniformly ergodic markov chains," J. Appl. Prob., vol. 42, pp. 1003-1014, 2005.
- (2005) J. Appl. Prob. , vol.42 , pp. 1003-1014
- Mitrophanov, A.Y.¹

18
- 0032628612
- The complexity of optimal queuing network control
- C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of optimal queuing network control," Math. Oper. Res., vol. 24, no. 2, pp. 293-305, 1999.
- (1999) Math. Oper. Res. , vol.24 , Issue.2 , pp. 293-305
- Papadimitriou, C.H.¹ Tsitsiklis, J.N.²

19
- 0001043843
- Restless bandits
- P. Whittle, "Restless bandits," J. Appl. Prob., pp. 301-313, 1988.
- (1988) J. Appl. Prob. , pp. 301-313
- Whittle, P.¹

20
- 69449097218
- Approximation algorithms for restless bandit problems
- S. Guha, K. Munagala, and P. Shi, "Approximation algorithms for restless bandit problems," 20th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 28-37, 2009.
- (2009) 20th ACM-SIAM Symp. on Discrete Algorithms (SODA) , pp. 28-37
- Guha, S.¹ Munagala, K.² Shi, P.³

21
- 69449100462
- Optimality of myopic sensing in multi-channel opportunistic access
- September
- S. H. A. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, "Optimality of myopic sensing in multi-channel opportunistic access," IEEE Transactions on Information Theory, vol. 55, no. 9, pp. 4040-4050, September 2009.
- (2009) IEEE Transactions on Information Theory , vol.55 , Issue.9 , pp. 4040-4050
- Ahmad, S.H.A.¹ Liu, M.² Javidi, T.³ Zhao, Q.⁴ Krishnamachari, B.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.