SCOPUS Information Search Platform

Proceedings of the 29th International Conference on Machine Learning, ICML 2012

Volume 2, 2012, Pages 1503-1510

Online bandit learning against an adaptive adversary: From regret to policy regret

(3) Arora, Raman a; Dekel, Ofer b; Tewari, Ambuj c

a Toyota Technological Institute at Chicago (United States)

b Microsoft Research (United States)

c University of Texas at Austin (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE ADVERSARY; FORMAL DEFINITION; GAME-THEORETIC; INTERNAL REGRET; ON-LINE ALGORITHMS; ONLINE LEARNING ALGORITHMS; STANDARD DEFINITIONS; SUBLINEAR; UNBOUNDED MEMORY;

GAME THEORY; LEARNING ALGORITHMS;

LEARNING SYSTEMS;

EID: 84867129684 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (150)

References (25)

1
- 84898063697
- Competing in the dark: An efficient algorithm for bandit linear optimization
- Abernethy, J., Hazan, E., and Rakhlin, A. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, pp. 263-274, 2008.
- (2008) COLT , pp. 263-274
- Abernethy, J.¹ Hazan, E.² Rakhlin, A.³

2
- 0037709910
- The nonstochastic multiarmed bandit problem
- Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
- (2002) SIAM Journal on Computing , vol.32 , Issue.1 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.⁴

3
- 4544345025
- Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches
- Awerbuch, B. and Kleinberg, R. D. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In STOC, pp. 45-53, 2004.
- (2004) STOC , pp. 45-53
- Awerbuch, B.¹ Kleinberg, R.D.²

4
- 80555137396
- High-probability regret bounds for bandit online linear optimization
- Bartlett, P. L., Dani, V., Hayes, T. P., Kakade, S., Rakhlin, A., and Tewari, A. High-probability regret bounds for bandit online linear optimization. In COLT, pp. 335-342, 2008.
- (2008) COLT , pp. 335-342
- Bartlett, P.L.¹ Dani, V.² Hayes, T.P.³ Kakade, S.⁴ Rakhlin, A.⁵ Tewari, A.⁶

5
- 34547254640
- From external to internal regret
- Blum, A. and Mansour, Y. From external to internal regret. JMLR, 8:1307-1324, 2007.
- (2007) JMLR , vol.8 , pp. 1307-1324
- Blum, A.¹ Mansour, Y.²

6
- 0004134209
- Online computation and competitive analysis
- Cambridge University Press
- Borodin, A. and El-Yaniv, R. Online computation and competitive analysis. Cambridge University Press, 1998.
- (1998) Online Computation and Competitive Analysis
- Borodin, A.¹ El-Yaniv, R.²

7
- 84926078662
- Prediction, learning, and games
- Cambridge University Press
- Cesa-Bianchi, N. and Lugosi, G. Prediction, learning, and games. Cambridge University Press, 2006.
- (2006) Prediction, Learning, and Games
- Cesa-Bianchi, N.¹ Lugosi, G.²

8
- 33244456637
- Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary
- Dani, V. and Hayes, T. P. Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. In SODA, 2006.
- (2006) SODA
- Dani, V.¹ Hayes, T.P.²

9
- 33845302015
- Combining expert advice in reactive environments
- de Farias, D. P. and Megiddo, N. Combining expert advice in reactive environments. Journal of the ACM, 53(5): 762-799, 2006.
- (2006) Journal of the ACM , vol.53 , Issue.5 , pp. 762-799
- De Farias, D.P.¹ Megiddo, N.²

10
- 80053446822
- Optimal distributed online prediction
- Dekel, O., Gilad-Bachrach, R., Shamir, Ohad, and Xiao, Lin. Optimal distributed online prediction. In ICML, 2011.
- (2011) ICML
- Dekel, O.¹ Gilad-Bachrach, R.² Shamir, O.³ Xiao, L.⁴

11
- 70349277420
- Online Markov decision processes
- Even-Dar, E., Kakade, S. M., and Mansour, Y. Online Markov decision processes. Math. of Operations Research, 34(3):726-736, 2009.
- (2009) Math. of Operations Research , vol.34 , Issue.3 , pp. 726-736
- Even-Dar, E.¹ Kakade, S.M.² Mansour, Y.³

12
- 20744454447
- Online convex optimization in the bandit setting: Gradient descent without a gradient
- Flaxman, A. D., Kalai, A. Tauman, and McMahan, B. H. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA, pp. 385-394, 2005.
- (2005) SODA , pp. 385-394
- Flaxman, A.D.¹ Kalai, A.T.² McMahan, B.H.³

13
- 80053442097
- Online submodular minimization
- Hazan, E. and Kale, S. Online submodular minimization. In Advances in Neural Information Processing Systems (NIPS), 2009.
- (2009) Advances in Neural Information Processing Systems (NIPS)
- Hazan, E.¹ Kale, S.²

14
- 33749256026
- Logarithmic regret algorithms for online convex optimization
- Hazan, E., Kalai, A., Kale, S., and Agarwal, A. Logarithmic regret algorithms for online convex optimization. In COLT, 2006.
- (2006) COLT
- Hazan, E.¹ Kalai, A.² Kale, S.³ Agarwal, A.⁴

15
- 38049011420
- Nearly tight bounds for the continuum-armed bandit problem
- Kleinberg, R. Nearly tight bounds for the continuum-armed bandit problem. In NIPS, pp. 697-704, 2004.
- (2004) NIPS , pp. 697-704
- Kleinberg, R.¹

16
- 84867136683
- Adaptive bandits: Towards the best history-dependent strategy
- Maillard, O. and Munos, R. Adaptive bandits: Towards the best history-dependent strategy. In AISTATS, 2010.
- (2010) AISTATS
- Maillard, O.¹ Munos, R.²

17
- 24644470905
- Online geometric optimization in the bandit setting against an adaptive adversary
- McMahan, H. B. and Blum, A. Online geometric optimization in the bandit setting against an adaptive adversary. In COLT, 2004.
- (2004) COLT
- McMahan, H.B.¹ Blum, A.²

18
- 0036649565
- Sequential strategies for loss functions with memory
- Merhav, N., Ordentlich, E., Seroussi, C., and Weinberger, M.J. Sequential strategies for loss functions with memory. IEEE IT, 48(7):1947-1958, 2002.
- (2002) IEEE IT , vol.48 , Issue.7 , pp. 1947-1958
- Merhav, N.¹ Ordentlich, E.² Seroussi, C.³ Weinberger, M.J.⁴

19
- 0003254250
- Interior point polynomial algorithms in convex programming
- Nesterov, Y. E. and Nemirovsky, A. S. Interior point polynomial algorithms in convex programming. SIAM, 1994.
- (1994) SIAM
- Nesterov, Y.E.¹ Nemirovsky, A.S.²

20
- 85162052729
- Online Markov decision processes under bandit feedback
- Neu, G., György, A., Szepesvári, C., and Antos, A. Online Markov decision processes under bandit feedback. In NIPS, pp. 1804-1812, 2010.
- (2010) NIPS , pp. 1804-1812
- Neu, G.¹ György, A.² Szepesvári, C.³ Antos, A.⁴

21
- 84966203785
- Some aspects of the sequential design of experiments
- Robbins, H. Some aspects of the sequential design of experiments. Bulletin of the AMS, 58:527-535, 1952.
- (1952) Bulletin of the AMS , vol.58 , pp. 527-535
- Robbins, H.¹

22
- 77949509398
- On the possibility of learning in reactive environments with arbitrary dependence
- Ryabko, D. and Hutter, M. On the possibility of learning in reactive environments with arbitrary dependence. Theor. Comput. Sci., 405(3):274-284, 2008.
- (2008) Theor. Comput. Sci. , vol.405 , Issue.3 , pp. 274-284
- Ryabko, D.¹ Hutter, M.²

23
- 77955790905
- Algorithms for Reinforcement Learning
- Morgan & Claypool Publishers
- Szepesvári, C. Algorithms for Reinforcement Learning. Synth. Lectures in A.I. and Machine Learning. Morgan & Claypool Publishers, 2010.
- (2010) Synth. Lectures in A.I. and Machine Learning
- Szepesvári, C.¹

24
- 70349280578
- Markov decision processes with arbitrary reward processes
- Yu, J. Y., Mannor, S., and Shimkin, N. Markov decision processes with arbitrary reward processes. Math. of Operations Research, 34(3):737-757, 2009.
- (2009) Math. of Operations Research , vol.34 , Issue.3 , pp. 737-757
- Yu, J.Y.¹ Mannor, S.² Shimkin, N.³

25
- 1942484421
- Online convex programming and generalized infinitesimal gradient ascent
- Zinkevich, M. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.
- (2003) ICML
- Zinkevich, M.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.