Volume 2, 2012, Pages 1503-1510

Online bandit learning against an adaptive adversary: From regret to policy regret

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE ADVERSARY; FORMAL DEFINITION; GAME-THEORETIC; INTERNAL REGRET; ON-LINE ALGORITHMS; ONLINE LEARNING ALGORITHMS; STANDARD DEFINITIONS; SUBLINEAR; UNBOUNDED MEMORY

EID: 84867129684     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 150

References (25)
  • 1
    • 84898063697
    • Competing in the dark: An efficient algorithm for bandit linear optimization
    • Abernethy, J., Hazan, E., and Rakhlin, A. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, pp. 263-274, 2008.
    • (2008) COLT , pp. 263-274
    • Abernethy, J.1    Hazan, E.2    Rakhlin, A.3
  • 3
    • 4544345025
    • Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches
    • Awerbuch, B. and Kleinberg, R. D. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In STOC, pp. 45-53, 2004.
    • (2004) STOC , pp. 45-53
    • Awerbuch, B.1    Kleinberg, R.D.2
  • 4
    • 80555137396
    • High-probability regret bounds for bandit online linear optimization
    • Bartlett, P. L., Dani, V., Hayes, T. P., Kakade, S., Rakhlin, A., and Tewari, A. High-probability regret bounds for bandit online linear optimization. In COLT, pp. 335-342, 2008.
    • (2008) COLT , pp. 335-342
    • Bartlett, P.L.1    Dani, V.2    Hayes, T.P.3    Kakade, S.4    Rakhlin, A.5    Tewari, A.6
  • 5
    • 34547254640
    • From external to internal regret
    • Blum, A. and Mansour, Y. From external to internal regret. JMLR, 8:1307-1324, 2007.
    • (2007) JMLR , vol.8 , pp. 1307-1324
    • Blum, A.1    Mansour, Y.2
  • 8
    • 33244456637
    • Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary
    • Dani, V. and Hayes, T. P. Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. In SODA, 2006.
    • (2006) SODA
    • Dani, V.1    Hayes, T.P.2
  • 9
    • 33845302015
    • Combining expert advice in reactive environments
    • de Farias, D. P. and Megiddo, N. Combining expert advice in reactive environments. Journal of the ACM, 53(5): 762-799, 2006.
    • (2006) Journal of the ACM , vol.53 , Issue.5 , pp. 762-799
    • De Farias, D.P.1    Megiddo, N.2
  • 12
    • 20744454447
    • Online convex optimization in the bandit setting: Gradient descent without a gradient
    • Flaxman, A. D., Kalai, A. T., and McMahan, H. B. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA, pp. 385-394, 2005.
    • (2005) SODA , pp. 385-394
    • Flaxman, A.D.1    Kalai, A.T.2    McMahan, H.B.3
  • 14
    • 33749256026
    • Logarithmic regret algorithms for online convex optimization
    • Hazan, E., Kalai, A., Kale, S., and Agarwal, A. Logarithmic regret algorithms for online convex optimization. In COLT, 2006.
    • (2006) COLT
    • Hazan, E.1    Kalai, A.2    Kale, S.3    Agarwal, A.4
  • 15
    • 38049011420
    • Nearly tight bounds for the continuum-armed bandit problem
    • Kleinberg, R. Nearly tight bounds for the continuum-armed bandit problem. In NIPS, pp. 697-704, 2004.
    • (2004) NIPS , pp. 697-704
    • Kleinberg, R.1
  • 16
    • 84867136683
    • Adaptive bandits: Towards the best history-dependent strategy
    • Maillard, O. and Munos, R. Adaptive bandits: Towards the best history-dependent strategy. In AISTATS, 2010.
    • (2010) AISTATS
    • Maillard, O.1    Munos, R.2
  • 17
    • 24644470905
    • Online geometric optimization in the bandit setting against an adaptive adversary
    • McMahan, H. B. and Blum, A. Online geometric optimization in the bandit setting against an adaptive adversary. In COLT, 2004.
    • (2004) COLT
    • McMahan, H.B.1    Blum, A.2
  • 18
    • 0036649565
    • Sequential strategies for loss functions with memory
    • Merhav, N., Ordentlich, E., Seroussi, G., and Weinberger, M.J. Sequential strategies for loss functions with memory. IEEE IT, 48(7):1947-1958, 2002.
    • (2002) IEEE IT , vol.48 , Issue.7 , pp. 1947-1958
    • Merhav, N.1    Ordentlich, E.2    Seroussi, G.3    Weinberger, M.J.4
  • 19
    • 0003254250
    • Interior point polynomial algorithms in convex programming
    • Nesterov, Y. E. and Nemirovsky, A. S. Interior point polynomial algorithms in convex programming. SIAM, 1994.
    • (1994) SIAM
    • Nesterov, Y.E.1    Nemirovsky, A.S.2
  • 20
    • 85162052729
    • Online Markov decision processes under bandit feedback
    • Neu, G., György, A., Szepesvári, C., and Antos, A. Online Markov decision processes under bandit feedback. In NIPS, pp. 1804-1812, 2010.
    • (2010) NIPS , pp. 1804-1812
    • Neu, G.1    György, A.2    Szepesvári, C.3    Antos, A.4
  • 21
    • 84966203785
    • Some aspects of the sequential design of experiments
    • Robbins, H. Some aspects of the sequential design of experiments. Bulletin of the AMS, 58:527-535, 1952.
    • (1952) Bulletin of the AMS , vol.58 , pp. 527-535
    • Robbins, H.1
  • 22
    • 77949509398
    • On the possibility of learning in reactive environments with arbitrary dependence
    • Ryabko, D. and Hutter, M. On the possibility of learning in reactive environments with arbitrary dependence. Theor. Comput. Sci., 405(3):274-284, 2008.
    • (2008) Theor. Comput. Sci. , vol.405 , Issue.3 , pp. 274-284
    • Ryabko, D.1    Hutter, M.2
  • 24
    • 70349280578
    • Markov decision processes with arbitrary reward processes
    • Yu, J. Y., Mannor, S., and Shimkin, N. Markov decision processes with arbitrary reward processes. Math. of Operations Research, 34(3):737-757, 2009.
    • (2009) Math. of Operations Research , vol.34 , Issue.3 , pp. 737-757
    • Yu, J.Y.1    Mannor, S.2    Shimkin, N.3
  • 25
    • 1942484421
    • Online convex programming and generalized infinitesimal gradient ascent
    • Zinkevich, M. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.
    • (2003) ICML
    • Zinkevich, M.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.