1. Abernethy, J., Hazan, E., and Rakhlin, A. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, pp. 263-274, 2008.
2. Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
3. Awerbuch, B. and Kleinberg, R. D. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In STOC, pp. 45-53, 2004.
4. Bartlett, P. L., Dani, V., Hayes, T. P., Kakade, S., Rakhlin, A., and Tewari, A. High-probability regret bounds for bandit online linear optimization. In COLT, pp. 335-342, 2008.
5. Blum, A. and Mansour, Y. From external to internal regret. JMLR, 8:1307-1324, 2007.
8. Dani, V. and Hayes, T. P. Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. In SODA, 2006.
9. de Farias, D. P. and Megiddo, N. Combining expert advice in reactive environments. Journal of the ACM, 53(5):762-799, 2006.
10. Dekel, O., Gilad-Bachrach, R., Shamir, O., and Xiao, L. Optimal distributed online prediction. In ICML, 2011.
11. Even-Dar, E., Kakade, S. M., and Mansour, Y. Online Markov decision processes. Math. of Operations Research, 34(3):726-736, 2009.
12. Flaxman, A. D., Kalai, A. T., and McMahan, H. B. Online convex optimization in the bandit setting: Gradient descent without a gradient. In SODA, pp. 385-394, 2005.
14. Hazan, E., Kalai, A., Kale, S., and Agarwal, A. Logarithmic regret algorithms for online convex optimization. In COLT, 2006.
15. Kleinberg, R. Nearly tight bounds for the continuum-armed bandit problem. In NIPS, pp. 697-704, 2004.
16. Maillard, O. and Munos, R. Adaptive bandits: Towards the best history-dependent strategy. In AISTATS, 2010.
17. McMahan, H. B. and Blum, A. Online geometric optimization in the bandit setting against an adaptive adversary. In COLT, 2004.
18. Merhav, N., Ordentlich, E., Seroussi, G., and Weinberger, M. J. Sequential strategies for loss functions with memory. IEEE Transactions on Information Theory, 48(7):1947-1958, 2002.
19. Nesterov, Y. E. and Nemirovsky, A. S. Interior-point polynomial algorithms in convex programming. SIAM, 1994.
20. Neu, G., György, A., Szepesvári, C., and Antos, A. Online Markov decision processes under bandit feedback. In NIPS, pp. 1804-1812, 2010.
21. Robbins, H. Some aspects of the sequential design of experiments. Bulletin of the AMS, 58:527-535, 1952.
22. Ryabko, D. and Hutter, M. On the possibility of learning in reactive environments with arbitrary dependence. Theor. Comput. Sci., 405(3):274-284, 2008.
24. Yu, J. Y., Mannor, S., and Shimkin, N. Markov decision processes with arbitrary reward processes. Math. of Operations Research, 34(3):737-757, 2009.
25. Zinkevich, M. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.