[3] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
[7] C. Carathéodory. Über den Variabilitätsbereich der Fourierschen Konstanten von positiven harmonischen Funktionen. Rendiconti del Circolo Matematico di Palermo, 32:193-217, 1911.
[8] D. P. de Farias and N. Megiddo. Combining expert advice in reactive environments. Journal of the ACM, 53(5):762-799, 2006.
[9] E. Even-Dar, S. M. Kakade, and Y. Mansour. Online Markov decision processes. Mathematics of Operations Research, 34(3):726-736, 2009.
[10] V. F. Farias, C. C. Moallemi, B. Van Roy, and T. Weissman. Universal reinforcement learning. IEEE Transactions on Information Theory, 56(5):2441-2454, 2010.
[11] E. A. Feinberg and F. Yang. On polynomial cases of the unichain classification problem for Markov decision processes. Operations Research Letters, 36(5):527-530, 2008.
[13] A. György, T. Linder, G. Lugosi, and G. Ottucsák. The on-line shortest path problem under partial monitoring. Journal of Machine Learning Research, 8:2369-2403, 2007.
[16] G. Neu, A. György, C. Szepesvári, and A. Antos. Online Markov decision processes under bandit feedback. In Advances in Neural Information Processing Systems 23, pages 1804-1812. MIT Press, 2010.
[17] R. Ortner. Online regret bounds for Markov decision processes with deterministic transitions. Theoretical Computer Science, 411(29-30):2684-2695, 2010.
[18] D. Ryabko and M. Hutter. On the possibility of learning in reactive environments with arbitrary dependence. Theoretical Computer Science, 405(3):274-284, 2008.
[21] J. Y. Yu, S. Mannor, and N. Shimkin. Markov decision processes with arbitrary reward processes. Mathematics of Operations Research, 34(3):737-757, 2009.