-
1
-
-
84862535425
-
Interior-point methods for full-information and bandit online learning
-
Abernethy, J., Hazan, E., and Rakhlin, A. Interior-point methods for full-information and bandit online learning. IEEE Transactions on Information Theory, 58(7):4164-4175, 2012.
-
(2012)
IEEE Transactions on Information Theory
, vol.58
, Issue.7
, pp. 4164-4175
-
-
Abernethy, J.1
Hazan, E.2
Rakhlin, A.3
-
3
-
-
84886067084
-
Deterministic MDPs with adversarial rewards and bandit feedback
-
Arora, R., Dekel, O., and Tewari, A. Deterministic MDPs with adversarial rewards and bandit feedback. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, pp. 93-101, 2012.
-
(2012)
Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence
, pp. 93-101
-
-
Arora, R.1
Dekel, O.2
Tewari, A.3
-
4
-
-
84897474760
-
Minimax policies for combinatorial prediction games
-
Audibert, J.-Y., Bubeck, S., and Lugosi, G. Minimax policies for combinatorial prediction games. Journal of Machine Learning Research - Proceedings Track, 19:107-132, 2011.
-
(2011)
Journal of Machine Learning Research - Proceedings Track
, vol.19
, pp. 107-132
-
-
Audibert, J.-Y.1
Bubeck, S.2
Lugosi, G.3
-
7
-
-
85162050055
-
The price of bandit information for online optimization
-
Dani, V., Hayes, T. P., and Kakade, S. The price of bandit information for online optimization. In Advances in Neural Information Processing Systems 20, 2007.
-
(2007)
Advances in Neural Information Processing Systems
, vol.20
-
-
Dani, V.1
Hayes, T.P.2
Kakade, S.3
-
8
-
-
0344560051
-
Periods of connected networks and powers of nonnegative matrices
-
Denardo, E. V. Periods of connected networks and powers of nonnegative matrices. Mathematics of Operations Research, 2(1):20-24, 1977.
-
(1977)
Mathematics of Operations Research
, vol.2
, Issue.1
, pp. 20-24
-
-
Denardo, E.V.1
-
9
-
-
70349277420
-
Online Markov decision processes
-
Even-Dar, E., Kakade, S., and Mansour, Y. Online Markov decision processes. Mathematics of Operations Research, 34(3):726-736, 2009.
-
(2009)
Mathematics of Operations Research
, vol.34
, Issue.3
, pp. 726-736
-
-
Even-Dar, E.1
Kakade, S.2
Mansour, Y.3
-
10
-
-
77951573287
-
Universal reinforcement learning
-
Farias, V. F., Moallemi, C. C., Van Roy, B., and Weissman, T. Universal reinforcement learning. IEEE Transactions on Information Theory, 56(5):2441-2454, 2010.
-
(2010)
IEEE Transactions on Information Theory
, vol.56
, Issue.5
, pp. 2441-2454
-
-
Farias, V.F.1
Moallemi, C.C.2
Van Roy, B.3
Weissman, T.4
-
11
-
-
50249167647
-
On polynomial cases of the unichain classification problem for Markov decision processes
-
Feinberg, E. A. and Yang, F. On polynomial cases of the unichain classification problem for Markov decision processes. Operations Research Letters, 36(5):527-530, 2008.
-
(2008)
Operations Research Letters
, vol.36
, Issue.5
, pp. 527-530
-
-
Feinberg, E.A.1
Yang, F.2
-
12
-
-
85162052729
-
Online Markov decision processes under bandit feedback
-
Neu, G., György, A., Szepesvári, C., and Antos, A. Online Markov decision processes under bandit feedback. In Advances in Neural Information Processing Systems 23, pp. 1804-1812, 2010.
-
(2010)
Advances in Neural Information Processing Systems
, vol.23
, pp. 1804-1812
-
-
Neu, G.1
György, A.2
Szepesvári, C.3
Antos, A.4
-
13
-
-
77953539718
-
Online regret bounds for Markov decision processes with deterministic transitions
-
Ortner, R. Online regret bounds for Markov decision processes with deterministic transitions. Theoretical Computer Science, 411(29-30):2684-2695, 2010.
-
(2010)
Theoretical Computer Science
, vol.411
, Issue.29-30
, pp. 2684-2695
-
-
Ortner, R.1
-
16
-
-
70349280578
-
Markov decision processes with arbitrary reward processes
-
Yu, J. Y., Mannor, S., and Shimkin, N. Markov decision processes with arbitrary reward processes. Mathematics of Operations Research, 34(3):737-757, 2009.
-
(2009)
Mathematics of Operations Research
, vol.34
, Issue.3
, pp. 737-757
-
-
Yu, J.Y.1
Mannor, S.2
Shimkin, N.3