R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27:1054-1078, 1995a.
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002a. DOI 10.1023/A:1013689704352.
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002b.
S. Bubeck, R. Munos, G. Stoltz, and Cs. Szepesvari. Online optimization in X-armed bandits. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 201-208, 2009.
G.M.J. Chaslot, M.H.M. Winands, H.J. van den Herik, J. Uiterwijk, and B. Bouzy. Progressive strategies for Monte-Carlo tree search. New Mathematics and Natural Computation, 4(3):343-357, 2008.
E. Cope. Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces. IEEE Transactions on Automatic Control, 54(6):1243-1253, 2009.
J. C. Gittins. Multi-armed Bandit Allocation Indices. Wiley-Interscience Series in Systems and Optimization. Wiley, Chichester, NY, 1989.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963.
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527-535, 1952.
M.P.D. Schadd, M.H.M. Winands, H.J. van den Herik, and H. Aldewereld. Addressing NP-complete puzzles with Monte-Carlo methods. In Proceedings of the AISB 2008 Symposium on Logic and the Simulation of Interaction and Reasoning, volume 9, pages 55-61. The Society for the Study of Artificial Intelligence and Simulation of Behaviour, 2008.