-
3
-
-
0018454769
-
Fast probabilistic algorithms for Hamiltonian circuits and matchings
-
D. Angluin and L. G. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings. Journal of Computer and System Sciences, 18:155-193, 1979.
-
(1979)
Journal of Computer and System Sciences
, vol.18
, pp. 155-193
-
-
Angluin, D.1
Valiant, L.G.2
-
4
-
-
0029513526
-
Gambling in a rigged casino: The adversarial multi-armed bandit problem
-
IEEE Computer Society Press
-
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proc. 36th Annual Symposium on Foundations of Computer Science, pages 322-331. IEEE Computer Society Press, 1995.
-
(1995)
Proc. 36th Annual Symposium on Foundations of Computer Science
, pp. 322-331
-
-
Auer, P.1
Cesa-Bianchi, N.2
Freund, Y.3
Schapire, R.E.4
-
5
-
-
0037709910
-
The non-stochastic multi-armed bandit problem
-
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The non-stochastic multi-armed bandit problem. SIAM J. on Computing, 32(1):48-77, 2002.
-
(2002)
SIAM J. on Computing
, vol.32
, Issue.1
, pp. 48-77
-
-
Auer, P.1
Cesa-Bianchi, N.2
Freund, Y.3
Schapire, R.E.4
-
9
-
-
0033876515
-
The O.D.E. method for convergence of stochastic approximation and reinforcement learning
-
V. S. Borkar and S. P. Meyn. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447-469, 2000.
-
(2000)
SIAM J. Control Optim.
, vol.38
, Issue.2
, pp. 447-469
-
-
Borkar, V.S.1
Meyn, S.P.2
-
12
-
-
84947403595
-
Probability inequalities for sums of bounded random variables
-
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.
-
(1963)
Journal of the American Statistical Association
, vol.58
, Issue.301
, pp. 13-30
-
-
Hoeffding, W.1
-
15
-
-
0036832954
-
Near-optimal reinforcement learning in polynomial time
-
M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 209-232
-
-
Kearns, M.1
Singh, S.2
-
17
-
-
84899026236
-
Finite-sample convergence rates for Q-learning and indirect algorithms
-
M. Kearns and S. P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems 10, pages 996-1002, 1998.
-
(1998)
Neural Information Processing Systems
, vol.10
, pp. 996-1002
-
-
Kearns, M.1
Singh, S.P.2
-
18
-
-
3142742614
-
Buffer overflow management in QoS switches
-
A. Kesselman, Z. Lotker, Y. Mansour, B. Patt-Shamir, B. Schieber, and M. Sviridenko. Buffer overflow management in QoS switches. SIAM J. on Computing, 33(3):563-583, 2004.
-
(2004)
SIAM J. on Computing
, vol.33
, Issue.3
, pp. 563-583
-
-
Kesselman, A.1
Lotker, Z.2
Mansour, Y.3
Patt-Shamir, B.4
Schieber, B.5
Sviridenko, M.6
-
20
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
-
(1985)
Advances in Applied Mathematics
, vol.6
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
21
-
-
0006193487
-
A modified dynamic programming method for Markov decision problems
-
J. MacQueen. A modified dynamic programming method for Markov decision problems. J. Math. Anal. Appl., 14:38-43, 1966.
-
(1966)
J. Math. Anal. Appl.
, vol.14
, pp. 38-43
-
-
MacQueen, J.1
-
22
-
-
30044441333
-
The sample complexity of exploration in the multi-armed bandit problem
-
S. Mannor and J. N. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5:623-648, 2004.
-
(2004)
Journal of Machine Learning Research
, vol.5
, pp. 623-648
-
-
Mannor, S.1
Tsitsiklis, J.N.2
-
24
-
-
0036832956
-
Kernel-based reinforcement learning
-
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49(2-3):161-178, 2002.
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 161-178
-
-
Ormoneit, D.1
Sen, S.2
-
26
-
-
84966203785
-
Some aspects of sequential design of experiments
-
H. Robbins. Some aspects of sequential design of experiments. Bull. Amer. Math. Soc., 58:527-535, 1952.
-
(1952)
Bull. Amer. Math. Soc.
, vol.58
, pp. 527-535
-
-
Robbins, H.1
-
27
-
-
0028497385
-
An upper bound on the loss from approximate optimal-value functions
-
S. P. Singh and R. C. Yee. An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3):227-233, 1994.
-
(1994)
Machine Learning
, vol.16
, Issue.3
, pp. 227-233
-
-
Singh, S.P.1
Yee, R.C.2