-
11
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1): 4-22, 1985.
-
(1985)
Advances in Applied Mathematics
, vol.6
, Issue.1
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
12
-
-
84924051598
-
Human-level control through deep reinforcement learning
-
Volodymyr Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533, 2015.
-
(2015)
Nature
, vol.518
, Issue.7540
, pp. 529-533
-
-
Mnih, V.1
-
13
-
-
84899019264
-
(More) efficient reinforcement learning via posterior sampling
-
Curran Associates, Inc.
-
Ian Osband, Daniel Russo, and Benjamin Van Roy. (More) efficient reinforcement learning via posterior sampling. In NIPS, pages 3003-3011. Curran Associates, Inc., 2013.
-
(2013)
NIPS
, pp. 3003-3011
-
-
Osband, I.1
Russo, D.2
Van Roy, B.3
-
17
-
-
84863522108
-
Bootstrapping data arrays of arbitrary order
-
Art B. Owen and Dean Eckles. Bootstrapping data arrays of arbitrary order. The Annals of Applied Statistics, 6(3): 895-927, 2012.
-
(2012)
The Annals of Applied Statistics
, vol.6
, Issue.3
, pp. 895-927
-
-
Owen, A.B.1
Eckles, D.2
-
19
-
-
84904163933
-
Dropout: A simple way to prevent neural networks from overfitting
-
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1): 1929-1958, 2014.
-
(2014)
The Journal of Machine Learning Research
, vol.15
, Issue.1
, pp. 1929-1958
-
-
Srivastava, N.1
Hinton, G.2
Krizhevsky, A.3
Sutskever, I.4
Salakhutdinov, R.5
-
21
-
-
14344258433
-
A Bayesian framework for reinforcement learning
-
Malcolm J. A. Strens. A Bayesian framework for reinforcement learning. In ICML, pages 943-950, 2000.
-
(2000)
ICML
, pp. 943-950
-
-
Strens, M.J.A.1
-
23
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3): 58-68, 1995.
-
(1995)
Communications of the ACM
, vol.38
, Issue.3
, pp. 58-68
-
-
Tesauro, G.1
-
24
-
-
0001395850
-
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
-
W.R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4): 285-294, 1933.
-
(1933)
Biometrika
, vol.25
, Issue.3-4
, pp. 285-294
-
-
Thompson, W.R.1
-
27
-
-
84899020590
-
Efficient exploration and value function generalization in deterministic systems
-
Zheng Wen and Benjamin Van Roy. Efficient exploration and value function generalization in deterministic systems. In NIPS, pages 3021-3029, 2013.
-
(2013)
NIPS
, pp. 3021-3029
-
-
Wen, Z.1
Van Roy, B.2