-
1
-
-
0000616723
-
Sample mean based index policies with O(log n) regret for the multiarmed bandit problem
-
Agrawal, R. 1995. Sample mean based index policies with O(log n) regret for the multiarmed bandit problem. Advances Appl. Probab. 27 1054-1078.
-
(1995)
Advances Appl. Probab.
, vol.27
, pp. 1054-1078
-
-
Agrawal, R.1
-
2
-
-
0024886640
-
Asymptotically efficient adaptive allocation schemes for controlled Markov chains: Finite parameter space
-
Agrawal, R., D. Teneketzis, V. Anantharam. 1989. Asymptotically efficient adaptive allocation schemes for controlled Markov chains: Finite parameter space. IEEE Trans. Automat. Control 34 1249-1259.
-
(1989)
IEEE Trans. Automat. Control
, vol.34
, pp. 1249-1259
-
-
Agrawal, R.1
Teneketzis, D.2
Anantharam, V.3
-
3
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47 235-256.
-
(2002)
Machine Learning
, vol.47
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
5
-
-
0008831286
-
Differential training of rollout policies
-
Allerton Park, IL
-
Bertsekas, D. P. 1997. Differential training of rollout policies. Proc. 35th Allerton Conf. Communication, Control, Comput., Allerton Park, IL, 913-922.
-
(1997)
Proc. 35th Allerton Conf. Communication, Control, Comput.
, pp. 913-922
-
-
Bertsekas, D.P.1
-
6
-
-
0034264701
-
A survey of computational complexity results in systems and control
-
Blondel, V. D., J. Tsitsiklis. 2000. A survey of computational complexity results in systems and control. Automatica 36 1249-1274.
-
(2000)
Automatica
, vol.36
, pp. 1249-1274
-
-
Blondel, V.D.1
Tsitsiklis, J.2
-
7
-
-
0031590025
-
Pricing American-style securities using simulation
-
Broadie, M., P. Glasserman. 1997. Pricing American-style securities using simulation. J. Econom. Dynamics Control 21 1323-1352.
-
(1997)
J. Econom. Dynamics Control
, vol.21
, pp. 1323-1352
-
-
Broadie, M.1
Glasserman, P.2
-
8
-
-
0007163041
-
Finite-time regret bounds for the multiarmed bandit problem
-
Morgan Kaufmann Publishers, San Francisco, CA
-
Cesa-Bianchi, N., P. Fischer. 1998. Finite-time regret bounds for the multiarmed bandit problem. Proc. 15th Int. Conf. Machine Learning. Morgan Kaufmann Publishers, San Francisco, CA, 101-108.
-
(1998)
Proc. 15th Int. Conf. Machine Learning
, pp. 101-108
-
-
Cesa-Bianchi, N.1
Fischer, P.2
-
9
-
-
0004116989
-
-
MIT Press, Cambridge, MA
-
Cormen, T. H., C. E. Leiserson, R. L. Rivest. 1990. Introduction to Algorithms. MIT Press, Cambridge, MA.
-
(1990)
Introduction to Algorithms
-
-
Cormen, T.H.1
Leiserson, C.E.2
Rivest, R.L.3
-
10
-
-
0031145551
-
Asymptotically efficient adaptive choice of control laws in controlled Markov chains
-
Graves, T. L., T. L. Lai. 1997. Asymptotically efficient adaptive choice of control laws in controlled Markov chains. SIAM J. Control Optim. 35 715-743.
-
(1997)
SIAM J. Control Optim.
, vol.35
, pp. 715-743
-
-
Graves, T.L.1
Lai, T.L.2
-
12
-
-
0025502594
-
Error bounds for rolling horizon policies in discrete-time Markov control processes
-
Hernández-Lerma, O., J. B. Lasserre. 1990. Error bounds for rolling horizon policies in discrete-time Markov control processes. IEEE Trans. Automat. Control 35 1118-1124.
-
(1990)
IEEE Trans. Automat. Control
, vol.35
, pp. 1118-1124
-
-
Hernández-Lerma, O.1
Lasserre, J.B.2
-
13
-
-
84947403595
-
Probability inequalities for sums of bounded random variables
-
Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-30.
-
(1963)
J. Amer. Statist. Assoc.
, vol.58
, pp. 13-30
-
-
Hoeffding, W.1
-
14
-
-
0036832951
-
A sparse sampling algorithm for near-optimal planning in large Markov decision processes
-
Kearns, M., Y. Mansour, A. Y. Ng. 2002. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning 49 193-208.
-
(2002)
Machine Learning
, vol.49
, pp. 193-208
-
-
Kearns, M.1
Mansour, Y.2
Ng, A.Y.3
-
15
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
Lai, T., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Advances Appl. Math. 6 4-22.
-
(1985)
Advances Appl. Math.
, vol.6
, pp. 4-22
-
-
Lai, T.1
Robbins, H.2