2. Agrawal, R. (1995b). Sample mean based index policies with o(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27, 1054-1078.
3. Audibert, J.-Y., & Bubeck, S. (2009). Minimax policies for adversarial and stochastic bandits. In Proceedings of COLT 2009. Montreal: Omnipress.
4. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002a). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3), 235-256. doi:10.1023/A:1013689704352.
5. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2002b). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32, 48-77.
7. Burnetas, A. N., & Katehakis, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2), 122-142. doi:10.1006/aama.1996.0007.
9. Even-Dar, E., Mannor, S., & Mansour, Y. (2002). PAC bounds for multi-armed bandit and Markov decision processes. In Proceedings of COLT 2002 (pp. 255-270). London: Springer.
12. Honda, J., & Takemura, A. (2010). An asymptotically optimal bandit algorithm for bounded support models. In Proceedings of COLT 2010, Haifa, Israel (pp. 67-79).
15. Kleinberg, R. (2005). Nearly tight bounds for the continuum-armed bandit problem. In Proceedings of NIPS 2005 (pp. 697-704). New York: MIT Press.
16. Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4-22.
17. Meuleau, N., & Bourgine, P. (1999). Exploration of multi-state environments: Local measures and backpropagation of uncertainty. Machine Learning, 35, 117-154.
19. Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58, 527-535.
20. Strens, M. (2000). A Bayesian framework for reinforcement learning. In Proceedings of ICML 2000 (pp. 943-950). San Francisco: Kaufmann.
21. Vermorel, J., & Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In Proceedings of ECML 2005, Porto, Portugal (pp. 437-448). Berlin: Springer.
22. Wyatt, J. (1997). Exploration and inference in learning from reinforcement. Doctoral dissertation, Department of Artificial Intelligence, University of Edinburgh.