2. Watkins, C.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, Cambridge, England (1989)
4. Brafman, R.I., Tennenholtz, M.: R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3, 213-231 (2002)
5. Ishii, S., Yoshida, W., Yoshimoto, J.: Control of exploitation-exploration metaparameter in reinforcement learning. Neural Networks 15(4-6), 665-687 (2002)
7. Vermorel, J., Mohri, M.: Multi-armed bandit algorithms and empirical evaluation. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 437-448. Springer, Heidelberg (2005)
8. Caelen, O., Bontempi, G.: Improving the exploration strategy in bandit algorithms. In: Maniezzo, V., Battiti, R., Watson, J.-P. (eds.) LION 2007 II. LNCS, vol. 5313, pp. 56-68. Springer, Heidelberg (2008)
11. George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning 65(1), 167-198 (2006)
12. Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527-535 (1952)
13. Awerbuch, B., Kleinberg, R.D.: Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, pp. 45-53. ACM, New York (2004)
14. Azoulay-Schwartz, R., Kraus, S., Wilkenfeld, J.: Exploitation vs. exploration: Choosing a supplier in an environment of incomplete information. Decision Support Systems 38(1), 1-18 (2004)