2. Watkins, C.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, Cambridge, England (1989)
3. Thrun, S.B.: Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University, Pittsburgh, PA, USA (1992)
5. Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3, 397-422 (2002)
6. Vermorel, J., Mohri, M.: Multi-armed bandit algorithms and empirical evaluation. In: Gama, J., Camacho, R., Brazdil, P., Jorge, A., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 437-448. Springer, Heidelberg (2005)
7. Heidrich-Meisner, V.: Interview with Richard S. Sutton. In: Künstliche Intelligenz, vol. 3, pp. 41-43 (2009)
8. Tokic, M.: Adaptive ε-greedy exploration in reinforcement learning based on value differences. In: Dillmann, R., Beyerer, J., Hanebeck, U.D., Schultz, T. (eds.) KI 2010. LNCS, vol. 6359, pp. 203-210. Springer, Heidelberg (2010)
9. Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527-535 (1952)
12. George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning 65(1), 167-198 (2006)
13. Watkins, C., Dayan, P.: Technical note: Q-learning. Machine Learning 8(3), 279-292 (1992)
14. Daw, N.D., O'Doherty, J.P., Dayan, P., Seymour, B., Dolan, R.J.: Cortical substrates for exploratory decisions in humans. Nature 441, 876-879 (2006)