3. Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397-422 (2002)
4. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282-293. Springer, Heidelberg (2006)
5. Schweighofer, N., Doya, K.: Meta-learning in reinforcement learning. Neural Netw. 16(1), 5-9 (2003)
6. Kobayashi, K., Mizoue, H., Kuremoto, T., Obayashi, M.: A meta-learning method based on temporal difference error. In: Leung, C.S., Lee, M., Chan, J.H. (eds.) ICONIP 2009, Part I. LNCS, vol. 5863, pp. 530-537. Springer, Heidelberg (2009)
7. Tokic, M., Palm, G.: Value-difference based exploration: Adaptive control between epsilon-greedy and softmax. In: Bach, J., Edelkamp, S. (eds.) KI 2011. LNCS, vol. 7006, pp. 335-346. Springer, Heidelberg (2011)
8. Tokic, M., Ertle, P., Palm, G., Söffker, D., Voos, H.: Robust exploration/exploitation trade-offs in safety-critical applications. In: Proceedings of the 8th International Symposium on Fault Detection, Supervision and Safety of Technical Processes, Mexico City, Mexico, pp. 660-665. IFAC (2012)
9. Tokic, M., Palm, G.: Adaptive exploration using stochastic neurons. In: Villa, A.E., Duch, W., Érdi, P., Palm, G. (eds.) ICANN 2012, Part II. LNCS, vol. 7553, pp. 42-49. Springer, Heidelberg (2012)
10. Tokic, M., Palm, G.: Gradient algorithms for exploration/exploitation trade-offs: Global and local variants. In: Mana, N., Schwenker, F., Trentin, E. (eds.) ANNPR 2012. LNCS, vol. 7477, pp. 60-71. Springer, Heidelberg (2012)
11. Singh, S., Sutton, R.S.: Reinforcement learning with replacing eligibility traces. Mach. Learn. 22, 123-158 (1996)
12. Niv, Y., Daw, N.D., Dayan, P.: Choice values. Nat. Neurosci. 9(8), 987-988 (2006)
13. Niv, Y.: Reinforcement learning in the brain. J. Math. Psychol. 53(3), 139-154 (2009)
14. Watkins, C.: Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England (1989)
15. George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Mach. Learn. 65(1), 167-198 (2006)
16. Vermorel, J., Mohri, M.: Multi-armed bandit algorithms and empirical evaluation. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 437-448. Springer, Heidelberg (2005)
18. Tokic, M.: Adaptive ε-greedy exploration in reinforcement learning based on value differences. In: Dillmann, R., Beyerer, J., Hanebeck, U.D., Schultz, T. (eds.) KI 2010. LNCS, vol. 6359, pp. 203-210. Springer, Heidelberg (2010)
19. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229-256 (1992)
20. Tokic, M., Bou Ammar, H.: Teaching reinforcement learning using a physical robot. In: Proceedings of the Workshop on Teaching Machine Learning at the 29th International Conference on Machine Learning, Edinburgh, UK, pp. 1-4 (2012)
21. Kimura, H., Miyazaki, K., Kobayashi, S.: Reinforcement learning in POMDPs with function approximation. In: Proceedings of the 14th International Conference on Machine Learning, San Francisco, CA, USA, pp. 152-160. Morgan Kaufmann Publishers Inc. (1997)
22. Riedmiller, M.: Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317-328. Springer, Heidelberg (2005)
23. Riedmiller, M., Montemerlo, M., Dahlkamp, H.: Learning to drive a real car in 20 minutes. In: Proceedings of the FBIT 2007 Conference, Jeju, Korea, Special Track on Autonomous Robots (2007)
24. Faußer, S., Schwenker, F.: Learning a strategy with neural approximated temporal-difference methods in English draughts. In: Proceedings of the 20th International Conference on Pattern Recognition, pp. 2925-2928. IEEE Computer Society (2010)
25. Faußer, S., Schwenker, F.: Neural approximation of Monte Carlo policy evaluation deployed in Connect Four. In: Prevost, L., Marinai, S., Schwenker, F. (eds.) ANNPR 2008. LNCS (LNAI), vol. 5064, pp. 90-100. Springer, Heidelberg (2008)
26. Doya, K.: What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural Netw. 12(7-8), 961-974 (1999)
27. Bostan, A.C., Dum, R.P., Strick, P.L.: The basal ganglia communicate with the cerebellum. Proc. Nat. Acad. Sci. 107(18), 8452-8456 (2010)