3. Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397-422 (2002)
4. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282-293. Springer, Heidelberg (2006)
5. Schweighofer, N., Doya, K.: Meta-learning in reinforcement learning. Neural Netw. 16(1), 5-9 (2003)
6. Kobayashi, K., Mizoue, H., Kuremoto, T., Obayashi, M.: A meta-learning method based on temporal difference error. In: Leung, C.S., Lee, M., Chan, J.H. (eds.) ICONIP 2009, Part I. LNCS, vol. 5863, pp. 530-537. Springer, Heidelberg (2009)
7. Tokic, M., Palm, G.: Value-difference based exploration: Adaptive control between epsilon-greedy and softmax. In: Bach, J., Edelkamp, S. (eds.) KI 2011. LNCS, vol. 7006, pp. 335-346. Springer, Heidelberg (2011)
8. Tokic, M., Ertle, P., Palm, G., Söffker, D., Voos, H.: Robust exploration/exploitation trade-offs in safety-critical applications. In: Proceedings of the 8th International Symposium on Fault Detection, Supervision and Safety of Technical Processes, Mexico City, Mexico, pp. 660-665. IFAC (2012)
9. Tokic, M., Palm, G.: Adaptive exploration using stochastic neurons. In: Villa, A.E., Duch, W., Érdi, P., Palm, G. (eds.) ICANN 2012, Part II. LNCS, vol. 7553, pp. 42-49. Springer, Heidelberg (2012)
10. Tokic, M., Palm, G.: Gradient algorithms for exploration/exploitation trade-offs: Global and local variants. In: Mana, N., Schwenker, F., Trentin, E. (eds.) ANNPR 2012. LNCS, vol. 7477, pp. 60-71. Springer, Heidelberg (2012)
11. Singh, S., Sutton, R.S.: Reinforcement learning with replacing eligibility traces. Mach. Learn. 22, 123-158 (1996)
12. Niv, Y., Daw, N.D., Dayan, P.: Choice values. Nat. Neurosci. 9(8), 987-988 (2006)
13. Niv, Y.: Reinforcement learning in the brain. J. Math. Psychol. 53(3), 139-154 (2009)
14. Watkins, C.: Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England (1989)
15. George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Mach. Learn. 65(1), 167-198 (2006)
16. Vermorel, J., Mohri, M.: Multi-armed bandit algorithms and empirical evaluation. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 437-448. Springer, Heidelberg (2005)
18. Tokic, M.: Adaptive ε-greedy exploration in reinforcement learning based on value differences. In: Dillmann, R., Beyerer, J., Hanebeck, U.D., Schultz, T. (eds.) KI 2010. LNCS, vol. 6359, pp. 203-210. Springer, Heidelberg (2010)
19. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229-256 (1992)
20. Tokic, M., Bou Ammar, H.: Teaching reinforcement learning using a physical robot. In: Proceedings of the Workshop on Teaching Machine Learning at the 29th International Conference on Machine Learning, Edinburgh, UK, pp. 1-4 (2012)
21. Kimura, H., Miyazaki, K., Kobayashi, S.: Reinforcement learning in POMDPs with function approximation. In: Proceedings of the 14th International Conference on Machine Learning, San Francisco, CA, USA, pp. 152-160. Morgan Kaufmann Publishers Inc. (1997)
22. Riedmiller, M.: Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317-328. Springer, Heidelberg (2005)
23. Riedmiller, M., Montemerlo, M., Dahlkamp, H.: Learning to drive a real car in 20 minutes. In: Proceedings of the FBIT 2007 Conference, Jeju, Korea, Special Track on Autonomous Robots (2007)
24. Faußer, S., Schwenker, F.: Learning a strategy with neural approximated temporal-difference methods in English draughts. In: Proceedings of the 20th International Conference on Pattern Recognition, pp. 2925-2928. IEEE Computer Society (2010)
25. Faußer, S., Schwenker, F.: Neural approximation of Monte Carlo policy evaluation deployed in Connect Four. In: Prevost, L., Marinai, S., Schwenker, F. (eds.) ANNPR 2008. LNCS (LNAI), vol. 5064, pp. 90-100. Springer, Heidelberg (2008)
26. Doya, K.: What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural Netw. 12(7-8), 961-974 (1999)
27. Bostan, A.C., Dum, R.P., Strick, P.L.: The basal ganglia communicate with the cerebellum. Proc. Nat. Acad. Sci. 107(18), 8452-8456 (2010)