Volume 8183 LNAI, 2013, Pages 68-79

Meta-learning of exploration and exploitation parameters with replacing eligibility traces

Author keywords

[No Author keywords available]

Indexed keywords

Intelligent agents; Reinforcement learning

EID: 84890878755; PISSN: 0302-9743; EISSN: 1611-3349; Source Type: Book Series
DOI: 10.1007/978-3-642-40705-5_7; Document Type: Conference Paper
Times cited: 5

References (27)
  • [3] Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397-422 (2002)
  • [4] Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282-293. Springer, Heidelberg (2006)
  • [5] Schweighofer, N., Doya, K.: Meta-learning in reinforcement learning. Neural Netw. 16(1), 5-9 (2003)
  • [6] Kobayashi, K., Mizoue, H., Kuremoto, T., Obayashi, M.: A meta-learning method based on temporal difference error. In: Leung, C.S., Lee, M., Chan, J.H. (eds.) ICONIP 2009, Part I. LNCS, vol. 5863, pp. 530-537. Springer, Heidelberg (2009)
  • [7] Tokic, M., Palm, G.: Value-difference based exploration: Adaptive control between epsilon-greedy and softmax. In: Bach, J., Edelkamp, S. (eds.) KI 2011. LNCS, vol. 7006, pp. 335-346. Springer, Heidelberg (2011)
  • [9] Tokic, M., Palm, G.: Adaptive exploration using stochastic neurons. In: Villa, A.E., Duch, W., Érdi, P., Palm, G. (eds.) ICANN 2012, Part II. LNCS, vol. 7553, pp. 42-49. Springer, Heidelberg (2012)
  • [10] Tokic, M., Palm, G.: Gradient algorithms for exploration/exploitation trade-offs: Global and local variants. In: Mana, N., Schwenker, F., Trentin, E. (eds.) ANNPR 2012. LNCS, vol. 7477, pp. 60-71. Springer, Heidelberg (2012)
  • [11] Singh, S., Sutton, R.S.: Reinforcement learning with replacing eligibility traces. Mach. Learn. 22, 123-158 (1996)
  • [13] Niv, Y.: Reinforcement learning in the brain. J. Math. Psychol. 53(3), 139-154 (2009)
  • [14]
  • [15] George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Mach. Learn. 65(1), 167-198 (2006)
  • [16] Vermorel, J., Mohri, M.: Multi-armed bandit algorithms and empirical evaluation. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 437-448. Springer, Heidelberg (2005)
  • [18] Tokic, M.: Adaptive ε-greedy exploration in reinforcement learning based on value differences. In: Dillmann, R., Beyerer, J., Hanebeck, U.D., Schultz, T. (eds.) KI 2010. LNCS, vol. 6359, pp. 203-210. Springer, Heidelberg (2010)
  • [19] Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229-256 (1992)
  • [22] Riedmiller, M.: Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317-328. Springer, Heidelberg (2005)
  • [24] Faußer, S., Schwenker, F.: Learning a strategy with neural approximated temporal-difference methods in English draughts. In: Proceedings of the 20th International Conference on Pattern Recognition, pp. 2925-2928. IEEE Computer Society (2010)
  • [25] Faußer, S., Schwenker, F.: Neural approximation of Monte Carlo policy evaluation deployed in Connect Four. In: Prevost, L., Marinai, S., Schwenker, F. (eds.) ANNPR 2008. LNCS (LNAI), vol. 5064, pp. 90-100. Springer, Heidelberg (2008)
  • [26] Doya, K.: What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural Netw. 12(7-8), 961-974 (1999)
  • [27] Bostan, A.C., Dum, R.P., Strick, P.L.: The basal ganglia communicate with the cerebellum. Proc. Nat. Acad. Sci. 107(18), 8452-8456 (2010)


* This information was analyzed and extracted by KISTI from Elsevier's Scopus database.