Volume 6359 LNAI, 2010, Pages 203-210

Adaptive ε-greedy exploration in reinforcement learning based on value differences

Author keywords

[No Author keywords available]

Indexed keywords

AD HOC APPROACH; COMMONLY USED; EXPLORATION/EXPLOITATION DILEMMAS; GREEDY EXPLORATION; MULTI ARMED BANDIT; TEMPORAL DIFFERENCE ERRORS; VALUE FUNCTIONS

EID: 78349245906     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-16111-7_23     Document Type: Conference Paper
Times cited: 269

References (14)
  • 2
    • 0004049893
    • PhD thesis, University of Cambridge, Cambridge, England
    • Watkins, C.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, Cambridge, England (1989)
    • (1989) Learning from Delayed Rewards
    • Watkins, C.1
  • 4
    • 0041965975
    • R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
    • Brafman, R.I., Tennenholtz, M.: R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3, 213-231 (2002)
    • (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
    • Brafman, R.I.1    Tennenholtz, M.2
  • 5
    • 0036592028
    • Control of exploitation-exploration metaparameter in reinforcement learning
    • Ishii, S., Yoshida, W., Yoshimoto, J.: Control of exploitation-exploration metaparameter in reinforcement learning. Neural Networks 15(4-6), 665-687 (2002)
    • (2002) Neural Networks , vol.15 , Issue.4-6 , pp. 665-687
    • Ishii, S.1    Yoshida, W.2    Yoshimoto, J.3
  • 7
    • 33646406807
    • Multi-armed bandit algorithms and empirical evaluation
    • Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) LNCS (LNAI) Springer, Heidelberg
    • Vermorel, J., Mohri, M.: Multi-armed bandit algorithms and empirical evaluation. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 437-448. Springer, Heidelberg (2005)
    • (2005) ECML 2005 , vol.3720 , pp. 437-448
    • Vermorel, J.1    Mohri, M.2
  • 8
    • 58349084664
    • Improving the exploration strategy in bandit algorithms
    • Maniezzo, V., Battiti, R., Watson, J.-P. (eds.) II. LNCS Springer, Heidelberg
    • Caelen, O., Bontempi, G.: Improving the exploration strategy in bandit algorithms. In: Maniezzo, V., Battiti, R., Watson, J.-P. (eds.) LION 2007 II. LNCS, vol. 5313, pp. 56-68. Springer, Heidelberg (2008)
    • (2008) LION 2007 , vol.5313 , pp. 56-68
    • Caelen, O.1    Bontempi, G.2
  • 11
    • 33748998787
    • Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
    • George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning 65(1), 167-198 (2006)
    • (2006) Machine Learning , vol.65 , Issue.1 , pp. 167-198
    • George, A.P.1    Powell, W.B.2
  • 12
  • 13
    • 4544345025
    • Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches
    • Chicago, IL, USA ACM, New York
    • Awerbuch, B., Kleinberg, R.D.: Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, pp. 45-53. ACM, New York (2004)
    • (2004) Proceedings of the 36th Annual ACM Symposium on Theory of Computing , pp. 45-53
    • Awerbuch, B.1    Kleinberg, R.D.2
  • 14
    • 4243096065
    • Exploitation vs. exploration: Choosing a supplier in an environment of incomplete information
    • Azoulay-Schwartz, R., Kraus, S., Wilkenfeld, J.: Exploitation vs. exploration: Choosing a supplier in an environment of incomplete information. Decision Support Systems 38(1), 1-18 (2004)
    • (2004) Decision Support Systems , vol.38 , Issue.1 , pp. 1-18
    • Azoulay-Schwartz, R.1    Kraus, S.2    Wilkenfeld, J.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.