



Volume 7006 LNAI, 2011, Pages 335-346

Value-difference based exploration: Adaptive control between epsilon-greedy and softmax

Author keywords

[No Author keywords available]

Indexed keywords

ACTION SELECTION; ADAPTIVE CONTROL; Q-LEARNING; TEMPORAL DIFFERENCE LEARNING

EID: 80054004135     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-24455-1_33     Document Type: Conference Paper
Times cited: 204

References (14)
  • 2
    • 0004049893
    • PhD thesis, University of Cambridge, Cambridge, England
    • Watkins, C.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, Cambridge, England (1989)
    • (1989) Learning from Delayed Rewards
    • Watkins, C.1
  • 3
    • 0003411271
    • Technical Report CMU-CS-92-102, Carnegie Mellon University, Pittsburgh, PA, USA
    • Thrun, S.B.: Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University, Pittsburgh, PA, USA (1992)
    • (1992) Efficient Exploration in Reinforcement Learning
    • Thrun, S.B.1
  • 5
    • 0041966002
    • Using confidence bounds for exploitation-exploration trade-offs
    • Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3, 397-422 (2002)
    • (2002) Journal of Machine Learning Research , vol.3 , pp. 397-422
    • Auer, P.1
  • 6
    • 33646406807
    • Multi-armed bandit algorithms and empirical evaluation
    • Gama, J., Camacho, R., Brazdil, P., Jorge, A., Torgo, L. (eds.) ECML 2005. Springer, Heidelberg
    • Vermorel, J., Mohri, M.: Multi-armed bandit algorithms and empirical evaluation. In: Gama, J., Camacho, R., Brazdil, P., Jorge, A., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 437-448. Springer, Heidelberg (2005)
    • (2005) LNCS (LNAI) , vol.3720 , pp. 437-448
    • Vermorel, J.1    Mohri, M.2
  • 7
    • 78349266245
    • Interview with Richard S. Sutton
    • Heidrich-Meisner, V.: Interview with Richard S. Sutton. In: Künstliche Intelligenz, vol. 3, pp. 41-43 (2009)
    • (2009) Künstliche Intelligenz , vol.3 , pp. 41-43
    • Heidrich-Meisner, V.1
  • 8
    • 78349245906
    • Adaptive ε-greedy exploration in reinforcement learning based on value differences
    • Dillmann, R., Beyerer, J., Hanebeck, U.D., Schultz, T. (eds.) KI 2010. Springer, Heidelberg
    • Tokic, M.: Adaptive ε-greedy exploration in reinforcement learning based on value differences. In: Dillmann, R., Beyerer, J., Hanebeck, U.D., Schultz, T. (eds.) KI 2010. LNCS, vol. 6359, pp. 203-210. Springer, Heidelberg (2010)
    • (2010) LNCS , vol.6359 , pp. 203-210
    • Tokic, M.1
  • 12
    • 33748998787
    • Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
    • George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning 65(1), 167-198 (2006)
    • (2006) Machine Learning , vol.65 , Issue.1 , pp. 167-198
    • George, A.P.1    Powell, W.B.2
  • 13
    • 34249833101
    • Technical note: Q-learning
    • Watkins, C., Dayan, P.: Technical note: Q-learning. Machine Learning 8(3), 279-292 (1992)
    • (1992) Machine Learning , vol.8 , Issue.3 , pp. 279-292
    • Watkins, C.1    Dayan, P.2
  • 14
    • 33745223257
    • Cortical substrates for exploratory decisions in humans
    • Daw, N.D., O'Doherty, J.P., Dayan, P., Seymour, B., Dolan, R.J.: Cortical substrates for exploratory decisions in humans. Nature 441, 876-879 (2006)
    • (2006) Nature , vol.441 , pp. 876-879
    • Daw, N.D.1    O'Doherty, J.P.2    Dayan, P.3    Seymour, B.4    Dolan, R.J.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.