Volume 15, Issue 4-6, 2002, Pages 665-687

Control of exploitation-exploration meta-parameter in reinforcement learning

Author keywords

Attention; Exploitation exploration problem; Neuromodulator; Partially observable Markov decision process; Reinforcement learning

Indexed keywords

BRAIN; NEUROLOGY; SENSORY PERCEPTION

EID: 0036592028     PISSN: 08936080     EISSN: None     Source Type: Journal    
DOI: 10.1016/S0893-6080(02)00056-4     Document Type: Article
Times cited: 190

References (65)
  • 6
    • 2142715567
    • Brafman, R.I., & Tennenholtz, M. (2001). R-max: A general polynomial time algorithm for near-optimal reinforcement learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence (pp. 953-958).
  • 13
    • 2142828108
    • Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, San Francisco, CA: Morgan Kaufmann, pp. 150-159
    • (1999)
    • Dearden, R.; Friedman, N.; Andre, D.
  • 15
    • 0002337786
    • Metalearning, neuromodulation, and emotion
    • Hatano G., Okada N., Takabe H. (Eds.), Affective minds, Amsterdam: Elsevier
    • (2000), pp. 101-104
    • Doya, K.
  • 17
    • 2142660331
    • Optimal control systems, New York, NY: Academic Press
    • (1965)
    • Fel'dbaum, A.A.
  • 21
    • 0002370418
    • A tutorial on learning with Bayesian networks
    • Jordan M.I. (Ed.), Learning in graphical models, Cambridge, MA: MIT Press
    • (1999), pp. 301-354
    • Heckerman, D.
  • 26
    • 2142809796
    • Learning in embedded systems, Cambridge, MA: MIT Press
    • (1993)
    • Kaelbling, L.
  • 28
    • 2142657495
    • Proceedings of the 15th International Conference on Machine Learning, San Mateo, CA: Morgan Kaufmann, pp. 260-268
    • (1998)
    • Kearns, M.; Singh, S.
  • 30
    • 0033213255
    • Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque
    • (1999) Neuron, vol. 24, pp. 415-425
    • Leon, M.I.; Shadlen, M.N.
  • 48
    • 0032515019
    • Role for cingulate motor area cells in voluntary movement selection based on reward
    • (1998) Science, vol. 282, pp. 1335-1338
    • Shima, K.; Tanji, J.
  • 53
    • 2142712659
    • Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Machine Learning: Proceedings of the Seventh International Conference (pp. 216-224).
  • 56
    • 0022495758
    • 6-Hydroxydopamine lesions of the nucleus accumbens, but not of the caudate nucleus, attenuate enhanced responding with reward-related stimuli produced by intra-accumbens d-amphetamine
    • (1986) Psychopharmacology, vol. 90, pp. 390-397
    • Taylor, J.R.; Robbins, T.W.
  • 57
    • 2142696808
    • Handbook of intelligent control: Neural, fuzzy and adaptive approaches, Florence, KY: Van Nostrand Reinhold
    • (1992)
    • Thrun, S.B.
  • 62
    • 0029782802
    • Reward expectancy in primate prefrontal neurons
    • (1996) Nature, vol. 382, pp. 629-632
    • Watanabe, M.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.