Volume 22, Issue 6, 2010, Pages 1511-1527

Hyperbolically discounted temporal difference learning

Author keywords

[No Author keywords available]

Indexed keywords

ACTION POTENTIAL; ALGORITHM; ANIMAL; ARTICLE; ARTIFICIAL INTELLIGENCE; ARTIFICIAL NEURAL NETWORK; BRAIN; COMPUTER SIMULATION; DECISION MAKING; MATHEMATICAL PHENOMENA; NERVE CELL; NERVE CELL NETWORK; PHYSIOLOGY; TIME; TIME PERCEPTION;

EID: 77953497419     PISSN: 08997667     EISSN: 1530888X     Source Type: Journal    
DOI: 10.1162/neco.2010.08-09-1080     Document Type: Article
Times cited : (35)

References (18)
  • 1
    • 0033790898
    • Reward-predicting and reward-detecting neuronal activity in the primate supplementary eye field
    • Amador, N., Schlag-Rey, M., & Schlag, J. (2000). Reward-predicting and reward-detecting neuronal activity in the primate supplementary eye field. J. Neurophysiol., 84(4), 2166-2170.
    • (2000) J. Neurophysiol. , vol.84 , Issue.4 , pp. 2166-2170
    • Amador, N.1    Schlag-Rey, M.2    Schlag, J.3
  • 2
    • 0033012067
    • Preference for sequences of rewards: Further tests of a parallel discounting model
    • Brunner, D. (1999). Preference for sequences of rewards: Further tests of a parallel discounting model. Behavioural Processes, 45(1-3), 87-99.
    • (1999) Behavioural Processes , vol.45 , Issue.1-3 , pp. 87-99
    • Brunner, D.1
  • 3
    • 0033722074
    • Behavioral considerations suggest an average reward TD model of the dopamine system
    • Daw, N. D., & Touretzky, D. S. (2000). Behavioral considerations suggest an average reward TD model of the dopamine system. Neurocomputing: An International Journal, 32-33, 679-684.
    • (2000) Neurocomputing: An International Journal , vol.32-33 , pp. 679-684
    • Daw, N.D.1    Touretzky, D.S.2
  • 4
    • 0036835734
    • Long-term reward prediction in TD models of the dopamine system
    • Daw, N. D., & Touretzky, D. S. (2002). Long-term reward prediction in TD models of the dopamine system. Neural Comput., 14(11), 2567-2587.
    • (2002) Neural Comput , vol.14 , Issue.11 , pp. 2567-2587
    • Daw, N.D.1    Touretzky, D.S.2
  • 5
    • 4243535556
    • Exponential versus hyperbolic discounting of delayed outcomes: Risk and waiting time
    • Green, L., & Myerson, J. (1996). Exponential versus hyperbolic discounting of delayed outcomes: Risk and waiting time. Amer. Zool., 36(4), 496-505.
    • (1996) Amer. Zool. , vol.36 , Issue.4 , pp. 496-505
    • Green, L.1    Myerson, J.2
  • 6
    • 33746106102
    • Risky theories-The effects of variance on foraging decisions
    • Kacelnik, A., & Bateson, M. (1996). Risky theories-The effects of variance on foraging decisions. Amer. Zool., 36(4), 402-434.
    • (1996) Amer. Zool. , vol.36 , Issue.4 , pp. 402-434
    • Kacelnik, A.1    Bateson, M.2
  • 7
    • 50349093022
    • Influence of reward delays on responses of dopamine neurons
    • Kobayashi, S., & Schultz, W. (2008). Influence of reward delays on responses of dopamine neurons. J. Neurosci., 28(31), 7837-7846.
    • (2008) J. Neurosci. , vol.28 , Issue.31 , pp. 7837-7846
    • Kobayashi, S.1    Schultz, W.2
  • 8
    • 0003114728
    • An adjusting procedure for studying delayed reinforcement
    • M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Mahwah, NJ: Erlbaum
    • Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), The effect of delay and intervening events on reinforcement value (Vol. 5, pp. 55-73). Mahwah, NJ: Erlbaum.
    • (1987) The effect of delay and intervening events on reinforcement value , vol.5 , pp. 55-73
    • Mazur, J.E.1
  • 9
    • 84993845533
    • Discounting of delayed rewards: Models of individual choice
    • Myerson, J., & Green, L. (1995). Discounting of delayed rewards: Models of individual choice. J. Exp. Anal. Behav., 64(3), 263-276.
    • (1995) J. Exp. Anal. Behav. , vol.64 , Issue.3 , pp. 263-276
    • Myerson, J.1    Green, L.2
  • 10
    • 0027135269
    • Reward-related activity in the monkey striatum and substantia nigra
    • Schultz, W., Apicella, P., Ljungberg, T., Romo, R., & Scarnati, E. (1993). Reward-related activity in the monkey striatum and substantia nigra. Prog. Brain Res., 99, 227-235.
    • (1993) Prog Brain Res , vol.99 , pp. 227-235
    • Schultz, W.1    Apicella, P.2    Ljungberg, T.3    Romo, R.4    Scarnati, E.5
  • 11
    • 0026442752
    • Neuronal activity in monkey ventral striatum related to the expectation of reward
    • Schultz, W., Apicella, P., Scarnati, E., & Ljungberg, T. (1992). Neuronal activity in monkey ventral striatum related to the expectation of reward. J. Neurosci., 12(12), 4595-4610.
    • (1992) J. Neurosci. , vol.12 , Issue.12 , pp. 4595-4610
    • Schultz, W.1    Apicella, P.2    Scarnati, E.3    Ljungberg, T.4
  • 12
    • 0034061495
    • Reward processing in primate orbitofrontal cortex and basal ganglia
    • Schultz, W., Tremblay, L., & Hollerman, J. R. (2000). Reward processing in primate orbitofrontal cortex and basal ganglia. Cereb. Cortex, 10(3), 272-284.
    • (2000) Cereb. Cortex , vol.10 , Issue.3 , pp. 272-284
    • Schultz, W.1    Tremblay, L.2    Hollerman, J.R.3
  • 15
    • 0035315989
    • Temporal difference model reproduces anticipatory neural activity
    • Suri, R. E., & Schultz, W. (2001). Temporal difference model reproduces anticipatory neural activity. Neural Comput., 13(4), 841-862.
    • (2001) Neural Comput , vol.13 , Issue.4 , pp. 841-862
    • Suri, R.E.1    Schultz, W.2
  • 16
    • 0003066891
    • Time-derivative models of Pavlovian reinforcement
    • M. Gabriel & J. Moore (Eds.), Cambridge, MA: MIT Press
    • Sutton, R. S., & Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. Gabriel & J. Moore (Eds.), Learning and computational neuroscience (pp. 497-537). Cambridge, MA: MIT Press.
    • (1990) Learning and computational neuroscience , pp. 497-537
    • Sutton, R.S.1    Barto, A.G.2
  • 17
    • 0033221519
    • Average cost temporal-difference learning
    • Tsitsiklis, J. N., & Van Roy, B. (1999). Average cost temporal-difference learning. Automatica, 35(11), 1799-1808.
    • (1999) Automatica , vol.35 , Issue.11 , pp. 1799-1808
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 18
    • 0036832957
    • On average versus discounted reward temporal-difference learning
    • Tsitsiklis, J. N., & Van Roy, B. (2002). On average versus discounted reward temporal-difference learning. Machine Learning, 49(2-3), 179-191.
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 179-191
    • Tsitsiklis, J.N.1    Van Roy, B.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.