



Volume 20, Issue 1, 2008, Pages 227-251

The actor-critic learning is behind the matching law: Matching versus optimal behaviors

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; ANIMAL; ARTICLE; ARTIFICIAL INTELLIGENCE; BEHAVIOR; BIOLOGICAL MODEL; COMPUTER SIMULATION; DECISION MAKING; HUMAN; LEARNING; PHYSIOLOGY; REINFORCEMENT; REWARD;

EID: 37749023538     PISSN: 08997667     EISSN: 1530888X     Source Type: Journal    
DOI: 10.1162/neco.2008.20.1.227     Document Type: Article
Times cited: 45

References (44)
  • 1
    • 1842612383
    • Prefrontal cortex and decision making in a mixed-strategy game
    • Barraclough, D., Conroy, M., & Lee, D. (2004). Prefrontal cortex and decision making in a mixed-strategy game. Nature Neuroscience, 7(4), 404-410.
    • (2004) Nature Neuroscience , vol.7 , Issue.4 , pp. 404-410
    • Barraclough, D.1    Conroy, M.2    Lee, D.3
  • 2
    • 84980245918
    • Optimization and the matching law as accounts of instrumental behavior
    • Baum, W. M. (1981). Optimization and the matching law as accounts of instrumental behavior. Journal of the Experimental Analysis of Behavior, 36, 387-402.
    • (1981) Journal of the Experimental Analysis of Behavior , vol.36 , pp. 387-402
    • Baum, W.M.1
  • 4
    • 0034988599
    • Functional imaging of neural responses to expectancy and experience of monetary gains and losses
    • Breiter, H. C., Aharon, I., Kahneman, D., Dale, A., & Shizgal, P. (2001). Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron, 30, 619-639.
    • (2001) Neuron , vol.30 , pp. 619-639
    • Breiter, H.C.1    Aharon, I.2    Kahneman, D.3    Dale, A.4    Shizgal, P.5
  • 6
    • 0035384099
    • Operant behavior suggests attentional gating of dopamine system inputs
    • Daw, N. D., & Touretzky, D. S. (2001). Operant behavior suggests attentional gating of dopamine system inputs. Neurocomputing, 38-40, 1161-1167.
    • (2001) Neurocomputing , vol.38-40 , pp. 1161-1167
    • Daw, N.D.1    Touretzky, D.S.2
  • 7
    • 0036835734
    • Long-term reward prediction in TD models of the dopamine system
    • Daw, N. D., & Touretzky, D. S. (2002). Long-term reward prediction in TD models of the dopamine system. Neural Computation, 14, 2567-2583.
    • (2002) Neural Computation , vol.14 , pp. 2567-2583
    • Daw, N.D.1    Touretzky, D.S.2
  • 9
    • 0037057808
    • Reward, motivation, and reinforcement learning
    • Dayan, P., & Balleine, B. W. (2002). Reward, motivation, and reinforcement learning. Neuron, 36, 285-298.
    • (2002) Neuron , vol.36 , pp. 285-298
    • Dayan, P.1    Balleine, B.W.2
  • 11
    • 0034524427
    • Complementary roles of basal ganglia and cerebellum in learning and motor control
    • Doya, K. (2000). Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology, 10, 732-739.
    • (2000) Current Opinion in Neurobiology , vol.10 , pp. 732-739
    • Doya, K.1
  • 12
    • 0035490184
    • The rat approximates an ideal detector of changes in rates of reward: Implications for the law of effect
    • Gallistel, C., Mark, T., King, A., & Latham, P. (2001). The rat approximates an ideal detector of changes in rates of reward: Implications for the law of effect. J. Exp. Psychol. Anim. Behav. Processes, 27, 354-372.
    • (2001) J. Exp. Psychol. Anim. Behav. Processes , vol.27 , pp. 354-372
    • Gallistel, C.1    Mark, T.2    King, A.3    Latham, P.4
  • 13
    • 1242319297
    • A neural correlate of reward-based behavioral learning in caudate nucleus: A functional magnetic resonance imaging study of a stochastic decision task
    • Haruno, M., Kuroda, T., Doya, K., Toyama, K., Kimura, M., Samejima, K., Imamizu, H., & Kawato, M. (2004). A neural correlate of reward-based behavioral learning in caudate nucleus: A functional magnetic resonance imaging study of a stochastic decision task. J. Neurosci., 24(7), 1660-1665.
    • (2004) J. Neurosci , vol.24 , Issue.7 , pp. 1660-1665
    • Haruno, M.1    Kuroda, T.2    Doya, K.3    Toyama, K.4    Kimura, M.5    Samejima, K.6    Imamizu, H.7    Kawato, M.8
  • 15
    • 0018425070
    • Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio?
    • Herrnstein, R. J., & Heyman, G. M. (1979). Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio? Journal of the Experimental Analysis of Behavior, 31, 209-223.
    • (1979) Journal of the Experimental Analysis of Behavior , vol.31 , pp. 209-223
    • Herrnstein, R.J.1    Heyman, G.M.2
  • 17
    • 0018340904
    • A Markov model description of changeover probabilities on concurrent variable-interval schedules
    • Heyman, G. M. (1979). A Markov model description of changeover probabilities on concurrent variable-interval schedules. Journal of the Experimental Analysis of Behavior, 31, 41-51.
    • (1979) Journal of the Experimental Analysis of Behavior , vol.31 , pp. 41-51
    • Heyman, G.M.1
  • 18
    • 84993911704
    • Reinforcer magnitude (sucrose concentration) and the matching law theory of response strength
    • Heyman, G., & Monaghan, M. (1994). Reinforcer magnitude (sucrose concentration) and the matching law theory of response strength. Journal of the Experimental Analysis of Behavior, 61, 505-516.
    • (1994) Journal of the Experimental Analysis of Behavior , vol.61 , pp. 505-516
    • Heyman, G.1    Monaghan, M.2
  • 21
    • 0029686765
    • Humans' choices in situations of time-based diminishing returns: Effects of fixed-interval duration and progressive-interval step size
    • Jacobs, E. A., & Hackenberg, T. D. (1996). Humans' choices in situations of time-based diminishing returns: Effects of fixed-interval duration and progressive-interval step size. Journal of the Experimental Analysis of Behavior, 65, 5-19.
    • (1996) Journal of the Experimental Analysis of Behavior , vol.65 , pp. 5-19
    • Jacobs, E.A.1    Hackenberg, T.D.2
  • 22
    • 0006731671
    • Anticipation of increasing monetary reward selectively recruits nucleus accumbens
    • Knutson, B., Adams, C. M., Fong, G. W., & Hommer, D. J. (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J. Neuroscience, 21, 1-5.
    • (2001) J. Neuroscience , vol.21 , pp. 1-5
    • Knutson, B.1    Adams, C.M.2    Fong, G.W.3    Hommer, D.J.4
  • 23
    • 0019789844
    • Optimization theory fails to predict performance of pigeons in a two-response situation
    • Mazur, J. (1981). Optimization theory fails to predict performance of pigeons in a two-response situation. Science, 214(4522), 823-825.
    • (1981) Science , vol.214 , Issue.4522 , pp. 823-825
    • Mazur, J.1
  • 24
    • 37749042362
    • Mazur, J. E. (2005). Learning and behavior (6th ed.). Upper Saddle River, NJ: Prentice Hall.
  • 25
    • 0037650217
    • Temporal prediction errors in a passive learning task activate human striatum
    • McClure, S., Berns, G. S., & Montague, P. (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron, 38(2), 339-346.
    • (2003) Neuron , vol.38 , Issue.2 , pp. 339-346
    • McClure, S.1    Berns, G.S.2    Montague, P.3
  • 26
    • 0037057753
    • Neural economics and the biological substrates of valuation
    • Montague, P., & Berns, G. (2002). Neural economics and the biological substrates of valuation. Neuron, 36(2), 265-284.
    • (2002) Neuron , vol.36 , Issue.2 , pp. 265-284
    • Montague, P.1    Berns, G.2
  • 27
    • 0029981543
    • A framework for mesencephalic dopamine systems based on predictive Hebbian learning
    • Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neuroscience, 16, 1936-1947.
    • (1996) J. Neuroscience , vol.16 , pp. 1936-1947
    • Montague, P.R.1    Dayan, P.2    Sejnowski, T.J.3
  • 28
    • 3242673464
    • Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons
    • Morris, G., Arkadir, D., Nevet, A., Vaadia, E., & Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron, 43, 133-143.
    • (2004) Neuron , vol.43 , pp. 133-143
    • Morris, G.1    Arkadir, D.2    Nevet, A.3    Vaadia, E.4    Bergman, H.5
  • 29
    • 0033566079
    • Neural correlates of decision variables in parietal cortex
    • Platt, M., & Glimcher, P. (1999). Neural correlates of decision variables in parietal cortex. Nature, 400(6741), 233-238.
    • (1999) Nature , vol.400 , Issue.6741 , pp. 233-238
    • Platt, M.1    Glimcher, P.2
  • 30
    • 0000783533
    • Economic demand theory and psychological studies of choice
    • G. Bower (Ed.), New York: Academic Press
    • Rachlin, H., Green, L., Kagel, J., & Battalio, R. (1976). Economic demand theory and psychological studies of choice. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 10, pp. 129-154). New York: Academic Press.
    • (1976) The psychology of learning and motivation , vol.10 , pp. 129-154
    • Rachlin, H.1    Green, L.2    Kagel, J.3    Battalio, R.4
  • 32
    • 28144449057
    • Representation of action-specific reward values in the striatum
    • Samejima, K., Ueda, Y., Doya, K., & Kimura, M. (2005). Representation of action-specific reward values in the striatum. Science, 310, 1337-1340.
    • (2005) Science , vol.310 , pp. 1337-1340
    • Samejima, K.1    Ueda, Y.2    Doya, K.3    Kimura, M.4
  • 34
    • 1842684992
    • Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology
    • Schultz, W. (2004). Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology. Current Opinion in Neurobiology, 14, 139-147.
    • (2004) Current Opinion in Neurobiology , vol.14 , pp. 139-147
    • Schultz, W.1
  • 35
    • 0030896968
    • A neural substrate of prediction and reward
    • Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.
    • (1997) Science , vol.275 , pp. 1593-1599
    • Schultz, W.1    Dayan, P.2    Montague, P.R.3
  • 36
    • 0347362917
    • Learning in spiking neural networks by reinforcement of stochastic synaptic transmission
    • Seung, H. (2003). Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron, 40(6), 1063-1073.
    • (2003) Neuron , vol.40 , Issue.6 , pp. 1063-1073
    • Seung, H.1
  • 38
    • 0021093494
    • Optimization: A result or a mechanism?
    • Staddon, J., & Hinson, J. (1983). Optimization: A result or a mechanism? Science, 221, 976-977.
    • (1983) Science , vol.221 , pp. 976-977
    • Staddon, J.1    Hinson, J.2
  • 39
    • 0017340908
    • Concurrent schedules: A quantitative relation between changeover behavior and its consequences
    • Stubbs, D. A., Pliskoff, S. S., & Reid, H. M. (1977). Concurrent schedules: A quantitative relation between changeover behavior and its consequences. Journal of the Experimental Analysis of Behavior, 27, 85-96.
    • (1977) Journal of the Experimental Analysis of Behavior , vol.27 , pp. 85-96
    • Stubbs, D.A.1    Pliskoff, S.S.2    Reid, H.M.3
  • 40
    • 2942726234
    • Matching behavior and the representation of value in the parietal cortex
    • Sugrue, L. P., Corrado, G. S., & Newsome, W. T. (2004). Matching behavior and the representation of value in the parietal cortex. Science, 304, 1782-1787.
    • (2004) Science , vol.304 , pp. 1782-1787
    • Sugrue, L.P.1    Corrado, G.S.2    Newsome, W.T.3
  • 42
    • 3343026029
    • Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops
    • Tanaka, S. C., Doya, K., Okada, G., Ueda, K., Okamoto, Y., & Yamawaki, S. (2004). Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nature Neuroscience, 7, 887-893.
    • (2004) Nature Neuroscience , vol.7 , pp. 887-893
    • Tanaka, S.C.1    Doya, K.2    Okada, G.3    Ueda, K.4    Okamoto, Y.5    Yamawaki, S.6
  • 43
    • 84985071750
    • Maximizing versus matching on concurrent variable-interval schedules
    • Vyse, S. A., & Belke, T. W. (1992). Maximizing versus matching on concurrent variable-interval schedules. Journal of the Experimental Analysis of Behavior, 58, 325-334.
    • (1992) Journal of the Experimental Analysis of Behavior , vol.58 , pp. 325-334
    • Vyse, S.A.1    Belke, T.W.2
  • 44
    • 0037028039
    • Probabilistic decision making by slow reverberation in cortical circuits
    • Wang, X. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36(5), 955-968.
    • (2002) Neuron , vol.36 , Issue.5 , pp. 955-968
    • Wang, X.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.