SCOPUS Information Retrieval Platform

Neural Computation

Volume 20, Issue 1, 2008, Pages 227-251

The actor-critic learning is behind the matching law: Matching versus optimal behaviors

Authors (2): Sakai, Yutaka (a); Fukai, Tomoki (b)

a TAMAGAWA UNIVERSITY (Japan)

b RIKEN BRAIN SCIENCE INSTITUTE (Japan)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; ANIMAL; ARTICLE; ARTIFICIAL INTELLIGENCE; BEHAVIOR; BIOLOGICAL MODEL; COMPUTER SIMULATION; DECISION MAKING; HUMAN; LEARNING; PHYSIOLOGY; REINFORCEMENT; REWARD;

ALGORITHMS; ANIMALS; ARTIFICIAL INTELLIGENCE; BEHAVIOR; COMPUTER SIMULATION; DECISION MAKING; HUMANS; LEARNING; MODELS, NEUROLOGICAL; REINFORCEMENT (PSYCHOLOGY); REWARD;

EID: 37749023538 PISSN: 08997667 EISSN: 1530888X Source Type: Journal
DOI: 10.1162/neco.2008.20.1.227 Document Type: Article

Times cited: 45

References (44)

1
- 1842612383
- Prefrontal cortex and decision making in a mixed-strategy game
- Barraclough, D., Conroy, M., & Lee, D. (2004). Prefrontal cortex and decision making in a mixed-strategy game. Nature Neuroscience, 7(4), 404-410.
- (2004) Nature Neuroscience , vol.7 , Issue.4 , pp. 404-410
- Barraclough, D.¹ Conroy, M.² Lee, D.³

2
- 84980245918
- Optimization and the matching law as accounts of instrumental behavior
- Baum, W. M. (1981). Optimization and the matching law as accounts of instrumental behavior. Journal of the Experimental Analysis of Behavior, 36, 387-402.
- (1981) Journal of the Experimental Analysis of Behavior , vol.36 , pp. 387-402
- Baum, W.M.¹

3
- 84982335829
- Choice as time allocation
- Baum, W., & Rachlin, H. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861-874.
- (1969) Journal of the Experimental Analysis of Behavior , vol.12 , pp. 861-874
- Baum, W.¹ Rachlin, H.²

4
- 0034988599
- Functional imaging of neural responses to expectancy and experience of monetary gains and losses
- Breiter, H. C., Aharon, I., Kahneman, D., Dale, A., & Shizgal, P. (2001). Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron, 30, 619-639.
- (2001) Neuron , vol.30 , pp. 619-639
- Breiter, H.C.¹ Aharon, I.² Kahneman, D.³ Dale, A.⁴ Shizgal, P.⁵

5
- 0004129897
- Mahwah, NJ: Erlbaum
- Davison, M., & McCarthy, D. (1987). The matching law: A research review. Mahwah, NJ: Erlbaum.
- (1987) The matching law: A research review
- Davison, M.¹ McCarthy, D.²

6
- 0035384099
- Operant behavior suggests attentional gating of dopamine system inputs
- Daw, N. D., & Touretzky, D. S. (2001). Operant behavior suggests attentional gating of dopamine system inputs. Neurocomputing, 38-40, 1161-1167.
- (2001) Neurocomputing , vol.38-40 , pp. 1161-1167
- Daw, N.D.¹ Touretzky, D.S.²

7
- 0036835734
- Long-term reward prediction in TD models of the dopamine system
- Daw, N. D., & Touretzky, D. S. (2002). Long-term reward prediction in TD models of the dopamine system. Neural Computation, 14, 2567-2583.
- (2002) Neural Computation , vol.14 , pp. 2567-2583
- Daw, N.D.¹ Touretzky, D.S.²

8
- 0004291629
- Cambridge, MA: MIT Press
- Dayan, P., & Abbott, L. (2001). Theoretical Neuroscience. Cambridge, MA: MIT Press.
- (2001) Theoretical Neuroscience
- Dayan, P.¹ Abbott, L.²

9
- 0037057808
- Reward, motivation, and reinforcement learning
- Dayan, P., & Balleine, B. W. (2002). Reward, motivation, and reinforcement learning. Neuron, 36, 285-298.
- (2002) Neuron , vol.36 , pp. 285-298
- Dayan, P.¹ Balleine, B.W.²

10
- 0021975818
- Matching and maximizing with variable-time schedules
- DeCarlo, L. T. (1985). Matching and maximizing with variable-time schedules. Journal of the Experimental Analysis of Behavior, 43, 75-81.
- (1985) Journal of the Experimental Analysis of Behavior , vol.43 , pp. 75-81
- DeCarlo, L.T.¹

11
- 0034524427
- Complementary roles of basal ganglia and cerebellum in learning and motor control
- Doya, K. (2000). Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology, 10, 732-739.
- (2000) Current Opinion in Neurobiology , vol.10 , pp. 732-739
- Doya, K.¹

12
- 0035490184
- The rat approximates an ideal detector of changes in rates of reward: Implications for the law of effect
- Gallistel, C., Mark, T., King, A., & Latham, P. (2001). The rat approximates an ideal detector of changes in rates of reward: Implications for the law of effect. J. Exp. Psychol. Anim. Behav. Processes, 27, 354-372.
- (2001) J. Exp. Psychol. Anim. Behav. Processes , vol.27 , pp. 354-372
- Gallistel, C.¹ Mark, T.² King, A.³ Latham, P.⁴

13
- 1242319297
- A neural correlate of reward-based behavioral learning in caudate nucleus: A functional magnetic resonance imaging study of a stochastic decision task
- Haruno, M., Kuroda, T., Doya, K., Toyama, K., Kimura, M., Samejima, K., Imamizu, H., & Kawato, M. (2004). A neural correlate of reward-based behavioral learning in caudate nucleus: A functional magnetic resonance imaging study of a stochastic decision task. J. Neurosci., 24(7), 1660-1665.
- (2004) J. Neurosci , vol.24 , Issue.7 , pp. 1660-1665
- Haruno, M.¹ Kuroda, T.² Doya, K.³ Toyama, K.⁴ Kimura, M.⁵ Samejima, K.⁶ Imamizu, H.⁷ Kawato, M.⁸

14
- 0003818321
- Cambridge, MA: Harvard University Press
- Herrnstein, R. J. (1997). The matching law: Papers in psychology and economics. Cambridge, MA: Harvard University Press.
- (1997) The matching law: Papers in psychology and economics
- Herrnstein, R.J.¹

15
- 0018425070
- Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio?
- Herrnstein, R. J., & Heyman, G. M. (1979). Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio? Journal of the Experimental Analysis of Behavior, 31, 209-223.
- (1979) Journal of the Experimental Analysis of Behavior , vol.31 , pp. 209-223
- Herrnstein, R.J.¹ Heyman, G.M.²

16
- 0001168732
- Melioration and behavioral allocation
- In J. Staddon (Ed.), New York: Academic Press
- Herrnstein, R. J., & Vaughan, W. J. (1980). Melioration and behavioral allocation. In J. Staddon (Ed.), Limits to action: The allocation of individual behavior. New York: Academic Press.
- (1980) Limits to action: The allocation of individual behavior
- Herrnstein, R.J.¹ Vaughan, W.J.²

17
- 0018340904
- A Markov model description of changeover probabilities on concurrent variable-interval schedules
- Heyman, G. M. (1979). A Markov model description of changeover probabilities on concurrent variable-interval schedules. Journal of the Experimental Analysis of Behavior, 31, 41-51.
- (1979) Journal of the Experimental Analysis of Behavior , vol.31 , pp. 41-51
- Heyman, G.M.¹

18
- 84993911704
- Reinforcer magnitude (sucrose concentration) and the matching law theory of response strength
- Heyman, G., & Monaghan, M. (1994). Reinforcer magnitude (sucrose concentration) and the matching law theory of response strength. Journal of the Experimental Analysis of Behavior, 61, 505-516.
- (1994) Journal of the Experimental Analysis of Behavior , vol.61 , pp. 505-516
- Heyman, G.¹ Monaghan, M.²

19
- 0003894363
- Cambridge, MA: Bradford Books, MIT Press
- Houk, J. C., Davis, J. L., & Beiser, D. G. (1994). Models of information processing in the basal ganglia (computational neuroscience). Cambridge, MA: Bradford Books, MIT Press.
- (1994) Models of information processing in the basal ganglia (computational neuroscience)
- Houk, J.C.¹ Davis, J.L.² Beiser, D.G.³

20
- 84980198872
- How to maximize reward rate in two variable-interval paradigms
- Houston, A. I., & McNamara, J. (1981). How to maximize reward rate in two variable-interval paradigms. Journal of the Experimental Analysis of Behavior, 35, 367-396.
- (1981) Journal of the Experimental Analysis of Behavior , vol.35 , pp. 367-396
- Houston, A.I.¹ McNamara, J.²

21
- 0029686765
- Humans' choices in situations of time-based diminishing returns: Effects of fixed-interval duration and progressive-interval step size
- Jacobs, E. A., & Hackenberg, T. D. (1996). Humans' choices in situations of time-based diminishing returns: Effects of fixed-interval duration and progressive-interval step size. Journal of the Experimental Analysis of Behavior, 65, 5-19.
- (1996) Journal of the Experimental Analysis of Behavior , vol.65 , pp. 5-19
- Jacobs, E.A.¹ Hackenberg, T.D.²

22
- 0006731671
- Anticipation of increasing monetary reward selectively recruits nucleus accumbens
- Knutson, B., Adams, C. M., Fong, G. W., & Hommer, D. J. (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J. Neuroscience, 21, 1-5.
- (2001) J. Neuroscience , vol.21 , pp. 1-5
- Knutson, B.¹ Adams, C.M.² Fong, G.W.³ Hommer, D.J.⁴

23
- 0019789844
- Optimization theory fails to predict performance of pigeons in a two-response situation
- Mazur, J. (1981). Optimization theory fails to predict performance of pigeons in a two-response situation. Science, 214(4522), 823-825.
- (1981) Science , vol.214 , Issue.4522 , pp. 823-825
- Mazur, J.¹

24
- 37749042362
- Mazur, J. E. (2005). Learning and behavior (6th ed.). Upper Saddle River, NJ: Prentice Hall.

25
- 0037650217
- Temporal prediction errors in a passive learning task activate human striatum
- McClure, S., Berns, G. S., & Montague, P. (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron, 38(2), 339-346.
- (2003) Neuron , vol.38 , Issue.2 , pp. 339-346
- McClure, S.¹ Berns, G.S.² Montague, P.³

26
- 0037057753
- Neural economics and the biological substrates of valuation
- Montague, P., & Berns, G. (2002). Neural economics and the biological substrates of valuation. Neuron, 36(2), 265-284.
- (2002) Neuron , vol.36 , Issue.2 , pp. 265-284
- Montague, P.¹ Berns, G.²

27
- 0029981543
- A framework for mesencephalic dopamine systems based on predictive Hebbian learning
- Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neuroscience, 16, 1936-1947.
- (1996) J. Neuroscience , vol.16 , pp. 1936-1947
- Montague, P.R.¹ Dayan, P.² Sejnowski, T.J.³

28
- 3242673464
- Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons
- Morris, G., Arkadir, D., Nevet, A., Vaadia, E., & Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron, 43, 133-143.
- (2004) Neuron , vol.43 , pp. 133-143
- Morris, G.¹ Arkadir, D.² Nevet, A.³ Vaadia, E.⁴ Bergman, H.⁵

29
- 0033566079
- Neural correlates of decision variables in parietal cortex
- Platt, M., & Glimcher, P. (1999). Neural correlates of decision variables in parietal cortex. Nature, 400(6741), 233-238.
- (1999) Nature , vol.400 , Issue.6741 , pp. 233-238
- Platt, M.¹ Glimcher, P.²

30
- 0000783533
- Economic demand theory and psychological studies of choice
- In G. Bower (Ed.), New York: Academic Press
- Rachlin, H., Green, L., Kagel, J., & Battalio, R. (1976). Economic demand theory and psychological studies of choice. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 10, pp. 129-154). New York: Academic Press.
- (1976) The psychology of learning and motivation , vol.10 , pp. 129-154
- Rachlin, H.¹ Green, L.² Kagel, J.³ Battalio, R.⁴

31
- 84985154114
- Income maximizing in concurrent interval-ratio schedules
- Sakagami, T., Hursh, S. R., Christensen, J., & Silberberg, A. (1989). Income maximizing in concurrent interval-ratio schedules. Journal of the Experimental Analysis of Behavior, 52, 41-46.
- (1989) Journal of the Experimental Analysis of Behavior , vol.52 , pp. 41-46
- Sakagami, T.¹ Hursh, S.R.² Christensen, J.³ Silberberg, A.⁴

32
- 28144449057
- Representation of action-specific reward value in the striatum
- Samejima, K., Ueda, Y., Doya, K., & Kimura, M. (2005). Representation of action-specific reward value in the striatum. Science, 310, 1337-1340.
- (2005) Science , vol.310 , pp. 1337-1340
- Samejima, K.¹ Ueda, Y.² Doya, K.³ Kimura, M.⁴

33
- 0028432063
- Human choice in concurrent ratio-interval schedules of reinforcement
- Savastano, H. I., & Fantino, E. (1994). Human choice in concurrent ratio-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 61, 453-463.
- (1994) Journal of the Experimental Analysis of Behavior , vol.61 , pp. 453-463
- Savastano, H.I.¹ Fantino, E.²

34
- 1842684992
- Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology
- Schultz, W. (2004). Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology. Current Opinion in Neurobiology, 14, 139-147.
- (2004) Current Opinion in Neurobiology , vol.14 , pp. 139-147
- Schultz, W.¹

35
- 0030896968
- A neural substrate of prediction and reward
- Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.
- (1997) Science , vol.275 , pp. 1593-1599
- Schultz, W.¹ Dayan, P.² Montague, P.R.³

36
- 0347362917
- Learning in spiking neural networks by reinforcement of stochastic synaptic transmission
- Seung, H. (2003). Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron, 40(6), 1063-1073.
- (2003) Neuron , vol.40 , Issue.6 , pp. 1063-1073
- Seung, H.¹

37
- 0026253116
- Human choice on concurrent variable-interval, variable-ratio schedules
- Silberberg, A., Thomas, J., & Brendzen, N. (1991). Human choice on concurrent variable-interval, variable-ratio schedules. Journal of the Experimental Analysis of Behavior, 56, 575-584.
- (1991) Journal of the Experimental Analysis of Behavior , vol.56 , pp. 575-584
- Silberberg, A.¹ Thomas, J.² Brendzen, N.³

38
- 0021093494
- Optimization: A result or a mechanism?
- Staddon, J., & Hinson, J. (1983). Optimization: A result or a mechanism? Science, 221, 976-977.
- (1983) Science , vol.221 , pp. 976-977
- Staddon, J.¹ Hinson, J.²

39
- 0017340908
- Concurrent schedules: A quantitative relation between changeover behavior and its consequences
- Stubbs, D. A., Pliskoff, S. S., & Reid, H. M. (1977). Concurrent schedules: A quantitative relation between changeover behavior and its consequences. Journal of the Experimental Analysis of Behavior, 27, 85-96.
- (1977) Journal of the Experimental Analysis of Behavior , vol.27 , pp. 85-96
- Stubbs, D.A.¹ Pliskoff, S.S.² Reid, H.M.³

40
- 2942726234
- Matching behavior and the representation of value in the parietal cortex
- Sugrue, L. P., Corrado, G. S., & Newsome, W. T. (2004). Matching behavior and the representation of value in the parietal cortex. Science, 304, 1782-1787.
- (2004) Science , vol.304 , pp. 1782-1787
- Sugrue, L.P.¹ Corrado, G.S.² Newsome, W.T.³

41
- 0004007508
- Cambridge, MA: MIT Press
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning. Cambridge, MA: MIT Press.
- (1998) Reinforcement Learning
- Sutton, R.S.¹ Barto, A.G.²

42
- 3343026029
- Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops
- Tanaka, S. C., Doya, K., Okada, G., Ueda, K., Okamoto, Y., & Yamawaki, S. (2004). Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nature Neuroscience, 7, 887-893.
- (2004) Nature Neuroscience , vol.7 , pp. 887-893
- Tanaka, S.C.¹ Doya, K.² Okada, G.³ Ueda, K.⁴ Okamoto, Y.⁵ Yamawaki, S.⁶

43
- 84985071750
- Maximizing versus matching on concurrent variable-interval schedules
- Vyse, S. A., & Belke, T. W. (1992). Maximizing versus matching on concurrent variable-interval schedules. Journal of the Experimental Analysis of Behavior, 58, 325-334.
- (1992) Journal of the Experimental Analysis of Behavior , vol.58 , pp. 325-334
- Vyse, S.A.¹ Belke, T.W.²

44
- 0037028039
- Probabilistic decision making by slow reverberation in cortical circuits
- Wang, X. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36(5), 955-968.
- (2002) Neuron , vol.36 , Issue.5 , pp. 955-968
- Wang, X.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.