SCOPUS Information Retrieval Platform

Current Opinion in Neurobiology

Volume 17, Issue 2, 2007, Pages 205-212

Efficient reinforcement learning: computational theories, neuroscience and robotics

(2) Kawato, Mitsuo a Samejima, Kazuyuki b

a ADVANCED TELECOMMUNICATIONS RESEARCH INSTITUTE INTERNATIONAL (Japan)

b TAMAGAWA UNIVERSITY (Japan)

Author keywords

[No Author keywords available]

Indexed keywords

BASAL GANGLION; BRAIN CORTEX; BRAIN REGION; BRAIN STEM; CEREBELLUM; CONCEPTUAL FRAMEWORK; EXPERIMENTAL STUDY; HUMAN; LEARNING; LEARNING THEORY; MATHEMATICAL COMPUTING; NEUROSCIENCE; NONHUMAN; PRIORITY JOURNAL; PSYCHOLOGICAL THEORY; REINFORCEMENT; REVIEW; ROBOTICS;

ANIMALS; HUMANS; NEURAL NETWORKS (COMPUTER); NEUROSCIENCES; REINFORCEMENT (PSYCHOLOGY); ROBOTICS;

EID: 34147191094 PISSN: 09594388 EISSN: None Source Type: Journal
DOI: 10.1016/j.conb.2007.03.004 Document Type: Review

Times cited : (62)

References (71)

1
- 0020970738
- Neuron-like elements that can solve difficult learning control problems
- Barto A.G., Sutton R.S., and Anderson C.W. Neuron-like elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13 (1983) 835-846
- (1983) IEEE Trans Syst Man Cybern , vol.13 , pp. 835-846
- Barto, A.G.¹ Sutton, R.S.² Anderson, C.W.³

2
- 0004007508
- The MIT Press
- Sutton R.S., and Barto A.G. Reinforcement learning (1998), The MIT Press
- (1998) Reinforcement learning
- Sutton, R.S.¹ Barto, A.G.²

3
- 0030896968
- A neural substrate of prediction and reward
- Schultz W., Dayan P., and Montague P.R. A neural substrate of prediction and reward. Science 275 (1997) 1593-1599
- (1997) Science , vol.275 , pp. 1593-1599
- Schultz, W.¹ Dayan, P.² Montague, P.R.³

4
- 0034078011
- Neuronal coding of prediction errors
- Schultz W., and Dickinson A. Neuronal coding of prediction errors. Annu Rev Neurosci 23 (2000) 473-500
- (2000) Annu Rev Neurosci , vol.23 , pp. 473-500
- Schultz, W.¹ Dickinson, A.²

5
- 34147137418
- Houk JC, Adams JL, Barto AG. Models of information processing in the basal ganglia. Edited by Houk JC, Davis JL, Beiser DG. The MIT Press; 1995.

6
- 0034524427
- Complementary roles of basal ganglia and cerebellum in learning and motor control
- Doya K. Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr Opin Neurobiol 10 (2000) 732-739
- (2000) Curr Opin Neurobiol , vol.10 , pp. 732-739
- Doya, K.¹

7
- 0242440823
- Correlated coding of motivation and outcome of decision by dopamine neurons
- Satoh T., Nakai S., Sato T., and Kimura M. Correlated coding of motivation and outcome of decision by dopamine neurons. J Neurosci 23 (2003) 9913-9923
- (2003) J Neurosci , vol.23 , pp. 9913-9923
- Satoh, T.¹ Nakai, S.² Sato, T.³ Kimura, M.⁴

8
- 4644290200
- A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping
- Takikawa Y., Kawagoe R., and Hikosaka O. A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping. J Neurophysiol 92 (2004) 2520-2529
- (2004) J Neurophysiol , vol.92 , pp. 2520-2529
- Takikawa, Y.¹ Kawagoe, R.² Hikosaka, O.³

9
- 0037459319
- Discrete coding of reward probability and uncertainty by dopamine neurons
- Fiorillo C.D., Tobler P.N., and Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299 (2003) 1898-1902
- (2003) Science , vol.299 , pp. 1898-1902
- Fiorillo, C.D.¹ Tobler, P.N.² Schultz, W.³

10
- 14844349975
- Adaptive coding of reward value by dopamine neurons
- When a monkey receives unexpected reward, phasic change of activity of dopaminergic neurons can be explained as encoding the temporal-difference error. These authors added new findings regarding the scaling of this coding: this error signal is scaled by variance of reward distribution within a given context. Thus, the activity of dopaminergic neurons is scaled by the range of reward distribution, and could represent expected 'risk' of reward in the context.
- Tobler P.N., Fiorillo C.D., and Schultz W. Adaptive coding of reward value by dopamine neurons. Science 307 (2005) 1642-1645. When a monkey receives unexpected reward, phasic change of activity of dopaminergic neurons can be explained as encoding the temporal-difference error. These authors added new findings regarding the scaling of this coding: this error signal is scaled by variance of reward distribution within a given context. Thus, the activity of dopaminergic neurons is scaled by the range of reward distribution, and could represent expected 'risk' of reward in the context.
- (2005) Science , vol.307 , pp. 1642-1645
- Tobler, P.N.¹ Fiorillo, C.D.² Schultz, W.³

11
- 33747585633
- Midbrain dopamine neurons encode decisions for future action
- Morris G., Nevet A., Arkadir D., Vaadia E., and Bergman H. Midbrain dopamine neurons encode decisions for future action. Nat Neurosci 9 (2006) 1057-1063
- (2006) Nat Neurosci , vol.9 , pp. 1057-1063
- Morris, G.¹ Nevet, A.² Arkadir, D.³ Vaadia, E.⁴ Bergman, H.⁵

12
- 0345118165
- Neural mechanisms of reward-related motor learning
- Wickens J.R., Reynolds J.N., and Hyland B.I. Neural mechanisms of reward-related motor learning. Curr Opin Neurobiol 13 (2003) 685-690
- (2003) Curr Opin Neurobiol , vol.13 , pp. 685-690
- Wickens, J.R.¹ Reynolds, J.N.² Hyland, B.I.³

13
- 0842349509
- Reward-predicting activity of dopamine and caudate neurons - a possible mechanism of motivational control of saccadic eye movement
- Kawagoe R., Takikawa Y., and Hikosaka O. Reward-predicting activity of dopamine and caudate neurons - a possible mechanism of motivational control of saccadic eye movement. J Neurophysiol 91 (2004) 1013-1024
- (2004) J Neurophysiol , vol.91 , pp. 1013-1024
- Kawagoe, R.¹ Takikawa, Y.² Hikosaka, O.³

14
- 33744971409
- Role of dopamine in the primate caudate nucleus in reward modulation of saccades
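- [...] D2 receptors enhanced it. The dopamine-dependent plasticity in corticostriatal synapses could modulate sensorimotor learning dependent on reward.
- Nakamura K., and Hikosaka O. Role of dopamine in the primate caudate nucleus in reward modulation of saccades. J Neurosci 26 (2006) 5360-5369. [...] D2 receptors enhanced it. The dopamine-dependent plasticity in corticostriatal synapses could modulate sensorimotor learning dependent on reward.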
- (2006) J Neurosci , vol.26 , pp. 5360-5369
- Nakamura, K.¹ Hikosaka, O.²

15
- 18844366638
- Relative reward processing in primate striatum
- Cromwell H.C., Hassani O.K., and Schultz W. Relative reward processing in primate striatum. Exp. Brain Res 162 (2005) 520-525
- (2005) Exp. Brain Res , vol.162 , pp. 520-525
- Cromwell, H.C.¹ Hassani, O.K.² Schultz, W.³

16
- 28144449057
- Representation of action-specific reward values in the striatum
- This study demonstrates that a considerable proportion of dorsal striatal neural activity represents the 'action value' predicted by Doya [6]. The authors then directly compared striatal neuronal activities with dynamically changing model parameters in a trial-by-trial manner, using a sophisticated estimation method, and found a good fit.
- Samejima K., Ueda Y., Doya K., and Kimura M. Representation of action-specific reward values in the striatum. Science 310 (2005) 1337-1340. This study demonstrates that a considerable proportion of dorsal striatal neural activity represents the 'action value' predicted by Doya [6]. The authors then directly compared striatal neuronal activities with dynamically changing model parameters in a trial-by-trial manner, using a sophisticated estimation method, and found a good fit.
- (2005) Science , vol.310 , pp. 1337-1340
- Samejima, K.¹ Ueda, Y.² Doya, K.³ Kimura, M.⁴

17
- 3242673464
- Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons
- Morris G., Arkadir D., Nevet A., Vaadia E., and Bergman H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43 (2004) 133-143
- (2004) Neuron , vol.43 , pp. 133-143
- Morris, G.¹ Arkadir, D.² Nevet, A.³ Vaadia, E.⁴ Bergman, H.⁵

18
- 33645363841
- Selective enhancement of associative learning by microstimulation of the anterior caudate
- This paper provides direct evidence that the striatum contributes to stimulus-action-reward association learning. The authors recorded caudate neuronal discharge that is temporarily reinforced during learning. They also applied electrical microstimulation to the caudate nucleus while the monkey learned arbitrary association between a visual stimulus and movement guided by reward feedback, and the microstimulation selectively enhanced the association learning.
- Williams Z.M., and Eskandar E.N. Selective enhancement of associative learning by microstimulation of the anterior caudate. Nat Neurosci 9 (2006) 562-568. This paper provides direct evidence that the striatum contributes to stimulus-action-reward association learning. The authors recorded caudate neuronal discharge that is temporarily reinforced during learning. They also applied electrical microstimulation to the caudate nucleus while the monkey learned arbitrary association between a visual stimulus and movement guided by reward feedback, and the microstimulation selectively enhanced the association learning.
- (2006) Nat Neurosci , vol.9 , pp. 562-568
- Williams, Z.M.¹ Eskandar, E.N.²

19
- 14544291512
- Different time courses of learning-related activity in the prefrontal cortex and striatum
- Pasupathy A., and Miller E.K. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433 (2005) 873-876
- (2005) Nature , vol.433 , pp. 873-876
- Pasupathy, A.¹ Miller, E.K.²

20
- 0038491288
- Neuronal correlates of goal-based motor selection in the prefrontal cortex
- Matsumoto K., Suzuki W., and Tanaka K. Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301 (2003) 229-232
- (2003) Science , vol.301 , pp. 229-232
- Matsumoto, K.¹ Suzuki, W.² Tanaka, K.³

21
- 1842612383
- Prefrontal cortex and decision making in a mixed-strategy game
- Barraclough D.J., Conroy M.L., and Lee D. Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci 7 (2004) 404-410
- (2004) Nat Neurosci , vol.7 , pp. 404-410
- Barraclough, D.J.¹ Conroy, M.L.² Lee, D.³

22
- 2942726234
- Matching behavior and the representation of value in the parietal cortex
- Sugrue L.P., Corrado G.S., and Newsome W.T. Matching behavior and the representation of value in the parietal cortex. Science 304 (2004) 1782-1787
- (2004) Science , vol.304 , pp. 1782-1787
- Sugrue, L.P.¹ Corrado, G.S.² Newsome, W.T.³

23
- 17844396920
- Choosing the greater of two goods: neural currencies for valuation and decision making
- Sugrue L.P., Corrado G.S., and Newsome W.T. Choosing the greater of two goods: neural currencies for valuation and decision making. Nat Rev Neurosci 6 (2005) 363-375
- (2005) Nat Rev Neurosci , vol.6 , pp. 363-375
- Sugrue, L.P.¹ Corrado, G.S.² Newsome, W.T.³

24
- 33644782012
- Dynamic response-by-response models of matching behavior in rhesus monkeys
- Lau B., and Glimcher P.W. Dynamic response-by-response models of matching behavior in rhesus monkeys. J Exp Anal Behav 84 (2005) 555-579
- (2005) J Exp Anal Behav , vol.84 , pp. 555-579
- Lau, B.¹ Glimcher, P.W.²

25
- 33644746612
- Linear-nonlinear-Poisson models of primate choice dynamics
- Corrado G.S., Sugrue L.P., Seung H.S., and Newsome W.T. Linear-nonlinear-Poisson models of primate choice dynamics. J Exp Anal Behav 84 (2005) 581-617
- (2005) J Exp Anal Behav , vol.84 , pp. 581-617
- Corrado, G.S.¹ Sugrue, L.P.² Seung, H.S.³ Newsome, W.T.⁴

26
- 33749050766
- Computational algorithms and neuronal network models underlying decision processes
- This review provides computational explanations of perceptual decision and matching behavior, which have been studied in the field of behavioral psychology, and their relationship to optimal learning theories and algorithms of reinforcement learning.
- Sakai Y., Okamoto H., and Fukai T. Computational algorithms and neuronal network models underlying decision processes. Neural Netw 19 (2006) 1091-1105. This review provides computational explanations of perceptual decision and matching behavior, which have been studied in the field of behavioral psychology, and their relationship to optimal learning theories and algorithms of reinforcement learning.
- (2006) Neural Netw , vol.19 , pp. 1091-1105
- Sakai, Y.¹ Okamoto, H.² Fukai, T.³

27
- 33748337293
- Imaging valuation models in human choice
- Montague P.R., King-Casas B., and Cohen J.D. Imaging valuation models in human choice. Annu Rev Neurosci 29 (2006) 417-448
- (2006) Annu Rev Neurosci , vol.29 , pp. 417-448
- Montague, P.R.¹ King-Casas, B.² Cohen, J.D.³

28
- 33748302924
- Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans
- Using combinations of pharmacological, imaging and computational modeling techniques, the authors directly demonstrate that fMRI signals in the striatum and orbitofrontal cortex are modulated by dopamine.
- Pessiglione M., Seymour B., Flandin G., Dolan R.J., and Frith C.D. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442 (2006) 1042-1045. Using combinations of pharmacological, imaging and computational modeling techniques, the authors directly demonstrate that fMRI signals in the striatum and orbitofrontal cortex are modulated by dopamine.
- (2006) Nature , vol.442 , pp. 1042-1045
- Pessiglione, M.¹ Seymour, B.² Flandin, G.³ Dolan, R.J.⁴ Frith, C.D.⁵

29
- 2942617032
- Temporal difference models describe higher-order learning in humans
- Seymour B., O'Doherty J.P., Dayan P., Koltzenburg M., Jones A.K., Dolan R.J., Friston K.J., and Frackowiak R.S. Temporal difference models describe higher-order learning in humans. Nature 429 (2004) 664-667
- (2004) Nature , vol.429 , pp. 664-667
- Seymour, B.¹ O'Doherty, J.P.² Dayan, P.³ Koltzenburg, M.⁴ Jones, A.K.⁵ Dolan, R.J.⁶ Friston, K.J.⁷ Frackowiak, R.S.⁸

30
- 1942520195
- Dissociable roles of ventral and dorsal striatum in instrumental conditioning
- O'Doherty J., Dayan P., Schultz J., Deichmann R., Friston K., and Dolan R.J. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304 (2004) 452-454
- (2004) Science , vol.304 , pp. 452-454
- O'Doherty, J.¹ Dayan, P.² Schultz, J.³ Deichmann, R.⁴ Friston, K.⁵ Dolan, R.J.⁶

31
- 33644806981
- Human neural learning depends on reward prediction errors in the blocking paradigm
- Tobler P.N., O'Doherty J.P., Dolan R.J., and Schultz W. Human neural learning depends on reward prediction errors in the blocking paradigm. J Neurophysiol 95 (2006) 301-310
- (2006) J Neurophysiol , vol.95 , pp. 301-310
- Tobler, P.N.¹ O'Doherty, J.P.² Dolan, R.J.³ Schultz, W.⁴

32
- 1242319297
- A neural correlate of reward-based behavioral learning in caudate nucleus: a functional magnetic resonance imaging study of a stochastic decision task
- Haruno M., Kuroda T., Doya K., Toyama K., Kimura M., Samejima K., Imamizu H., and Kawato M. A neural correlate of reward-based behavioral learning in caudate nucleus: a functional magnetic resonance imaging study of a stochastic decision task. J Neurosci 24 (2004) 1660-1665
- (2004) J Neurosci , vol.24 , pp. 1660-1665
- Haruno, M.¹ Kuroda, T.² Doya, K.³ Toyama, K.⁴ Kimura, M.⁵ Samejima, K.⁶ Imamizu, H.⁷ Kawato, M.⁸

33
- 3343026029
- Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops
- Tanaka S.C., Doya K., Okada G., Ueda K., Okamoto Y., and Yamawaki S. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci 7 (2004) 887-893
- (2004) Nat Neurosci , vol.7 , pp. 887-893
- Tanaka, S.C.¹ Doya, K.² Okada, G.³ Ueda, K.⁴ Okamoto, Y.⁵ Yamawaki, S.⁶

34
- 33749061026
- Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics
- Tanaka S.C., Samejima K., Okada G., Ueda K., Okamoto Y., Yamawaki S., and Doya K. Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics. Neural Netw 19 (2006) 1233-1241
- (2006) Neural Netw , vol.19 , pp. 1233-1241
- Tanaka, S.C.¹ Samejima, K.² Okada, G.³ Ueda, K.⁴ Okamoto, Y.⁵ Yamawaki, S.⁶ Doya, K.⁷

35
- 33746711623
- Neural differentiation of expected reward and risk in human subcortical structures
- Preuschoff K., Bossaerts P., and Quartz S.R. Neural differentiation of expected reward and risk in human subcortical structures. Neuron 51 (2006) 381-390
- (2006) Neuron , vol.51 , pp. 381-390
- Preuschoff, K.¹ Bossaerts, P.² Quartz, S.R.³

36
- 28844452879
- Neural systems responding to degrees of uncertainty in human decision-making
- Hsu M., Bhatt M., Adolphs R., Tranel D., and Camerer C.F. Neural systems responding to degrees of uncertainty in human decision-making. Science 310 (2005) 1680-1683
- (2005) Science , vol.310 , pp. 1680-1683
- Hsu, M.¹ Bhatt, M.² Adolphs, R.³ Tranel, D.⁴ Camerer, C.F.⁵

37
- 33644858743
- Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning
- This study is an example of recent research trends of computational-model-based neuroimaging. Subjects' choice behavior was modeled by a simple reinforcement learning algorithm (Q-learning) while the model was given the same sensory stimuli and received the same rewards. The model reproduced subjects' behavior reasonably well. Internal representations within the model, such as the reward expectation and the reward expectation error, were found to correlate differentially with activities in the putamen and caudate, respectively.
- Haruno M., and Kawato M. Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning. J Neurophysiol 95 (2006) 948-959. This study is an example of recent research trends of computational-model-based neuroimaging. Subjects' choice behavior was modeled by a simple reinforcement learning algorithm (Q-learning) while the model was given the same sensory stimuli and received the same rewards. The model reproduced subjects' behavior reasonably well. Internal representations within the model, such as the reward expectation and the reward expectation error, were found to correlate differentially with activities in the putamen and caudate, respectively.
- (2006) J Neurophysiol , vol.95 , pp. 948-959
- Haruno, M.¹ Kawato, M.²

38
- 33748188120
- The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans
- Hampton A.N., Bossaerts P., and O'Doherty J.P. The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J Neurosci 26 (2006) 8360-8367
- (2006) J Neurosci , vol.26 , pp. 8360-8367
- Hampton, A.N.¹ Bossaerts, P.² O'Doherty, J.P.³

39
- 0037258402
- Meta-learning in reinforcement learning
- Schweighofer N., and Doya K. Meta-learning in reinforcement learning. Neural Netw 16 (2003) 5-9
- (2003) Neural Netw , vol.16 , pp. 5-9
- Schweighofer, N.¹ Doya, K.²

40
- 0036592023
- Metalearning and neuromodulation
- Doya K. Metalearning and neuromodulation. Neural Netw 15 (2002) 495-506
- (2002) Neural Netw , vol.15 , pp. 495-506
- Doya, K.¹

41
- 33745565701
- Optimal decision making and the anterior cingulate cortex
- This study shows that the anterior cingulate cortex has an important role in utilizing integrated past history of action and reward experiences for reward-based decision making. The authors demonstrate that lesion of the monkey anterior cingulate cortex did not impair performance just after error trials, but it rendered the monkeys unable to maintain optimal choice. The authors also found that in a matching task, monkeys that had lesions of the anterior cingulate cortex took longer to attain the optimum choice ratio.
- Kennerley S.W., Walton M.E., Behrens T.E., Buckley M.J., and Rushworth M.F. Optimal decision making and the anterior cingulate cortex. Nat Neurosci 9 (2006) 940-947. This study shows that the anterior cingulate cortex has an important role in utilizing integrated past history of action and reward experiences for reward-based decision making. The authors demonstrate that lesion of the monkey anterior cingulate cortex did not impair performance just after error trials, but it rendered the monkeys unable to maintain optimal choice. The authors also found that in a matching task, monkeys that had lesions of the anterior cingulate cortex took longer to attain the optimum choice ratio.
- (2006) Nat Neurosci , vol.9 , pp. 940-947
- Kennerley, S.W.¹ Walton, M.E.² Behrens, T.E.³ Buckley, M.J.⁴ Rushworth, M.F.⁵

42
- 0036592028
- Control of exploitation-exploration meta-parameter in reinforcement learning
- Ishii S., Yoshida W., and Yoshimoto J. Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Netw 15 (2002) 665-687
- (2002) Neural Netw , vol.15 , pp. 665-687
- Ishii, S.¹ Yoshida, W.² Yoshimoto, J.³

43
- 33745743341
- Neural representation of information measure in the primate premotor cortex
- Nakamura K. Neural representation of information measure in the primate premotor cortex. J Neurophysiol 96 (2006) 478-485
- (2006) J Neurophysiol , vol.96 , pp. 478-485
- Nakamura, K.¹

44
- 33646853495
- Resolution of uncertainty in prefrontal cortex
- In many situations, we face difficulty ascertaining where we are from a limited set of available sensory inputs. The problem of finding an optimal strategy with only partial information of a current state is called a 'partially observable Markov decision process'. The authors modeled how humans resolve this problem in a task of pathfinding through a maze of which there was only a limited view. The anterior prefrontal BOLD signal was found to correlate with the uncertainty of a current belief state.
- Yoshida W., and Ishii S. Resolution of uncertainty in prefrontal cortex. Neuron 50 (2006) 781-789. In many situations, we face difficulty ascertaining where we are from a limited set of available sensory inputs. The problem of finding an optimal strategy with only partial information of a current state is called a 'partially observable Markov decision process'. The authors modeled how humans resolve this problem in a task of pathfinding through a maze of which there was only a limited view. The anterior prefrontal BOLD signal was found to correlate with the uncertainty of a current belief state.
- (2006) Neuron , vol.50 , pp. 781-789
- Yoshida, W.¹ Ishii, S.²

45
- 33745223257
- Cortical substrates for exploratory decisions in humans
- Daw N.D., O'Doherty J.P., Dayan P., Seymour B., and Dolan R.J. Cortical substrates for exploratory decisions in humans. Nature 441 (2006) 876-879
- (2006) Nature , vol.441 , pp. 876-879
- Daw, N.D.¹ O'Doherty, J.P.² Dayan, P.³ Seymour, B.⁴ Dolan, R.J.⁵

46
- 28044450875
- Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control
- The authors propose an interesting computational model consisting of two parallel reinforcement learning modules that could be implemented in prefrontal and subcortical basal ganglia circuits. The prefrontal circuit implements 'model-based reinforcement learning', whereas the corticobasal ganglia circuit including dorsolateral striatum implements 'model-free reinforcement learning'. The former achieves goal-directed behaviors using a tree-search algorithm by simulating the action and state transitions even before actual execution of the action. By contrast, the latter maintains a 'cache' of action value for each state, and it is updated by feedback of reward, step by step through actual action execution.
- Daw N.D., Niv Y., and Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8 (2005) 1704-1711. The authors propose an interesting computational model consisting of two parallel reinforcement learning modules that could be implemented in prefrontal and subcortical basal ganglia circuits. The prefrontal circuit implements 'model-based reinforcement learning', whereas the corticobasal ganglia circuit including dorsolateral striatum implements 'model-free reinforcement learning'. The former achieves goal-directed behaviors using a tree-search algorithm by simulating the action and state transitions even before actual execution of the action. By contrast, the latter maintains a 'cache' of action value for each state, and it is updated by feedback of reward, step by step through actual action execution.
- (2005) Nat Neurosci , vol.8 , pp. 1704-1711
- Daw, N.D.¹ Niv, Y.² Dayan, P.³

47
- 10344225664
- Addiction as a computational process gone awry
- Redish A.D. Addiction as a computational process gone awry. Science 306 (2004) 1944-1947
- (2004) Science , vol.306 , pp. 1944-1947
- Redish, A.D.¹

48
- 32044452698
- Orbitofrontal cortex, decision-making and drug addiction
- Schoenbaum G., Roesch M.R., and Stalnaker T.A. Orbitofrontal cortex, decision-making and drug addiction. Trends Neurosci 29 (2006) 116-124
- (2006) Trends Neurosci , vol.29 , pp. 116-124
- Schoenbaum, G.¹ Roesch, M.R.² Stalnaker, T.A.³

49
- 33746898593
- Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation
- Roesch M.R., Taylor A.R., and Schoenbaum G. Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation. Neuron 51 (2006) 509-520
- (2006) Neuron , vol.51 , pp. 509-520
- Roesch, M.R.¹ Taylor, A.R.² Schoenbaum, G.³

50
- 33746920148
- Separate neural pathways process different decision costs
- This rat lesion study shows that two different types of decision costs - the effort cost and the temporal cost in waiting for reward - are represented by different prefrontal circuits - the anterior cingulate cortex and orbitofrontal cortex, respectively. A T-maze with an obstacle in the goal arm and temporal delay for reward delivery were used to examine the effort and temporal costs, respectively. Anterior cingulate cortex lesions affected how much effort rats decided to invest for reward. Orbitofrontal cortex lesions affected how long rats decided to wait for reward. This study is very important in understanding how decisions are made depending not only on reward but also on multiple types of cost.
- Rudebeck P.H., Walton M.E., Smyth A.N., Bannerman D.M., and Rushworth M.F. Separate neural pathways process different decision costs. Nat Neurosci 9 (2006) 1161-1168. This rat lesion study shows that two different types of decision costs - the effort cost and the temporal cost in waiting for reward - are represented by different prefrontal circuits - the anterior cingulate cortex and orbitofrontal cortex, respectively. A T-maze with an obstacle in the goal arm and temporal delay for reward delivery were used to examine the effort and temporal costs, respectively. Anterior cingulate cortex lesions affected how much effort rats decided to invest for reward. Orbitofrontal cortex lesions affected how long rats decided to wait for reward. This study is very important in understanding how decisions are made depending not only on reward but also on multiple types of cost.
- (2006) Nat Neurosci , vol.9 , pp. 1161-1168
- Rudebeck, P.H.¹ Walton, M.E.² Smyth, A.N.³ Bannerman, D.M.⁴ Rushworth, M.F.⁵

51
- 9244231144
- Reinforcement learning and decision making in monkeys during a competitive game
- Lee D., Conroy M.L., McGreevy B.P., and Barraclough D.J. Reinforcement learning and decision making in monkeys during a competitive game. Brain Res Cogn Brain Res 22 (2004) 45-58
- (2004) Brain Res Cogn Brain Res , vol.22 , pp. 45-58
- Lee, D.¹ Conroy, M.L.² McGreevy, B.P.³ Barraclough, D.J.⁴

52
- 33748999594
- Neural mechanism for stochastic behaviour during a competitive game
- Soltani A., Lee D., and Wang X.J. Neural mechanism for stochastic behaviour during a competitive game. Neural Netw 19 (2006) 1075-1090
- (2006) Neural Netw , vol.19 , pp. 1075-1090
- Soltani, A.¹ Lee, D.² Wang, X.J.³

53
- 5144223501
- Activity in posterior parietal cortex is correlated with the relative subjective desirability of action
- Dorris M.C., and Glimcher P.W. Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron 44 (2004) 365-378
- (2004) Neuron , vol.44 , pp. 365-378
- Dorris, M.C.¹ Glimcher, P.W.²

54
- 33646566317
- Neurons in the orbitofrontal cortex encode economic value
- Padoa-Schioppa C., and Assad J.A. Neurons in the orbitofrontal cortex encode economic value. Nature 441 (2006) 223-226
- (2006) Nature , vol.441 , pp. 223-226
- Padoa-Schioppa, C.¹ Assad, J.A.²

55
- 0001027894
- Transfer of learning by composing solutions of elemental sequential tasks
- Singh S. Transfer of learning by composing solutions of elemental sequential tasks. Mach Learn 8 (1992) 323-339
- (1992) Mach Learn , vol.8 , pp. 323-339
- Singh, S.¹

56
- 0031215211
- HQ-learning
- Wiering M., and Schmidhuber J. HQ-learning. Adapt Behav 6 (1997) 219-246
- (1997) Adapt Behav , vol.6 , pp. 219-246
- Wiering, M.¹ Schmidhuber, J.²

57
- 84898956770
- Reinforcement learning with hierarchies of machines
- MIT press
- Parr R., and Russell S. Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems vol 10 (1997), MIT press 1043-1049
- (1997) Advances in Neural Information Processing Systems , vol.10 , pp. 1043-1049
- Parr, R.¹ Russell, S.²

58
- 85152618928
- Planning by incremental dynamic programming
- Morgan Kaufmann
- Sutton R.S. Planning by incremental dynamic programming. Eighth International Workshop on Machine Learning; San Mateo, CA (1991), Morgan Kaufmann 353-357
- (1991) Eighth International Workshop on Machine Learning; San Mateo, CA , pp. 353-357
- Sutton, R.S.¹

59
- 0033170372
- Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
- Sutton R.S., Precup D., and Singh S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112 (1999) 181-211
- (1999) Artif Intell , vol.112 , pp. 181-211
- Sutton, R.S.¹ Precup, D.² Singh, S.³

60
- 0033280847
- Hierarchical reinforcement learning for motion learning: learning 'stand up' trajectories
- Morimoto J., and Doya K. Hierarchical reinforcement learning for motion learning: learning 'stand up' trajectories. Adv Robot 13 (1999) 267-268
- (1999) Adv Robot , vol.13 , pp. 267-268
- Morimoto, J.¹ Doya, K.²

61
- 0742324926
- Inter-module credit assignment in modular reinforcement learning
- Samejima K., Doya K., and Kawato M. Inter-module credit assignment in modular reinforcement learning. Neural Netw 16 (2003) 985-994
- (2003) Neural Netw , vol.16 , pp. 985-994
- Samejima, K.¹ Doya, K.² Kawato, M.³

62
- 0032192424
- Multiple paired forward and inverse models for motor control
- Wolpert D.M., and Kawato M. Multiple paired forward and inverse models for motor control. Neural Netw 11 (1998) 1317-1329
- (1998) Neural Netw , vol.11 , pp. 1317-1329
- Wolpert, D.M.¹ Kawato, M.²

63
- 0032787485
- Internal models for motor control and trajectory planning
- Kawato M. Internal models for motor control and trajectory planning. Curr Opin Neurobiol 9 (1999) 718-727
- (1999) Curr Opin Neurobiol , vol.9 , pp. 718-727
- Kawato, M.¹

64
- 0035487297
- Mosaic model for sensorimotor learning and control
- Haruno M., Wolpert D.M., and Kawato M. Mosaic model for sensorimotor learning and control. Neural Comput 13 (2001) 2201-2220
- (2001) Neural Comput , vol.13 , pp. 2201-2220
- Haruno, M.¹ Wolpert, D.M.² Kawato, M.³

65
- 0036618011
- Multiple model-based reinforcement learning
- Doya K., Samejima K., Katagiri K., and Kawato M. Multiple model-based reinforcement learning. Neural Comput 14 (2002) 1347-1369
- (2002) Neural Comput , vol.14 , pp. 1347-1369
- Doya, K.¹ Samejima, K.² Katagiri, K.³ Kawato, M.⁴

66
- 4544350592
- Pedunculopontine nucleus and basal ganglia: distant relatives or part of the same family?
- Mena-Segovia J., Bolam J.P., and Magill P.J. Pedunculopontine nucleus and basal ganglia: distant relatives or part of the same family? Trends Neurosci 27 (2004) 585-588
- (2004) Trends Neurosci , vol.27 , pp. 585-588
- Mena-Segovia, J.¹ Bolam, J.P.² Magill, P.J.³

67
- 34147121799
- Kobayashi Y, Okada K, Inoue Y. Reward predicting activity of pedunculopontine tegmental nucleus neurons during visually guided saccade tasks. In 2002 Abstract Viewer and Itinerary Planner Online (http://sfn.scholarone.com/). Society for Neuroscience; 2002: Program No. 890.5.

68
- 33749080272
- Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning
- In this study, a heterarchical reinforcement learning model is proposed and supporting fMRI data are presented. The interplay between the model and the experiments could resolve theoretical difficulties of the plain reinforcement learning algorithm.
- Haruno M., and Kawato M. Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Netw 19 (2006) 1242-1254. In this study, a heterarchical reinforcement learning model is proposed and supporting fMRI data are presented. The interplay between the model and the experiments could resolve theoretical difficulties of the plain reinforcement learning algorithm.
- (2006) Neural Netw , vol.19 , pp. 1242-1254
- Haruno, M.¹ Kawato, M.²

69
- 0347086138
- The primate basal ganglia: parallel and integrative networks
- Haber S.N. The primate basal ganglia: parallel and integrative networks. J Chem Neuroanat 26 (2003) 317-330
- (2003) J Chem Neuroanat , vol.26 , pp. 317-330
- Haber, S.N.¹

70
- 0036333980
- Contribution of pedunculopontine tegmental nucleus neurons to performance of visually guided saccade tasks in monkeys
- Kobayashi Y., Inoue Y., Yamamoto M., Isa T., and Aizawa H. Contribution of pedunculopontine tegmental nucleus neurons to performance of visually guided saccade tasks in monkeys. J Neurophysiol 88 (2002) 715-731
- (2002) J Neurophysiol , vol.88 , pp. 715-731
- Kobayashi, Y.¹ Inoue, Y.² Yamamoto, M.³ Isa, T.⁴ Aizawa, H.⁵

71
- 0141596576
- Policy invariance under reward transformations: theory and application to reward shaping
- Morgan Kaufmann
- Ng A.Y., Harada D., and Russell S. Policy invariance under reward transformations: theory and application to reward shaping. Proceedings of the Sixteenth International Conference on Machine Learning: 1999 (1999), Morgan Kaufmann 278-287
- (1999) Proceedings of the Sixteenth International Conference on Machine Learning: 1999 , pp. 278-287
- Ng, A.Y.¹ Harada, D.² Russell, S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.