



Volume 17, Issue 2, 2007, Pages 205-212

Efficient reinforcement learning: computational theories, neuroscience and robotics

Author keywords

[No Author keywords available]

Indexed keywords

BASAL GANGLION; BRAIN CORTEX; BRAIN REGION; BRAIN STEM; CEREBELLUM; CONCEPTUAL FRAMEWORK; EXPERIMENTAL STUDY; HUMAN; LEARNING; LEARNING THEORY; MATHEMATICAL COMPUTING; NEUROSCIENCE; NONHUMAN; PRIORITY JOURNAL; PSYCHOLOGICAL THEORY; REINFORCEMENT; REVIEW; ROBOTICS;

EID: 34147191094     PISSN: 09594388     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.conb.2007.03.004     Document Type: Review
Times cited: 62

References (71)
  • 1
    • 0020970738
    • Neuron-like elements that can solve difficult learning control problems
    • Barto A.G., Sutton R.S., and Anderson C.W. Neuron-like elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13 (1983) 835-846
    • (1983) IEEE Trans Syst Man Cybern , vol.13 , pp. 835-846
    • Barto, A.G.1    Sutton, R.S.2    Anderson, C.W.3
  • 3
    • 0030896968
    • A neural substrate of prediction and reward
    • Schultz W., Dayan P., and Montague P.R. A neural substrate of prediction and reward. Science 275 (1997) 1593-1599
    • (1997) Science , vol.275 , pp. 1593-1599
    • Schultz, W.1    Dayan, P.2    Montague, P.R.3
  • 4
    • 0034078011
    • Neuronal coding of prediction errors
    • Schultz W., and Dickinson A. Neuronal coding of prediction errors. Annu Rev Neurosci 23 (2000) 473-500
    • (2000) Annu Rev Neurosci , vol.23 , pp. 473-500
    • Schultz, W.1    Dickinson, A.2
  • 5
    • 34147137418
    • Houk JC, Adams JL, Barto AG. Models of information processing in the basal ganglia. Edited by Houk JC, Davis JL, Beiser DG. The MIT Press; 1995.
  • 6
    • 0034524427
    • Complementary roles of basal ganglia and cerebellum in learning and motor control
    • Doya K. Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr Opin Neurobiol 10 (2000) 732-739
    • (2000) Curr Opin Neurobiol , vol.10 , pp. 732-739
    • Doya, K.1
  • 7
    • 0242440823
    • Correlated coding of motivation and outcome of decision by dopamine neurons
    • Satoh T., Nakai S., Sato T., and Kimura M. Correlated coding of motivation and outcome of decision by dopamine neurons. J Neurosci 23 (2003) 9913-9923
    • (2003) J Neurosci , vol.23 , pp. 9913-9923
    • Satoh, T.1    Nakai, S.2    Sato, T.3    Kimura, M.4
  • 8
    • 4644290200
    • A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping
    • Takikawa Y., Kawagoe R., and Hikosaka O. A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping. J Neurophysiol 92 (2004) 2520-2529
    • (2004) J Neurophysiol , vol.92 , pp. 2520-2529
    • Takikawa, Y.1    Kawagoe, R.2    Hikosaka, O.3
  • 9
    • 0037459319
    • Discrete coding of reward probability and uncertainty by dopamine neurons
    • Fiorillo C.D., Tobler P.N., and Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299 (2003) 1898-1902
    • (2003) Science , vol.299 , pp. 1898-1902
    • Fiorillo, C.D.1    Tobler, P.N.2    Schultz, W.3
  • 10
    • 14844349975
    • Adaptive coding of reward value by dopamine neurons
    • When a monkey receives unexpected reward, phasic change of activity of dopaminergic neurons can be explained as encoding the temporal-difference error. These authors added new findings regarding the scaling of this coding: this error signal is scaled by variance of reward distribution within a given context. Thus, the activity of dopaminergic neurons is scaled by the range of reward distribution, and could represent expected 'risk' of reward in the context.
    • Tobler P.N., Fiorillo C.D., and Schultz W. Adaptive coding of reward value by dopamine neurons. Science 307 (2005) 1642-1645
    • (2005) Science , vol.307 , pp. 1642-1645
    • Tobler, P.N.1    Fiorillo, C.D.2    Schultz, W.3
  • 11
    • 33747585633
    • Midbrain dopamine neurons encode decisions for future action
    • Morris G., Nevet A., Arkadir D., Vaadia E., and Bergman H. Midbrain dopamine neurons encode decisions for future action. Nat Neurosci 9 (2006) 1057-1063
    • (2006) Nat Neurosci , vol.9 , pp. 1057-1063
    • Morris, G.1    Nevet, A.2    Arkadir, D.3    Vaadia, E.4    Bergman, H.5
  • 13
    • 0842349509
    • Reward-predicting activity of dopamine and caudate neurons - a possible mechanism of motivational control of saccadic eye movement
    • Kawagoe R., Takikawa Y., and Hikosaka O. Reward-predicting activity of dopamine and caudate neurons - a possible mechanism of motivational control of saccadic eye movement. J Neurophysiol 91 (2004) 1013-1024
    • (2004) J Neurophysiol , vol.91 , pp. 1013-1024
    • Kawagoe, R.1    Takikawa, Y.2    Hikosaka, O.3
  • 14
    • 33744971409
    • Role of dopamine in the primate caudate nucleus in reward modulation of saccades
    • [...] 2 receptors enhanced it. The dopamine-dependent plasticity in corticostriatal synapses could modulate sensorimotor learning dependent on reward.
    • Nakamura K., and Hikosaka O. Role of dopamine in the primate caudate nucleus in reward modulation of saccades. J Neurosci 26 (2006) 5360-5369
    • (2006) J Neurosci , vol.26 , pp. 5360-5369
    • Nakamura, K.1    Hikosaka, O.2
  • 15
    • 18844366638
    • Relative reward processing in primate striatum
    • Cromwell H.C., Hassani O.K., and Schultz W. Relative reward processing in primate striatum. Exp Brain Res 162 (2005) 520-525
    • (2005) Exp Brain Res , vol.162 , pp. 520-525
    • Cromwell, H.C.1    Hassani, O.K.2    Schultz, W.3
  • 16
    • 28144449057
    • Representation of action-specific reward values in the striatum
    • This study demonstrates that a considerable proportion of dorsal striatal neural activity represents the 'action value' predicted by Doya [6]. The authors then directly compared striatal neuronal activities with dynamically changing model parameters in a trial-by-trial manner, using a sophisticated estimation method, and found a good fit.
    • Samejima K., Ueda Y., Doya K., and Kimura M. Representation of action-specific reward values in the striatum. Science 310 (2005) 1337-1340
    • (2005) Science , vol.310 , pp. 1337-1340
    • Samejima, K.1    Ueda, Y.2    Doya, K.3    Kimura, M.4
  • 17
    • 3242673464
    • Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons
    • Morris G., Arkadir D., Nevet A., Vaadia E., and Bergman H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43 (2004) 133-143
    • (2004) Neuron , vol.43 , pp. 133-143
    • Morris, G.1    Arkadir, D.2    Nevet, A.3    Vaadia, E.4    Bergman, H.5
  • 18
    • 33645363841
    • Selective enhancement of associative learning by microstimulation of the anterior caudate
    • This paper provides direct evidence that the striatum contributes to stimulus-action-reward association learning. The authors recorded caudate neuronal discharge that is temporarily reinforced during learning. They also applied electrical microstimulation to the caudate nucleus while the monkey learned arbitrary association between a visual stimulus and movement guided by reward feedback, and the microstimulation selectively enhanced the association learning.
    • Williams Z.M., and Eskandar E.N. Selective enhancement of associative learning by microstimulation of the anterior caudate. Nat Neurosci 9 (2006) 562-568
    • (2006) Nat Neurosci , vol.9 , pp. 562-568
    • Williams, Z.M.1    Eskandar, E.N.2
  • 19
    • 14544291512
    • Different time courses of learning-related activity in the prefrontal cortex and striatum
    • Pasupathy A., and Miller E.K. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433 (2005) 873-876
    • (2005) Nature , vol.433 , pp. 873-876
    • Pasupathy, A.1    Miller, E.K.2
  • 20
    • 0038491288
    • Neuronal correlates of goal-based motor selection in the prefrontal cortex
    • Matsumoto K., Suzuki W., and Tanaka K. Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301 (2003) 229-232
    • (2003) Science , vol.301 , pp. 229-232
    • Matsumoto, K.1    Suzuki, W.2    Tanaka, K.3
  • 21
    • 1842612383
    • Prefrontal cortex and decision making in a mixed-strategy game
    • Barraclough D.J., Conroy M.L., and Lee D. Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci 7 (2004) 404-410
    • (2004) Nat Neurosci , vol.7 , pp. 404-410
    • Barraclough, D.J.1    Conroy, M.L.2    Lee, D.3
  • 22
    • 2942726234
    • Matching behavior and the representation of value in the parietal cortex
    • Sugrue L.P., Corrado G.S., and Newsome W.T. Matching behavior and the representation of value in the parietal cortex. Science 304 (2004) 1782-1787
    • (2004) Science , vol.304 , pp. 1782-1787
    • Sugrue, L.P.1    Corrado, G.S.2    Newsome, W.T.3
  • 23
    • 17844396920
    • Choosing the greater of two goods: neural currencies for valuation and decision making
    • Sugrue L.P., Corrado G.S., and Newsome W.T. Choosing the greater of two goods: neural currencies for valuation and decision making. Nat Rev Neurosci 6 (2005) 363-375
    • (2005) Nat Rev Neurosci , vol.6 , pp. 363-375
    • Sugrue, L.P.1    Corrado, G.S.2    Newsome, W.T.3
  • 24
    • 33644782012
    • Dynamic response-by-response models of matching behavior in rhesus monkeys
    • Lau B., and Glimcher P.W. Dynamic response-by-response models of matching behavior in rhesus monkeys. J Exp Anal Behav 84 (2005) 555-579
    • (2005) J Exp Anal Behav , vol.84 , pp. 555-579
    • Lau, B.1    Glimcher, P.W.2
  • 26
    • 33749050766
    • Computational algorithms and neuronal network models underlying decision processes
    • This review provides computational explanations of perceptual decision and matching behavior, which have been studied in the field of behavioral psychology, and their relationship to optimal learning theories and algorithms of reinforcement learning.
    • Sakai Y., Okamoto H., and Fukai T. Computational algorithms and neuronal network models underlying decision processes. Neural Netw 19 (2006) 1091-1105
    • (2006) Neural Netw , vol.19 , pp. 1091-1105
    • Sakai, Y.1    Okamoto, H.2    Fukai, T.3
  • 28
    • 33748302924
    • Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans
    • Using combinations of pharmacological, imaging and computational modeling techniques, the authors directly demonstrate that fMRI signals in the striatum and orbitofrontal cortex are modulated by dopamine.
    • Pessiglione M., Seymour B., Flandin G., Dolan R.J., and Frith C.D. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442 (2006) 1042-1045
    • (2006) Nature , vol.442 , pp. 1042-1045
    • Pessiglione, M.1    Seymour, B.2    Flandin, G.3    Dolan, R.J.4    Frith, C.D.5
  • 30
    • 1942520195
    • Dissociable roles of ventral and dorsal striatum in instrumental conditioning
    • O'Doherty J., Dayan P., Schultz J., Deichmann R., Friston K., and Dolan R.J. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304 (2004) 452-454
    • (2004) Science , vol.304 , pp. 452-454
    • O'Doherty, J.1    Dayan, P.2    Schultz, J.3    Deichmann, R.4    Friston, K.5    Dolan, R.J.6
  • 31
    • 33644806981
    • Human neural learning depends on reward prediction errors in the blocking paradigm
    • Tobler P.N., O'Doherty J.P., Dolan R.J., and Schultz W. Human neural learning depends on reward prediction errors in the blocking paradigm. J Neurophysiol 95 (2006) 301-310
    • (2006) J Neurophysiol , vol.95 , pp. 301-310
    • Tobler, P.N.1    O'Doherty, J.P.2    Dolan, R.J.3    Schultz, W.4
  • 32
    • 1242319297
    • A neural correlate of reward-based behavioral learning in caudate nucleus: a functional magnetic resonance imaging study of a stochastic decision task
    • Haruno M., Kuroda T., Doya K., Toyama K., Kimura M., Samejima K., Imamizu H., and Kawato M. A neural correlate of reward-based behavioral learning in caudate nucleus: a functional magnetic resonance imaging study of a stochastic decision task. J Neurosci 24 (2004) 1660-1665
    • (2004) J Neurosci , vol.24 , pp. 1660-1665
    • Haruno, M.1    Kuroda, T.2    Doya, K.3    Toyama, K.4    Kimura, M.5    Samejima, K.6    Imamizu, H.7    Kawato, M.8
  • 33
    • 3343026029
    • Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops
    • Tanaka S.C., Doya K., Okada G., Ueda K., Okamoto Y., and Yamawaki S. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci 7 (2004) 887-893
    • (2004) Nat Neurosci , vol.7 , pp. 887-893
    • Tanaka, S.C.1    Doya, K.2    Okada, G.3    Ueda, K.4    Okamoto, Y.5    Yamawaki, S.6
  • 34
    • 33749061026
    • Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics
    • Tanaka S.C., Samejima K., Okada G., Ueda K., Okamoto Y., Yamawaki S., and Doya K. Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics. Neural Netw 19 (2006) 1233-1241
    • (2006) Neural Netw , vol.19 , pp. 1233-1241
    • Tanaka, S.C.1    Samejima, K.2    Okada, G.3    Ueda, K.4    Okamoto, Y.5    Yamawaki, S.6    Doya, K.7
  • 35
    • 33746711623
    • Neural differentiation of expected reward and risk in human subcortical structures
    • Preuschoff K., Bossaerts P., and Quartz S.R. Neural differentiation of expected reward and risk in human subcortical structures. Neuron 51 (2006) 381-390
    • (2006) Neuron , vol.51 , pp. 381-390
    • Preuschoff, K.1    Bossaerts, P.2    Quartz, S.R.3
  • 36
    • 28844452879
    • Neural systems responding to degrees of uncertainty in human decision-making
    • Hsu M., Bhatt M., Adolphs R., Tranel D., and Camerer C.F. Neural systems responding to degrees of uncertainty in human decision-making. Science 310 (2005) 1680-1683
    • (2005) Science , vol.310 , pp. 1680-1683
    • Hsu, M.1    Bhatt, M.2    Adolphs, R.3    Tranel, D.4    Camerer, C.F.5
  • 37
    • 33644858743
    • Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning
    • This study is an example of recent research trends of computational-model-based neuroimaging. Subjects' choice behavior was modeled by a simple reinforcement learning algorithm (Q-learning) while the model was given the same sensory stimuli and received the same rewards. The model reproduced subjects' behavior reasonably well. Internal representations within the model, such as the reward expectation and the reward expectation error, were found to correlate differentially with activities in the putamen and caudate, respectively.
    • Haruno M., and Kawato M. Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning. J Neurophysiol 95 (2006) 948-959
    • (2006) J Neurophysiol , vol.95 , pp. 948-959
    • Haruno, M.1    Kawato, M.2
  • 38
    • 33748188120
    • The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans
    • Hampton A.N., Bossaerts P., and O'Doherty J.P. The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J Neurosci 26 (2006) 8360-8367
    • (2006) J Neurosci , vol.26 , pp. 8360-8367
    • Hampton, A.N.1    Bossaerts, P.2    O'Doherty, J.P.3
  • 39
    • 0037258402
    • Meta-learning in reinforcement learning
    • Schweighofer N., and Doya K. Meta-learning in reinforcement learning. Neural Netw 16 (2003) 5-9
    • (2003) Neural Netw , vol.16 , pp. 5-9
    • Schweighofer, N.1    Doya, K.2
  • 40
    • 0036592023
    • Metalearning and neuromodulation
    • Doya K. Metalearning and neuromodulation. Neural Netw 15 (2002) 495-506
    • (2002) Neural Netw , vol.15 , pp. 495-506
    • Doya, K.1
  • 41
    • 33745565701
    • Optimal decision making and the anterior cingulate cortex
    • This study shows that the anterior cingulate cortex has an important role in utilizing integrated past history of action and reward experiences for reward-based decision making. The authors demonstrate that lesion of the monkey anterior cingulate cortex did not impair performance just after error trials, but it rendered the monkeys unable to maintain optimal choice. The authors also found that in a matching task, monkeys that had lesions of the anterior cingulate cortex took longer to attain the optimum choice ratio.
    • Kennerley S.W., Walton M.E., Behrens T.E., Buckley M.J., and Rushworth M.F. Optimal decision making and the anterior cingulate cortex. Nat Neurosci 9 (2006) 940-947
    • (2006) Nat Neurosci , vol.9 , pp. 940-947
    • Kennerley, S.W.1    Walton, M.E.2    Behrens, T.E.3    Buckley, M.J.4    Rushworth, M.F.5
  • 42
    • 0036592028
    • Control of exploitation-exploration meta-parameter in reinforcement learning
    • Ishii S., Yoshida W., and Yoshimoto J. Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Netw 15 (2002) 665-687
    • (2002) Neural Netw , vol.15 , pp. 665-687
    • Ishii, S.1    Yoshida, W.2    Yoshimoto, J.3
  • 43
    • 33745743341
    • Neural representation of information measure in the primate premotor cortex
    • Nakamura K. Neural representation of information measure in the primate premotor cortex. J Neurophysiol 96 (2006) 478-485
    • (2006) J Neurophysiol , vol.96 , pp. 478-485
    • Nakamura, K.1
  • 44
    • 33646853495
    • Resolution of uncertainty in prefrontal cortex
    • In many situations, we face difficulty ascertaining where we are from a limited set of available sensory inputs. The problem of finding an optimal strategy with only partial information about the current state is called a 'partially observable Markov decision process'. The authors modeled how humans resolve this problem in a pathfinding task through a maze of which only a limited view was available. The anterior prefrontal BOLD signal was found to correlate with the uncertainty of the current belief state.
    • Yoshida W., and Ishii S. Resolution of uncertainty in prefrontal cortex. Neuron 50 (2006) 781-789
    • (2006) Neuron , vol.50 , pp. 781-789
    • Yoshida, W.1    Ishii, S.2
  • 45
    • 33745223257
    • Cortical substrates for exploratory decisions in humans
    • Daw N.D., O'Doherty J.P., Dayan P., Seymour B., and Dolan R.J. Cortical substrates for exploratory decisions in humans. Nature 441 (2006) 876-879
    • (2006) Nature , vol.441 , pp. 876-879
    • Daw, N.D.1    O'Doherty, J.P.2    Dayan, P.3    Seymour, B.4    Dolan, R.J.5
  • 46
    • 28044450875
    • Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control
    • The authors propose an interesting computational model consisting of two parallel reinforcement learning modules that could be implemented in prefrontal and subcortical basal ganglia circuits. The prefrontal circuit implements 'model-based reinforcement learning', whereas the corticobasal ganglia circuit including dorsolateral striatum implements 'model-free reinforcement learning'. The former achieves goal-directed behaviors using a tree-search algorithm by simulating the action and state transitions even before actual execution of the action. By contrast, the latter maintains a 'cache' of action values for each state, which is updated by reward feedback, step by step, through actual action execution.
    • Daw N.D., Niv Y., and Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8 (2005) 1704-1711
    • (2005) Nat Neurosci , vol.8 , pp. 1704-1711
    • Daw, N.D.1    Niv, Y.2    Dayan, P.3
  • 47
    • 10344225664
    • Addiction as a computational process gone awry
    • Redish A.D. Addiction as a computational process gone awry. Science 306 (2004) 1944-1947
    • (2004) Science , vol.306 , pp. 1944-1947
    • Redish, A.D.1
  • 48
    • 32044452698
    • Orbitofrontal cortex, decision-making and drug addiction
    • Schoenbaum G., Roesch M.R., and Stalnaker T.A. Orbitofrontal cortex, decision-making and drug addiction. Trends Neurosci 29 (2006) 116-124
    • (2006) Trends Neurosci , vol.29 , pp. 116-124
    • Schoenbaum, G.1    Roesch, M.R.2    Stalnaker, T.A.3
  • 49
    • 33746898593
    • Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation
    • Roesch M.R., Taylor A.R., and Schoenbaum G. Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation. Neuron 51 (2006) 509-520
    • (2006) Neuron , vol.51 , pp. 509-520
    • Roesch, M.R.1    Taylor, A.R.2    Schoenbaum, G.3
  • 50
    • 33746920148
    • Separate neural pathways process different decision costs
    • This rat lesion study shows that two different types of decision costs, the effort cost and the temporal cost of waiting for reward, are represented by different prefrontal circuits: the anterior cingulate cortex and the orbitofrontal cortex, respectively. A T-maze with an obstacle in the goal arm and a temporal delay for reward delivery were used to examine the effort and temporal costs, respectively. Anterior cingulate cortex lesions affected how much effort rats decided to invest for reward. Orbitofrontal cortex lesions affected how long rats decided to wait for reward. This study is very important for understanding how decisions are made depending not only on reward but also on multiple types of cost.
    • Rudebeck P.H., Walton M.E., Smyth A.N., Bannerman D.M., and Rushworth M.F. Separate neural pathways process different decision costs. Nat Neurosci 9 (2006) 1161-1168
    • (2006) Nat Neurosci , vol.9 , pp. 1161-1168
    • Rudebeck, P.H.1    Walton, M.E.2    Smyth, A.N.3    Bannerman, D.M.4    Rushworth, M.F.5
  • 51
    • 9244231144
    • Reinforcement learning and decision making in monkeys during a competitive game
    • Lee D., Conroy M.L., McGreevy B.P., and Barraclough D.J. Reinforcement learning and decision making in monkeys during a competitive game. Brain Res Cogn Brain Res 22 (2004) 45-58
    • (2004) Brain Res Cogn Brain Res , vol.22 , pp. 45-58
    • Lee, D.1    Conroy, M.L.2    McGreevy, B.P.3    Barraclough, D.J.4
  • 52
    • 33748999594
    • Neural mechanism for stochastic behaviour during a competitive game
    • Soltani A., Lee D., and Wang X.J. Neural mechanism for stochastic behaviour during a competitive game. Neural Netw 19 (2006) 1075-1090
    • (2006) Neural Netw , vol.19 , pp. 1075-1090
    • Soltani, A.1    Lee, D.2    Wang, X.J.3
  • 53
    • 5144223501
    • Activity in posterior parietal cortex is correlated with the relative subjective desirability of action
    • Dorris M.C., and Glimcher P.W. Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron 44 (2004) 365-378
    • (2004) Neuron , vol.44 , pp. 365-378
    • Dorris, M.C.1    Glimcher, P.W.2
  • 54
    • 33646566317
    • Neurons in the orbitofrontal cortex encode economic value
    • Padoa-Schioppa C., and Assad J.A. Neurons in the orbitofrontal cortex encode economic value. Nature 441 (2006) 223-226
    • (2006) Nature , vol.441 , pp. 223-226
    • Padoa-Schioppa, C.1    Assad, J.A.2
  • 55
    • 0001027894
    • Transfer of learning by composing solutions of elemental sequential tasks
    • Singh S. Transfer of learning by composing solutions of elemental sequential tasks. Mach Learn 8 (1992) 323-339
    • (1992) Mach Learn , vol.8 , pp. 323-339
    • Singh, S.1
  • 59
    • 0033170372
    • Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
    • Sutton R.S., Precup D., and Singh S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112 (1999) 181-211
    • (1999) Artif Intell , vol.112 , pp. 181-211
    • Sutton, R.S.1    Precup, D.2    Singh, S.3
  • 60
    • 0033280847
    • Hierarchical reinforcement learning for motion learning: learning 'stand up' trajectories
    • Morimoto J., and Doya K. Hierarchical reinforcement learning for motion learning: learning 'stand up' trajectories. Adv Robot 13 (1999) 267-268
    • (1999) Adv Robot , vol.13 , pp. 267-268
    • Morimoto, J.1    Doya, K.2
  • 61
    • 0742324926
    • Inter-module credit assignment in modular reinforcement learning
    • Samejima K., Doya K., and Kawato M. Inter-module credit assignment in modular reinforcement learning. Neural Netw 16 (2003) 985-994
    • (2003) Neural Netw , vol.16 , pp. 985-994
    • Samejima, K.1    Doya, K.2    Kawato, M.3
  • 62
    • 0032192424
    • Multiple paired forward and inverse models for motor control
    • Wolpert D.M., and Kawato M. Multiple paired forward and inverse models for motor control. Neural Netw 11 (1998) 1317-1329
    • (1998) Neural Netw , vol.11 , pp. 1317-1329
    • Wolpert, D.M.1    Kawato, M.2
  • 63
    • 0032787485
    • Internal models for motor control and trajectory planning
    • Kawato M. Internal models for motor control and trajectory planning. Curr Opin Neurobiol 9 (1999) 718-727
    • (1999) Curr Opin Neurobiol , vol.9 , pp. 718-727
    • Kawato, M.1
  • 64
    • 0035487297
    • Mosaic model for sensorimotor learning and control
    • Haruno M., Wolpert D.M., and Kawato M. Mosaic model for sensorimotor learning and control. Neural Comput 13 (2001) 2201-2220
    • (2001) Neural Comput , vol.13 , pp. 2201-2220
    • Haruno, M.1    Wolpert, D.M.2    Kawato, M.3
  • 66
    • 4544350592
    • Pedunculopontine nucleus and basal ganglia: distant relatives or part of the same family?
    • Mena-Segovia J., Bolam J.P., and Magill P.J. Pedunculopontine nucleus and basal ganglia: distant relatives or part of the same family? Trends Neurosci 27 (2004) 585-588
    • (2004) Trends Neurosci , vol.27 , pp. 585-588
    • Mena-Segovia, J.1    Bolam, J.P.2    Magill, P.J.3
  • 67
    • 34147121799
    • Kobayashi Y, Okada K, Inoue Y. Reward predicting activity of pedunculopontine tegmental nucleus neurons during visually guided saccade tasks. In 2002 Abstract Viewer and Itinerary Planner Online (http://sfn.scholarone.com/). Society for Neuroscience; 2002: Program No. 890.5.
  • 68
    • 33749080272
    • Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning
    • In this study, a heterarchical reinforcement learning model is proposed and supporting fMRI data are presented. The interplay between the model and the experiments could resolve theoretical difficulties of the plain reinforcement learning algorithm.
    • Haruno M., and Kawato M. Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Netw 19 (2006) 1242-1254
    • (2006) Neural Netw , vol.19 , pp. 1242-1254
    • Haruno, M.1    Kawato, M.2
  • 69
    • 0347086138
    • The primate basal ganglia: parallel and integrative networks
    • Haber S.N. The primate basal ganglia: parallel and integrative networks. J Chem Neuroanat 26 (2003) 317-330
    • (2003) J Chem Neuroanat , vol.26 , pp. 317-330
    • Haber, S.N.1
  • 70
    • 0036333980
    • Contribution of pedunculopontine tegmental nucleus neurons to performance of visually guided saccade tasks in monkeys
    • Kobayashi Y., Inoue Y., Yamamoto M., Isa T., and Aizawa H. Contribution of pedunculopontine tegmental nucleus neurons to performance of visually guided saccade tasks in monkeys. J Neurophysiol 88 (2002) 715-731
    • (2002) J Neurophysiol , vol.88 , pp. 715-731
    • Kobayashi, Y.1    Inoue, Y.2    Yamamoto, M.3    Isa, T.4    Aizawa, H.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.