SCOPUS information search platform

Current Opinion in Behavioral Sciences

Volume 11, 2016, Pages 67-73

Reinforcement learning with Marr

Niv, Yael (a); Langdon, Angela (a)

(a) Princeton University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

DOPAMINE;

BASAL GANGLION; CONCEPTUAL FRAMEWORK; CORPUS STRIATUM; DECISION MAKING; HIDDEN MARKOV MODEL; LEARNING; LEARNING ALGORITHM; NEUROMODULATION; PSYCHOLOGY; REINFORCEMENT; REVIEW;

EID: 84973375982 PISSN: None EISSN: 23521546 Source Type: Journal
DOI: 10.1016/j.cobeha.2016.04.005 Document Type: Review

Times cited: 38

References (103)

1
- 0017573822
- From understanding computation to understanding neural circuitry
- Marr D., Poggio T. From understanding computation to understanding neural circuitry. Neurosci Res Program Bull 1977, 15:470-488.
- (1977) Neurosci Res Program Bull , vol.15 , pp. 470-488
- Marr, D.¹ Poggio, T.²

2
- 0004102479
- MIT Press.
- Sutton R.S., Barto A.G. Reinforcement learning: an introduction 1998, MIT Press.
- (1998) Reinforcement learning: an introduction
- Sutton, R.S.¹ Barto, A.G.²

3
- 33847202724
- Learning to predict by the methods of temporal differences
- Sutton R.S. Learning to predict by the methods of temporal differences. Mach Learn 1988, 3:9-44.
- (1988) Mach Learn , vol.3 , pp. 9-44
- Sutton, R.S.¹

4
- 45949092119
- Dialogues on prediction errors
- Niv Y., Schoenbaum G. Dialogues on prediction errors. Trends Cogn Sci 2008, 12:265-272.
- (2008) Trends Cogn Sci , vol.12 , pp. 265-272
- Niv, Y.¹ Schoenbaum, G.²

5
- 67349283062
- Reinforcement learning in the brain
- Niv Y. Reinforcement learning in the brain. J Math Psychol 2009, 53:139-154.
- (2009) J Math Psychol , vol.53 , pp. 139-154
- Niv, Y.¹

6
- 0002861883
- A model of how the basal ganglia generate and use neural signals that predict reinforcement
- MIT Press, J.C. Houk, J.L. Davis, D.G. Beiser (Eds.)
- Houk J.C., Adams J.L., Barto A.G. A model of how the basal ganglia generate and use neural signals that predict reinforcement. Models of information processing in the basal ganglia 1995, 249-270. MIT Press. J.C. Houk, J.L. Davis, D.G. Beiser (Eds.).
- (1995) Models of information processing in the basal ganglia , pp. 249-270
- Houk, J.C.¹ Adams, J.L.² Barto, A.G.³

7
- 33644927837
- Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia
- O'Reilly R.C., Frank M.J. Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural Computat 2006, 18:283-328.
- (2006) Neural Computat , vol.18 , pp. 283-328
- O'Reilly, R.C.¹ Frank, M.J.²

8
- 0032973437
- The basal ganglia: a vertebrate solution to the selection problem?
- Redgrave P., Prescott T.J., Gurney K. The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 1999, 89:1009-1023.
- (1999) Neuroscience , vol.89 , pp. 1009-1023
- Redgrave, P.¹ Prescott, T.J.² Gurney, K.³

9
- 0029981543
- A framework for mesencephalic dopamine systems based on predictive Hebbian learning
- Montague P.R., Dayan P., Sejnowski T.J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 1996, 16:1936-1947.
- (1996) J Neurosci , vol.16 , pp. 1936-1947
- Montague, P.R.¹ Dayan, P.² Sejnowski, T.J.³

10
- 0030896968
- A neural substrate of prediction and reward
- Schultz W., Dayan P., Montague P.R. A neural substrate of prediction and reward. Science 1997, 275:1593-1599.
- (1997) Science , vol.275 , pp. 1593-1599
- Schultz, W.¹ Dayan, P.² Montague, P.R.³

11
- 0000541213
- Adaptive critics and the basal ganglia
- MIT Press, J.C. Houk, J.L. Davis, D.G. Beiser (Eds.)
- Barto A.G. Adaptive critics and the basal ganglia. Models of information processing in the basal ganglia 1995, 215-232. MIT Press. J.C. Houk, J.L. Davis, D.G. Beiser (Eds.).
- (1995) Models of information processing in the basal ganglia , pp. 215-232
- Barto, A.G.¹

12
- 68249083099
- The wick in the candle of learning: epistemic curiosity activates reward circuitry and enhances memory
- Kang M.J., Hsu M., Krajbich I.M., Loewenstein G., McClure S.M., Wang J.T., Camerer C.F. The wick in the candle of learning: epistemic curiosity activates reward circuitry and enhances memory. Psychol Sci 2009, 20:963-973.
- (2009) Psychol Sci , vol.20 , pp. 963-973
- Kang, M.J.¹ Hsu, M.² Krajbich, I.M.³ Loewenstein, G.⁴ McClure, S.M.⁵ Wang, J.T.⁶ Camerer, C.F.⁷

13
- 12044252203
- The psychology of curiosity: a review and reinterpretation
- Loewenstein G. The psychology of curiosity: a review and reinterpretation. Psychol Bull 1994, 116:75.
- (1994) Psychol Bull , vol.116 , pp. 75
- Loewenstein, G.¹

14
- 2442467081
- A possibility for implementing curiosity and boredom in model-building neural controllers
- Schmidhuber J. A possibility for implementing curiosity and boredom in model-building neural controllers. From animals to animats: proceedings of the first international conference on simulation of adaptive behavior 1991.
- (1991) From animals to animats: proceedings of the first international conference on simulation of adaptive behavior
- Schmidhuber, J.¹

15
- 75149137813
- What is intrinsic motivation? A typology of computational approaches
- Oudeyer P.-Y., Kaplan F. What is intrinsic motivation? A typology of computational approaches. Front Neurorobot 2007, 1:6.
- (2007) Front Neurorobot , vol.1 , pp. 6
- Oudeyer, P.-Y.¹ Kaplan, F.²

16
- 84929046579
- Intrinsic motivation and reinforcement learning
- Springer, G. Baldassarre, M. Mirolli (Eds.)
- Barto A.G. Intrinsic motivation and reinforcement learning. Intrinsically motivated learning in natural and artificial systems 2013, 17-47. Springer. G. Baldassarre, M. Mirolli (Eds.).
- (2013) Intrinsically motivated learning in natural and artificial systems , pp. 17-47
- Barto, A.G.¹

17
- 34250703734
- An intrinsic reward mechanism for efficient exploration
- ACM.
- Şimşek Ö., Barto A.G. An intrinsic reward mechanism for efficient exploration. Proceedings of the 23rd international conference on Machine learning 2006, 833-840. ACM.
- (2006) Proceedings of the 23rd international conference on Machine learning , pp. 833-840
- Şimşek, Ö.¹ Barto, A.G.²

18
- 84893133238
- Where do rewards come from
- Singh S., Lewis R.L., Barto A.G. Where do rewards come from. Proceedings of the annual conference of the cognitive science society 2009, 2601-2606.
- (2009) Proceedings of the annual conference of the cognitive science society , pp. 2601-2606
- Singh, S.¹ Lewis, R.L.² Barto, A.G.³

19
- 84968786793
- When good news leads to bad choices
- McDevitt M., Dunn R., Spetch M., Ludvig E. When good news leads to bad choices. J Exp Anal Behav 2016, 105:23-40.
- (2016) J Exp Anal Behav , vol.105 , pp. 23-40
- McDevitt, M.¹ Dunn, R.² Spetch, M.³ Ludvig, E.⁴

20
- 84949990028
- When good pigeons make bad decisions: choice with probabilistic delays and outcomes
- Pisklak J.M., McDevitt M.A., Dunn R.M., Spetch M.L. When good pigeons make bad decisions: choice with probabilistic delays and outcomes. J Exp Anal Behav 2015, 104:241-251.
- (2015) J Exp Anal Behav , vol.104 , pp. 241-251
- Pisklak, J.M.¹ McDevitt, M.A.² Dunn, R.M.³ Spetch, M.L.⁴

21
- 68349115012
- Midbrain dopamine neurons signal preference for advance information about upcoming rewards
- Bromberg-Martin E.S., Hikosaka O. Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron 2009, 63:119-126.
- (2009) Neuron , vol.63 , pp. 119-126
- Bromberg-Martin, E.S.¹ Hikosaka, O.²

22
- 80052211211
- Lateral habenula neurons signal errors in the prediction of reward information
- Bromberg-Martin E.S., Hikosaka O. Lateral habenula neurons signal errors in the prediction of reward information. Nature Neurosci 2011, 14:1209-1216.
- (2011) Nature Neurosci , vol.14 , pp. 1209-1216
- Bromberg-Martin, E.S.¹ Hikosaka, O.²

23
- 84922147919
- Orbitofrontal cortex uses distinct codes for different choice attributes in decisions motivated by curiosity
- Blanchard T.C., Hayden B.Y., Bromberg-Martin E.S. Orbitofrontal cortex uses distinct codes for different choice attributes in decisions motivated by curiosity. Neuron 2015, 85:602-614.
- (2015) Neuron , vol.85 , pp. 602-614
- Blanchard, T.C.¹ Hayden, B.Y.² Bromberg-Martin, E.S.³

24
- 79953822184
- Intrinsically motivated reinforcement learning: An evolutionary perspective
- Singh S., Lewis R.L., Barto A.G., Sorg J. Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Trans Autonom Mental Dev 2010, 2:70-82.
- (2010) IEEE Trans Autonom Mental Dev , vol.2 , pp. 70-82
- Singh, S.¹ Lewis, R.L.² Barto, A.G.³ Sorg, J.⁴

25
- 84898929318
- Reward mapping for transfer in long-lived agents
- Guo X., Singh S., Lewis R.L. Reward mapping for transfer in long-lived agents. Adv Neural Inform Process Syst 2013, 2130-2138.
- (2013) Adv Neural Inform Process Syst , pp. 2130-2138
- Guo, X.¹ Singh, S.² Lewis, R.L.³

26
- 77956525933
- Internal rewards mitigate agent boundedness
- Sorg J., Singh S.P., Lewis R.L. Internal rewards mitigate agent boundedness. Proceedings of the 27th international conference on machine learning (ICML-10) 2010, 1007-1014.
- (2010) Proceedings of the 27th international conference on machine learning (ICML-10) , pp. 1007-1014
- Sorg, J.¹ Singh, S.P.² Lewis, R.L.³

27
- 0003654586
- D. Appleton-Century Company
- Hull C. Principles of behavior: an introduction to behavior theory 1943, D. Appleton-Century Company.
- (1943) Principles of behavior: an introduction to behavior theory
- Hull, C.¹

28
- 84964313382
- Homeostatic reinforcement learning for integrating reward collection and physiological stability
- Keramati M., Gutkin B. Homeostatic reinforcement learning for integrating reward collection and physiological stability. eLife 2014, 3:e04811.
- (2014) eLife , vol.3
- Keramati, M.¹ Gutkin, B.²

29
- 85162538375
- A reinforcement learning theory for homeostatic regulation
- Keramati M., Gutkin B.S. A reinforcement learning theory for homeostatic regulation. Advances in neural information processing systems 2011, 82-90.
- (2011) Advances in neural information processing systems , pp. 82-90
- Keramati, M.¹ Gutkin, B.S.²

30
- 84948584046
- Just in time adaptive interventions (JITAIs): an organizing framework for ongoing health behavior support
- Nahum-Shani I., Smith S.N., Tewari A., Witkiewitz K., Collins L.M., Spring B., Murphy S. Just in time adaptive interventions (JITAIs): an organizing framework for ongoing health behavior support. Methodology Center Technical Report No. 14-126 2014.
- (2014) Methodology Center Technical Report No. 14-126
- Nahum-Shani, I.¹ Smith, S.N.² Tewari, A.³ Witkiewitz, K.⁴ Collins, L.M.⁵ Spring, B.⁶ Murphy, S.⁷

31
- 84995965487
- Building health behavior models to guide the development of just-in-time adaptive interventions: a pragmatic framework
- Nahum-Shani I., Hekler E.B., Spruijt-Metz D. Building health behavior models to guide the development of just-in-time adaptive interventions: a pragmatic framework. Health Psychol 2015, 34:1209.
- (2015) Health Psychol , vol.34 , pp. 1209
- Nahum-Shani, I.¹ Hekler, E.B.² Spruijt-Metz, D.³

32
- 84862001711
- Transfer in reinforcement learning via shared features
- Konidaris G., Scheidwasser I., Barto A.G. Transfer in reinforcement learning via shared features. J Mach Learn Res 2012, 13:1333-1371.
- (2012) J Mach Learn Res , vol.13 , pp. 1333-1371
- Konidaris, G.¹ Scheidwasser, I.² Barto, A.G.³

33
- 77952541839
- Learning latent structure: carving nature at its joints
- Gershman S.J., Niv Y. Learning latent structure: carving nature at its joints. Curr Opin Neurobiol 2010, 20:251-256.
- (2010) Curr Opin Neurobiol , vol.20 , pp. 251-256
- Gershman, S.J.¹ Niv, Y.²

34
- 84921627984
- Trial by trial data analysis using computational models
- Oxford University Press
- Daw N. Trial by trial data analysis using computational models. Decision making, affect and learning: attention and performance XXIII 2011, Oxford University Press.
- (2011) Decision making, affect and learning: attention and performance XXIII
- Daw, N.¹

35
- 84938863250
- Discovering latent causes in reinforcement learning
- Gershman S.J., Norman K.A., Niv Y. Discovering latent causes in reinforcement learning. Curr Opin Behav Sci 2015, 5:43-50.
- (2015) Curr Opin Behav Sci , vol.5 , pp. 43-50
- Gershman, S.J.¹ Norman, K.A.² Niv, Y.³

36
- 74049117596
- Context, learning, and extinction
- Gershman S.J., Blei D.M., Niv Y. Context, learning, and extinction. Psychol Rev 2010, 117:197.
- (2010) Psychol Rev , vol.117 , pp. 197
- Gershman, S.J.¹ Blei, D.M.² Niv, Y.³

37
- 84887935902
- Gradual extinction prevents the return of fear: implications for the discovery of state
- Gershman S.J., Jones C.E., Norman K.A., Monfils M.-H., Niv Y. Gradual extinction prevents the return of fear: implications for the discovery of state. Front Behav Neurosci 2013, 7:164.
- (2013) Front Behav Neurosci , vol.7 , pp. 164
- Gershman, S.J.¹ Jones, C.E.² Norman, K.A.³ Monfils, M.-H.⁴ Niv, Y.⁵

38
- 84885362378
- Perceptual estimation obeys Occam's razor
- Gershman S.J., Niv Y. Perceptual estimation obeys Occam's razor. Front Psychol 2013, 4:623.
- (2013) Front Psychol , vol.4 , pp. 623
- Gershman, S.J.¹ Niv, Y.²

39
- 84912082331
- Working memory contributions to reinforcement learning impairments in schizophrenia
- Collins A.G., Brown J.K., Gold J.M., Waltz J.A., Frank M.J. Working memory contributions to reinforcement learning impairments in schizophrenia. J Neurosci 2014, 34:13747-13756.
- (2014) J Neurosci , vol.34 , pp. 13747-13756
- Collins, A.G.¹ Brown, J.K.² Gold, J.M.³ Waltz, J.A.⁴ Frank, M.J.⁵

40
- 84912099627
- Statistical computations underlying the dynamics of memory updating
- Gershman S.J., Radulescu A., Norman K.A., Niv Y. Statistical computations underlying the dynamics of memory updating. PLoS Computat Biol 2014, 10:e1003939.
- (2014) PLoS Computat Biol , vol.10
- Gershman, S.J.¹ Radulescu, A.² Norman, K.A.³ Niv, Y.⁴

41
- 84930260511
- Reinforcement learning in multidimensional environments relies on attention mechanisms
- Niv Y., Daniel R., Geana A., Gershman S.J., Leong Y.C., Radulescu A., Wilson R.C. Reinforcement learning in multidimensional environments relies on attention mechanisms. J Neurosci 2015, 35:8145-8157.
- (2015) J Neurosci , vol.35 , pp. 8145-8157
- Niv, Y.¹ Daniel, R.² Geana, A.³ Gershman, S.J.⁴ Leong, Y.C.⁵ Radulescu, A.⁶ Wilson, R.C.⁷

42
- 58249121629
- Academic Press
- Glimcher P.W., Fehr E. Neuroeconomics: decision making and the brain 2013, Academic Press.
- (2013) Neuroeconomics: decision making and the brain
- Glimcher, P.W.¹ Fehr, E.²

43
- 84878186347
- The root of all value: a neural common currency for choice
- Levy D.J., Glimcher P.W. The root of all value: a neural common currency for choice. Curr Opin Neurobiol 2012, 22:1027-1038.
- (2012) Curr Opin Neurobiol , vol.22 , pp. 1027-1038
- Levy, D.J.¹ Glimcher, P.W.²

44
- 33646566317
- Neurons in the orbitofrontal cortex encode economic value
- Padoa-Schioppa C., Assad J.A. Neurons in the orbitofrontal cortex encode economic value. Nature 2006, 441:223-226.
- (2006) Nature , vol.441 , pp. 223-226
- Padoa-Schioppa, C.¹ Assad, J.A.²

45
- 37549055372
- The representation of economic value in the orbitofrontal cortex is invariant for changes of menu
- Padoa-Schioppa C., Assad J.A. The representation of economic value in the orbitofrontal cortex is invariant for changes of menu. Nature Neurosci 2008, 11:95-102.
- (2008) Nature Neurosci , vol.11 , pp. 95-102
- Padoa-Schioppa, C.¹ Assad, J.A.²

46
- 0013535965
- Infinite-horizon policy-gradient estimation
- Baxter J., Bartlett P.L. Infinite-horizon policy-gradient estimation. J Artif Intell Res 2001, 319-350.
- (2001) J Artif Intell Res , pp. 319-350
- Baxter, J.¹ Bartlett, P.L.²

47
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Sutton R.S., McAllester D.A., Singh S.P., Mansour Y. Policy gradient methods for reinforcement learning with function approximation. Adv Neural Inform Process Syst 1999, 1057-1063.
- (1999) Adv Neural Inform Process Syst , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.A.² Singh, S.P.³ Mansour, Y.⁴

48
- 70349982705
- Incremental natural actor-critic algorithms
- Bhatnagar S., Ghavamzadeh M., Lee M., Sutton R.S. Incremental natural actor-critic algorithms. Adv Neural Inform Process Syst 2007, 105-112.
- (2007) Adv Neural Inform Process Syst , pp. 105-112
- Bhatnagar, S.¹ Ghavamzadeh, M.² Lee, M.³ Sutton, R.S.⁴

49
- 79955721719
- Signals in human striatum are appropriate for policy update rather than value prediction
- Li J., Daw N.D. Signals in human striatum are appropriate for policy update rather than value prediction. J Neurosci 2011, 31:5504-5511.
- (2011) J Neurosci , vol.31 , pp. 5504-5511
- Li, J.¹ Daw, N.D.²

50
- 84892185669
- Time representation in reinforcement learning models of the basal ganglia
- Gershman S.J., Moustafa A.A., Ludvig E.A. Time representation in reinforcement learning models of the basal ganglia. Front Computat Neurosci 2014, 7.
- (2014) Front Computat Neurosci , vol.7
- Gershman, S.J.¹ Moustafa, A.A.² Ludvig, E.A.³

51
- 33745787929
- Representation and timing in theories of the dopamine system
- Daw N.D., Courville A.C., Touretzky D.S. Representation and timing in theories of the dopamine system. Neural Computat 2006, 18:1637-1677.
- (2006) Neural Computat , vol.18 , pp. 1637-1677
- Daw, N.D.¹ Courville, A.C.² Touretzky, D.S.³

52
- 84870910490
- Evaluating the TD model of classical conditioning
- Ludvig E.A., Sutton R.S., Kehoe E.J. Evaluating the TD model of classical conditioning. Learn Behav 2012, 40:305-319.
- (2012) Learn Behav , vol.40 , pp. 305-319
- Ludvig, E.A.¹ Sutton, R.S.² Kehoe, E.J.³

53
- 77549088745
- Alternative time representation in dopamine models
- Rivest F., Kalaska J.F., Bengio Y. Alternative time representation in dopamine models. J Computat Neurosci 2010, 28:107-130.
- (2010) J Computat Neurosci , vol.28 , pp. 107-130
- Rivest, F.¹ Kalaska, J.F.² Bengio, Y.³

54
- 85150714688
- Reinforcement learning methods for continuous time Markov decision problems
- Bradtke S.J., Duff M.O. Reinforcement learning methods for continuous time Markov decision problems. Adv Neural Inform Process Syst 1995, 7:393.
- (1995) Adv Neural Inform Process Syst , vol.7 , pp. 393
- Bradtke, S.J.¹ Duff, M.O.²

55
- 84991262300
- Temporal specificity of reward prediction errors signaled by putative dopamine neurons in rat VTA depends on ventral striatum
- Takahashi Y., Langdon A.J., Niv Y., Schoenbaum G. Temporal specificity of reward prediction errors signaled by putative dopamine neurons in rat VTA depends on ventral striatum. Neuron 2016.
- (2016) Neuron
- Takahashi, Y.¹ Langdon, A.J.² Niv, Y.³ Schoenbaum, G.⁴

56
- 84929025181
- A scalable population code for time in the striatum
- Mello G.B., Soares S., Paton J.J. A scalable population code for time in the striatum. Curr Biol 2015, 25:1113-1122.
- (2015) Curr Biol , vol.25 , pp. 1113-1122
- Mello, G.B.¹ Soares, S.² Paton, J.J.³

57
- 84955262053
- Striatal dynamics explain duration judgments
- Gouvêa T.S., Monteiro T., Motiwala A., Soares S., Machens C., Paton J.J. Striatal dynamics explain duration judgments. eLife 2016, 4:e11386.
- (2016) eLife , vol.4
- Gouvêa, T.S.¹ Monteiro, T.² Motiwala, A.³ Soares, S.⁴ Machens, C.⁵ Paton, J.J.⁶

58
- 84896722491
- Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences
- Jin X., Tecuapetla F., Costa R.M. Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nature Neurosci 2014, 17:423-430.
- (2014) Nature Neurosci , vol.17 , pp. 423-430
- Jin, X.¹ Tecuapetla, F.² Costa, R.M.³

59
- 70350566799
- Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective
- Botvinick M.M., Niv Y., Barto A.G. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 2009, 113:262-280.
- (2009) Cognition , vol.113 , pp. 262-280
- Botvinick, M.M.¹ Niv, Y.² Barto, A.G.³

60
- 84878190351
- Hierarchical reinforcement learning and decision making
- Botvinick M.M. Hierarchical reinforcement learning and decision making. Curr Opin Neurobiol 2012, 22:956-962.
- (2012) Curr Opin Neurobiol , vol.22 , pp. 956-962
- Botvinick, M.M.¹

61
- 84923228672
- Optimal behavioral hierarchy
- Solway A., Diuk C., Córdova N., Yee D., Barto A.G., Niv Y., Botvinick M.M. Optimal behavioral hierarchy. PLoS Computat Biol 2014, 10:e1003779.
- (2014) PLoS Computat Biol , vol.10
- Solway, A.¹ Diuk, C.² Córdova, N.³ Yee, D.⁴ Barto, A.G.⁵ Niv, Y.⁶ Botvinick, M.M.⁷

62
- 0032073263
- Planning and acting in partially observable stochastic domains
- Kaelbling L.P., Littman M.L., Cassandra A.R. Planning and acting in partially observable stochastic domains. Artif Intell 1998, 101:99-134.
- (1998) Artif Intell , vol.101 , pp. 99-134
- Kaelbling, L.P.¹ Littman, M.L.² Cassandra, A.R.³

63
- 84924051598
- Human-level control through deep reinforcement learning
- Mnih V., Kavukcuoglu K., Silver D., Rusu A.A., Veness J., Bellemare M.G., Graves A., Riedmiller M., Fidjeland A.K., Ostrovski G. Human-level control through deep reinforcement learning. Nature 2015, 518:529-533.
- (2015) Nature , vol.518 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰

64
- 28044450875
- Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control
- Daw N.D., Niv Y., Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neurosci 2005, 8:1704-1711.
- (2005) Nature Neurosci , vol.8 , pp. 1704-1711
- Daw, N.D.¹ Niv, Y.² Dayan, P.³

65
- 79958143780
- Speed/accuracy trade-off between the habitual and the goal-directed processes
- Keramati M., Dezfouli A., Piray P. Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Computat Biol 2011, 7:e1002055.
- (2011) PLoS Computat Biol , vol.7
- Keramati, M.¹ Dezfouli, A.² Piray, P.³

66
- 79952746011
- Model-based influences on humans' choices and striatal prediction errors
- Daw N.D., Gershman S.J., Seymour B., Dayan P., Dolan R.J. Model-based influences on humans' choices and striatal prediction errors. Neuron 2011, 69:1204-1215.
- (2011) Neuron , vol.69 , pp. 1204-1215
- Daw, N.D.¹ Gershman, S.J.² Seymour, B.³ Dayan, P.⁴ Dolan, R.J.⁵

67
- 84877341847
- The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive
- Otto A.R., Gershman S.J., Markman A.B., Daw N.D. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci 2013, 24:751-761.
- (2013) Psychol Sci , vol.24 , pp. 751-761
- Otto, A.R.¹ Gershman, S.J.² Markman, A.B.³ Daw, N.D.⁴

68
- 84928701322
- Model-based choices involve prospective neural activity
- Doll B.B., Duncan K.D., Simon D.A., Shohamy D., Daw N.D. Model-based choices involve prospective neural activity. Nature Neurosci 2015, 18:767-772.
- (2015) Nature Neurosci , vol.18 , pp. 767-772
- Doll, B.B.¹ Duncan, K.D.² Simon, D.A.³ Shohamy, D.⁴ Daw, N.D.⁵

69
- 84924655858
- Disorders of compulsivity: a common bias towards learning habits
- Voon V., Derbyshire K., Rück C., Irvine M., Worbe Y., Enander J., Schreiber L., Gillan C., Fineberg N., Sahakian B. Disorders of compulsivity: a common bias towards learning habits. Mol Psychiatry 2015, 20:345-352.
- (2015) Mol Psychiatry , vol.20 , pp. 345-352
- Voon, V.¹ Derbyshire, K.² Rück, C.³ Irvine, M.⁴ Worbe, Y.⁵ Enander, J.⁶ Schreiber, L.⁷ Gillan, C.⁸ Fineberg, N.⁹ Sahakian, B.¹⁰

70
- 84961875552
- Characterizing a psychological dimension related to deficits in goal-directed control
- Gillan C., Kosinski R.W., Phelps E.A., Daw N.D. Characterizing a psychological dimension related to deficits in goal-directed control. eLife 2016, 5:e11305.
- (2016) eLife , vol.5
- Gillan, C.¹ Kosinski, R.W.² Phelps, E.A.³ Daw, N.D.⁴

71
- 84905503383
- Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive
- Collins A.G., Frank M.J. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychol Rev 2014, 121:337.
- (2014) Psychol Rev , vol.121 , pp. 337
- Collins, A.G.¹ Frank, M.J.²

72
- 84905569399
- A reinforcement learning mechanism responsible for the valuation of free choice
- Cockburn J., Collins A.G., Frank M.J. A reinforcement learning mechanism responsible for the valuation of free choice. Neuron 2014, 83:551-557.
- (2014) Neuron , vol.83 , pp. 551-557
- Cockburn, J.¹ Collins, A.G.² Frank, M.J.³

73
- 36448968271
- Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards
- Roesch M.R., Calu D.J., Schoenbaum G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neurosci 2007, 10:1615-1624.
- (2007) Nature Neurosci , vol.10 , pp. 1615-1624
- Roesch, M.R.¹ Calu, D.J.² Schoenbaum, G.³

74
- 0035817882
- A cellular mechanism of reward-related learning
- Reynolds J.N., Hyland B.I., Wickens J.R. A cellular mechanism of reward-related learning. Nature 2001, 413:67-70.
- (2001) Nature , vol.413 , pp. 67-70
- Reynolds, J.N.¹ Hyland, B.I.² Wickens, J.R.³

75
- 0036592025
- Dopamine-dependent plasticity of corticostriatal synapses
- Reynolds J.N., Wickens J.R. Dopamine-dependent plasticity of corticostriatal synapses. Neural Netw 2002, 15:507-521.
- (2002) Neural Netw , vol.15 , pp. 507-521
- Reynolds, J.N.¹ Wickens, J.R.²

76
- 84864066393
- Reinforcement learning: computing the temporal difference of values via distinct corticostriatal pathways
- Morita K., Morishima M., Sakai K., Kawaguchi Y. Reinforcement learning: computing the temporal difference of values via distinct corticostriatal pathways. Trends Neurosci 2012, 35:457-467.
- (2012) Trends Neurosci , vol.35 , pp. 457-467
- Morita, K.¹ Morishima, M.² Sakai, K.³ Kawaguchi, Y.⁴

77
- 79958078227
- An imperfect dopaminergic error signal can drive temporal-difference learning
- Potjans W., Diesmann M., Morrison A. An imperfect dopaminergic error signal can drive temporal-difference learning. PLoS Computat Biol 2011, 7:e1001133.
- (2011) PLoS Computat Biol , vol.7
- Potjans, W.¹ Diesmann, M.² Morrison, A.³

78
- 67650298948
- A spiking neural network model of an actor-critic learning agent
- Potjans W., Morrison A., Diesmann M. A spiking neural network model of an actor-critic learning agent. Neural Computat 2009, 21:301-339.
- (2009) Neural Computat , vol.21 , pp. 301-339
- Potjans, W.¹ Morrison, A.² Diesmann, M.³

79
- 0036592026
- Actor-critic models of the basal ganglia: new anatomical and computational perspectives
- Joel D., Niv Y., Ruppin E. Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Netw 2002, 15:535-547.
- (2002) Neural Netw , vol.15 , pp. 535-547
- Joel, D.¹ Niv, Y.² Ruppin, E.³

80
- 84941212802
- Arithmetic and local circuitry underlying dopamine prediction errors
- Eshel N., Bukwich M., Rao V., Hemmelder V., Tian J., Uchida N. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 2015, 525:243-246.
- (2015) Nature , vol.525 , pp. 243-246
- Eshel, N.¹ Bukwich, M.² Rao, V.³ Hemmelder, V.⁴ Tian, J.⁵ Uchida, N.⁶

81
- 0035489925
- Spike-timing-dependent Hebbian plasticity as temporal difference learning
- Rao R.P., Sejnowski T.J. Spike-timing-dependent Hebbian plasticity as temporal difference learning. Neural Computat 2001, 13:2221-2237.
- (2001) Neural Computat , vol.13 , pp. 2221-2237
- Rao, R.P.¹ Sejnowski, T.J.²

82
- 33847634405
- The debate over dopamine's role in reward: the case for incentive salience
- Berridge K.C. The debate over dopamine's role in reward: the case for incentive salience. Psychopharmacology 2007, 191:391-431.
- (2007) Psychopharmacology , vol.191 , pp. 391-431
- Berridge, K.C.¹

83
- 0033119561
- Is the short-latency dopamine response too short to signal reward error?
- Redgrave P., Prescott T.J., Gurney K. Is the short-latency dopamine response too short to signal reward error? Trends Neurosci 1999, 22:146-151.
- (1999) Trends Neurosci , vol.22 , pp. 146-151
- Redgrave, P.¹ Prescott, T.J.² Gurney, K.³

84
- 66049119905
- Short-latency activation of striatal spiny neurons via subcortical visual pathways
- Schulz J.M., Redgrave P., Mehring C., Aertsen A., Clements K.M., Wickens J.R., Reynolds J.N. Short-latency activation of striatal spiny neurons via subcortical visual pathways. J Neurosci 2009, 29:6336-6347.
- (2009) J Neurosci , vol.29 , pp. 6336-6347
- Schulz, J.M.¹ Redgrave, P.² Mehring, C.³ Aertsen, A.⁴ Clements, K.M.⁵ Wickens, J.R.⁶ Reynolds, J.N.⁷

85
- 84856431209
- Neuron-type-specific signals for reward and punishment in the ventral tegmental area
- Cohen J.Y., Haesler S., Vong L., Lowell B.B., Uchida N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 2012, 482:85-88.
- (2012) Nature , vol.482 , pp. 85-88
- Cohen, J.Y.¹ Haesler, S.² Vong, L.³ Lowell, B.B.⁴ Uchida, N.⁵

86
- 84863393406
- Structural correlates of heterogeneous in vivo activity of midbrain dopaminergic neurons
- Henny P., Brown M.T., Northrop A., Faunes M., Ungless M.A., Magill P.J., Bolam J.P. Structural correlates of heterogeneous in vivo activity of midbrain dopaminergic neurons. Nature Neurosci 2012, 15:613-619.
- (2012) Nature Neurosci , vol.15 , pp. 613-619
- Henny, P.¹ Brown, M.T.² Northrop, A.³ Faunes, M.⁴ Ungless, M.A.⁵ Magill, P.J.⁶ Bolam, J.P.⁷

87
- 63849268432
- Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli
- Brischoux F., Chakraborty S., Brierley D.I., Ungless M.A. Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli. Proc Natl Acad Sci U S A 2009, 106:4894-4899.
- (2009) Proc Natl Acad Sci U S A , vol.106 , pp. 4894-4899
- Brischoux, F.¹ Chakraborty, S.² Brierley, D.I.³ Ungless, M.A.⁴

88
- 58149236469
- Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials
- Joshua M., Adler A., Mitelman R., Vaadia E., Bergman H. Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials. J Neurosci 2008, 28:11673-11684.
- (2008) J Neurosci , vol.28 , pp. 11673-11684
- Joshua, M.¹ Adler, A.² Mitelman, R.³ Vaadia, E.⁴ Bergman, H.⁵

89
- 84875468581
- Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia
- Diuk C., Tsai K., Wallis J., Botvinick M., Niv Y. Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia. J Neurosci 2013, 33:5797-5805.
- (2013) J Neurosci , vol.33 , pp. 5797-5805
- Diuk, C.¹ Tsai, K.² Wallis, J.³ Botvinick, M.⁴ Niv, Y.⁵

90
- 84865306110
- The successor representation and temporal context
- Gershman S.J., Moore C.D., Todd M.T., Norman K.A., Sederberg P.B. The successor representation and temporal context. Neural Computat 2012, 24:1553-1568.
- (2012) Neural Computat , vol.24 , pp. 1553-1568
- Gershman, S.J.¹ Moore, C.D.² Todd, M.T.³ Norman, K.A.⁴ Sederberg, P.B.⁵

91
- 0001158047
- Improving generalization for temporal difference learning: the successor representation
- Dayan P. Improving generalization for temporal difference learning: the successor representation. Neural Computat 1993, 5:613-624.
- (1993) Neural Computat , vol.5 , pp. 613-624
- Dayan, P.¹

92
- 34347343926
- Lateral habenula as a source of negative reward signals in dopamine neurons
- Matsumoto M., Hikosaka O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 2007, 447:1111-1115.
- (2007) Nature , vol.447 , pp. 1111-1115
- Matsumoto, M.¹ Hikosaka, O.²

93
- 58149127270
- Representation of negative motivational value in the primate lateral habenula
- Matsumoto M., Hikosaka O. Representation of negative motivational value in the primate lateral habenula. Nature Neurosci 2009, 12:77-84.
- (2009) Nature Neurosci , vol.12 , pp. 77-84
- Matsumoto, M.¹ Hikosaka, O.²

94
- 84953756054
- Action initiation shapes mesolimbic dopamine encoding of future rewards
- Syed E.C., Grima L.L., Magill P.J., Bogacz R., Brown P., Walton M.E. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nature Neurosci 2016, 19:34-36.
- (2016) Nature Neurosci , vol.19 , pp. 34-36
- Syed, E.C.¹ Grima, L.L.² Magill, P.J.³ Bogacz, R.⁴ Brown, P.⁵ Walton, M.E.⁶

95
- 0242600534
- Subsecond dopamine release promotes cocaine seeking
- Phillips P.E., Stuber G.D., Heien M.L., Wightman R.M., Carelli R.M. Subsecond dopamine release promotes cocaine seeking. Nature 2003, 422:614-618.
- (2003) Nature , vol.422 , pp. 614-618
- Phillips, P.E.¹ Stuber, G.D.² Heien, M.L.³ Wightman, R.M.⁴ Carelli, R.M.⁵

96
- 10344250993
- By carrot or by stick: cognitive reinforcement learning in parkinsonism
- Frank M.J., Seeberger L.C., O'Reilly R.C. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 2004, 306:1940-1943.
- (2004) Science , vol.306 , pp. 1940-1943
- Frank, M.J.¹ Seeberger, L.C.² O'Reilly, R.C.³

97
- 84921728992
- Striatal D1 and D2 signaling differentially predict learning from positive and negative outcomes
- Cox S.M., Frank M.J., Larcher K., Fellows L.K., Clark C.A., Leyton M., Dagher A. Striatal D1 and D2 signaling differentially predict learning from positive and negative outcomes. Neuroimage 2015, 109:95-101.
- (2015) Neuroimage , vol.109 , pp. 95-101
- Cox, S.M.¹ Frank, M.J.² Larcher, K.³ Fellows, L.K.⁴ Clark, C.A.⁵ Leyton, M.⁶ Dagher, A.⁷

98
- 84866055418
- Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value
- Tai L.-H., Lee A.M., Benavidez N., Bonci A., Wilbrecht L. Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value. Nature Neurosci 2012, 15:1281-1289.
- (2012) Nature Neurosci , vol.15 , pp. 1281-1289
- Tai, L.-H.¹ Lee, A.M.² Benavidez, N.³ Bonci, A.⁴ Wilbrecht, L.⁵

99
- 84873733797
- Concurrent activation of striatal direct and indirect pathways during action initiation
- Cui G., Jun S.B., Jin X., Pham M.D., Vogel S.S., Lovinger D.M., Costa R.M. Concurrent activation of striatal direct and indirect pathways during action initiation. Nature 2013, 494:238-242.
- (2013) Nature , vol.494 , pp. 238-242
- Cui, G.¹ Jun, S.B.² Jin, X.³ Pham, M.D.⁴ Vogel, S.S.⁵ Lovinger, D.M.⁶ Costa, R.M.⁷

100
- 84904961722
- Direct and indirect pathways of basal ganglia: a critical reappraisal
- Calabresi P., Picconi B., Tozzi A., Ghiglieri V., Di Filippo M. Direct and indirect pathways of basal ganglia: a critical reappraisal. Nature Neurosci 2014, 17:1022-1030.
- (2014) Nature Neurosci , vol.17 , pp. 1022-1030
- Calabresi, P.¹ Picconi, B.² Tozzi, A.³ Ghiglieri, V.⁴ Di Filippo, M.⁵

101
- 28444472936
- Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits
- Balleine B.W. Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol Behav 2005, 86:717-730.
- (2005) Physiol Behav , vol.86 , pp. 717-730
- Balleine, B.W.¹

102
- 84891681994
- A causal link between prediction errors, dopamine neurons and learning
- Steinberg E.E., Keiflin R., Boivin J.R., Witten I.B., Deisseroth K., Janak P.H. A causal link between prediction errors, dopamine neurons and learning. Nature Neurosci 2013, 16:966-973.
- (2013) Nature Neurosci , vol.16 , pp. 966-973
- Steinberg, E.E.¹ Keiflin, R.² Boivin, J.R.³ Witten, I.B.⁴ Deisseroth, K.⁵ Janak, P.H.⁶

103
- 84953837909
- Brief optogenetic inhibition of dopamine neurons mimics endogenous negative reward prediction errors
- Chang C.Y., Esber G.R., Marrero-Garcia Y., Yau H.-J., Bonci A., Schoenbaum G. Brief optogenetic inhibition of dopamine neurons mimics endogenous negative reward prediction errors. Nature Neurosci 2016, 19:111-116.
- (2016) Nature Neurosci , vol.19 , pp. 111-116
- Chang, C.Y.¹ Esber, G.R.² Marrero-Garcia, Y.³ Yau, H.-J.⁴ Bonci, A.⁵ Schoenbaum, G.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.