



Volume 9, Issue 4, 2013, Pages

Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

Author keywords

[No Author keywords available]

Indexed keywords

AMINES; ANIMALS; COMPUTATION THEORY; CONTINUOUS TIME SYSTEMS; FORECASTING; REINFORCEMENT LEARNING;

EID: 84876888983     PISSN: 1553734X     EISSN: 15537358     Source Type: Journal    
DOI: 10.1371/journal.pcbi.1003024     Document Type: Article
Times cited : (180)

References (62)
  • 2
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • Sutton RS, (1988) Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 3
    • 0000337576
    • Simple statistical gradient-following methods for connectionist reinforcement learning
    • Williams R, (1992) Simple statistical gradient-following methods for connectionist reinforcement learning. Machine Learning 8: 229-256.
    • (1992) Machine Learning , vol.8 , pp. 229-256
    • Williams, R.1
  • 4
    • 37649027755
    • Learning in neural networks by reinforcement of irregular spiking
    • Xie X, Seung H, (2004) Learning in neural networks by reinforcement of irregular spiking. Physical Review E 69: 41909.
    • (2004) Physical Review E , vol.69 , pp. 41909
    • Xie, X.1    Seung, H.2
  • 6
    • 34249708388
    • Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
    • Florian RV, (2007) Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation 19: 1468-1502.
    • (2007) Neural Computation , vol.19 , pp. 1468-1502
    • Florian, R.V.1
  • 7
    • 0030896968
    • A neural substrate of prediction and reward
    • Schultz W, Dayan P, Montague PR, (1997) A neural substrate of prediction and reward. Science 275: 1593-1599.
    • (1997) Science , vol.275 , pp. 1593-1599
    • Schultz, W.1    Dayan, P.2    Montague, P.R.3
  • 8
    • 0029655991
    • Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro
    • Wickens JR, Begg AJ, Arbuthnott GW, (1996) Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro. Neuroscience 70: 1-5.
    • (1996) Neuroscience , vol.70 , pp. 1-5
    • Wickens, J.R.1    Begg, A.J.2    Arbuthnott, G.W.3
  • 9
    • 0034638268
    • Substantia nigra dopamine regulates synaptic plasticity and membrane potential fluctuations in the rat neostriatum, in vivo
    • Reynolds JNJ, Wickens JR, (2000) Substantia nigra dopamine regulates synaptic plasticity and membrane potential fluctuations in the rat neostriatum, in vivo. Neuroscience 99: 199-203.
    • (2000) Neuroscience , vol.99 , pp. 199-203
    • Reynolds, J.N.J.1    Wickens, J.R.2
  • 10
    • 0035817882
    • A cellular mechanism of reward-related learning
    • Reynolds JNJ, Hyland BI, Wickens JR, (2001) A cellular mechanism of reward-related learning. Nature 413: 67-70.
    • (2001) Nature , vol.413 , pp. 67-70
    • Reynolds, J.N.J.1    Hyland, B.I.2    Wickens, J.R.3
  • 11
    • 0036592025
    • Dopamine-dependent plasticity of corticostriatal synapses
    • Reynolds JNJ, Wickens JR, (2002) Dopamine-dependent plasticity of corticostriatal synapses. Neural Netw 15: 507-521.
    • (2002) Neural Netw , vol.15 , pp. 507-521
    • Reynolds, J.N.J.1    Wickens, J.R.2
  • 12
    • 40449100017
    • Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity
    • Pawlak V, Kerr JND, (2008) Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity. J Neurosci 28: 2435-2446.
    • (2008) J Neurosci , vol.28 , pp. 2435-2446
    • Pawlak, V.1    Kerr, J.N.D.2
  • 13
    • 69149103505
    • Gain in sensitivity and loss in temporal contrast of STDP by dopaminergic modulation at hippocampal synapses
    • Zhang JC, Lau PM, Bi GQ, (2009) Gain in sensitivity and loss in temporal contrast of STDP by dopaminergic modulation at hippocampal synapses. PNAS 106: 13028-13033.
    • (2009) PNAS , vol.106 , pp. 13028-13033
    • Zhang, J.C.1    Lau, P.M.2    Bi, G.Q.3
  • 15
    • 67650298948
    • A spiking neural network model of an actor-critic learning agent
    • Potjans W, Morrison A, Diesmann M, (2009) A spiking neural network model of an actor-critic learning agent. Neural Computation 21: 301-339.
    • (2009) Neural Computation , vol.21 , pp. 301-339
    • Potjans, W.1    Morrison, A.2    Diesmann, M.3
  • 16
    • 74549209037
    • Spike-based reinforcement learning in continuous state and action space: When policy gradient methods fail
    • Vasilaki E, Frémaux N, Urbanczik R, Senn W, Gerstner W, (2009) Spike-based reinforcement learning in continuous state and action space: When policy gradient methods fail. PLoS Comput Biol 5: e1000586.
    • (2009) PLoS Comput Biol , vol.5
    • Vasilaki, E.1    Frémaux, N.2    Urbanczik, R.3    Senn, W.4    Gerstner, W.5
  • 19
    • 0033629916
    • Reinforcement learning in continuous time and space
    • Doya K, (2000) Reinforcement learning in continuous time and space. Neural Computation 12: 219-245.
    • (2000) Neural Computation , vol.12 , pp. 219-245
    • Doya, K.1
  • 20
    • 0034276719
    • Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity
    • Arleo A, Gerstner W, (2000) Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity. Biological Cybernetics 83: 287-299.
    • (2000) Biological Cybernetics , vol.83 , pp. 287-299
    • Arleo, A.1    Gerstner, W.2
  • 21
    • 0033968832
    • Models of hippocampally dependent navigation using the temporal difference learning rule
    • Foster D, Morris R, Dayan P, (2000) Models of hippocampally dependent navigation using the temporal difference learning rule. Hippocampus 10: 1-16.
    • (2000) Hippocampus , vol.10 , pp. 1-16
    • Foster, D.1    Morris, R.2    Dayan, P.3
  • 22
    • 0015145985
    • The hippocampus as a spatial map: preliminary evidence from unit activity in the freely-moving rat
    • O'Keefe J, Dostrovsky J, (1971) The hippocampus as a spatial map: preliminary evidence from unit activity in the freely-moving rat. Brain Res 34: 171-175.
    • (1971) Brain Res , vol.34 , pp. 171-175
    • O'Keefe, J.1    Dostrovsky, J.2
  • 24
    • 85156221438
    • Generalization in reinforcement learning: Successful examples using sparse coarse coding
    • MIT Press
    • Sutton RS (1996) Generalization in reinforcement learning: Successful examples using sparse coarse coding. In: Advances in Neural Information Processing Systems 8. MIT Press, pp. 1038-1044.
    • (1996) Advances in Neural Information Processing Systems 8 , pp. 1038-1044
    • Sutton, R.S.1
  • 25
    • 85151728371
    • Residual algorithms: Reinforcement learning with function approximation
    • In: Prieditis A, Russell S, editors, San Francisco, CA.: Morgan Kaufmann
    • Baird LC (1995) Residual algorithms: Reinforcement learning with function approximation. In: Prieditis A, Russell S, editors, Proceedings of the Twelfth International Conference on Machine Learning. San Francisco, CA.: Morgan Kaufmann., pp. 30-37.
    • (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 30-37
    • Baird, L.C.1
  • 26
    • 79957749002
    • Reinforcement learning applied to a differential game
    • Harmon ME, Baird LC, Klopf AH, (1995) Reinforcement learning applied to a differential game. Adaptive Behavior 4: 3-28.
    • (1995) Adaptive Behavior , vol.4 , pp. 3-28
    • Harmon, M.E.1    Baird, L.C.2    Klopf, A.H.3
  • 28
    • 0000430514
    • The convergence of TD(λ) for general λ
    • Dayan P, (1992) The convergence of TD(λ) for general λ. Machine Learning 8: 341-362.
    • (1992) Machine Learning , vol.8 , pp. 341-362
    • Dayan, P.1
  • 29
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • Tsitsiklis JN, Van Roy B, (1997) An analysis of temporal-difference learning with function approximation. Automatic Control, IEEE Transactions on 42: 674-690.
    • (1997) Automatic Control, IEEE Transactions on , vol.42 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 30
    • 34948906745
    • Solving the distal reward problem through linkage of STDP and dopamine signaling
    • Izhikevich E, (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex 17: 2443-2452.
    • (2007) Cerebral Cortex , vol.17 , pp. 2443-2452
    • Izhikevich, E.1
  • 31
    • 55449121121
    • A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback
    • Legenstein R, Pecevski D, Maass W, (2008) A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Comput Biol 4: e1000180.
    • (2008) PLoS Comput Biol , vol.4
    • Legenstein, R.1    Pecevski, D.2    Maass, W.3
  • 32
    • 77957731196
    • Functional requirements for reward-modulated spike-timing-dependent plasticity
    • Frémaux N, Sprekeler H, Gerstner W, (2010) Functional requirements for reward-modulated spike-timing-dependent plasticity. The Journal of Neuroscience 30: 13326-13337.
    • (2010) The Journal of Neuroscience , vol.30 , pp. 13326-13337
    • Frémaux, N.1    Sprekeler, H.2    Gerstner, W.3
  • 33
    • 0029821128
    • A neuronal learning rule for submillisecond temporal coding
    • Gerstner W, Kempter R, van Hemmen J, Wagner H, (1996) A neuronal learning rule for submillisecond temporal coding. Nature 383: 76-78.
    • (1996) Nature , vol.383 , pp. 76-78
    • Gerstner, W.1    Kempter, R.2    van Hemmen, J.3    Wagner, H.4
  • 34
    • 0031012615
    • Regulation of synaptic efficacy by coincidence of postsynaptic AP and EPSP
    • Markram H, Lübke J, Frotscher M, Sakmann B, (1997) Regulation of synaptic efficacy by coincidence of postsynaptic AP and EPSP. Science 275: 213-215.
    • (1997) Science , vol.275 , pp. 213-215
    • Markram, H.1    Lübke, J.2    Frotscher, M.3    Sakmann, B.4
  • 35
    • 0032535029
    • Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type
    • Bi G, Poo M, (1998) Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neurosci 18: 10464-10472.
    • (1998) J Neurosci , vol.18 , pp. 10464-10472
    • Bi, G.1    Poo, M.2
  • 36
    • 0033860923
    • Competitive Hebbian learning through spike-time-dependent synaptic plasticity
    • Song S, Miller K, Abbott L, (2000) Competitive Hebbian learning through spike-time-dependent synaptic plasticity. Nature Neuroscience 3: 919-926.
    • (2000) Nature Neuroscience , vol.3 , pp. 919-926
    • Song, S.1    Miller, K.2    Abbott, L.3
  • 37
    • 0023789678
    • Primate motor cortex and free arm movements to visual targets in three-dimensional space. II. Coding of the direction of movement by a neuronal population
    • Georgopoulos A, Kettner R, Schwartz A, (1988) Primate motor cortex and free arm movements to visual targets in three-dimensional space. II. Coding of the direction of movement by a neuronal population. J Neurosci 8: 2928-2937.
    • (1988) J Neurosci , vol.8 , pp. 2928-2937
    • Georgopoulos, A.1    Kettner, R.2    Schwartz, A.3
  • 38
    • 33646801243
    • Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning
    • Pfister JP, Toyoizumi T, Barber D, Gerstner W, (2006) Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning. Neural Comp 18: 1318-1348.
    • (2006) Neural Comp , vol.18 , pp. 1318-1348
    • Pfister, J.P.1    Toyoizumi, T.2    Barber, D.3    Gerstner, W.4
  • 39
    • 77954098778
    • A reward-modulated hebbian learning rule can explain experimentally observed network reorganization in a brain control task
    • Legenstein R, Chase SM, Schwartz AB, Maass W, (2010) A reward-modulated hebbian learning rule can explain experimentally observed network reorganization in a brain control task. The Journal of Neuroscience 30: 8400-8410.
    • (2010) The Journal of Neuroscience , vol.30 , pp. 8400-8410
    • Legenstein, R.1    Chase, S.M.2    Schwartz, A.B.3    Maass, W.4
  • 41
    • 0000827179
    • Boxes: An experiment in adaptive control
    • In: Dale E, Michie D, editors, Machine Intelligence 2. Edinburgh: Oliver and Boyd
    • Michie D, Chambers R (1968) Boxes: An experiment in adaptive control. In: Dale E, Michie D, editors, Machine Intelligence 2. Edinburgh: Oliver and Boyd. pp. 137-152.
    • (1968) Machine Intelligence 2 , pp. 137-152
    • Michie, D.1    Chambers, R.2
  • 42
    • 0002861883
    • A model of how the basal ganglia generate and use neural signals that predict reinforcement
    • In: Houk JC, Davis JL, Beiser DG, editors, Cambridge: MIT Press
    • Houk J, Adams J, Barto A (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG, editors, Models of Information Processing in the Basal Ganglia, Cambridge: MIT Press. pp. 249-270.
    • (1995) Models of Information Processing in the Basal Ganglia , pp. 249-270
    • Houk, J.1    Adams, J.2    Barto, A.3
  • 43
    • 0036592026
    • Actor-critic models of the basal ganglia: new anatomical and computational perspectives
    • Joel D, Niv Y, Ruppin E, (2002) Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks 15: 535-547.
    • (2002) Neural Networks , vol.15 , pp. 535-547
    • Joel, D.1    Niv, Y.2    Ruppin, E.3
  • 44
    • 79951967897
    • Theta phase precession in rat ventral striatum links place and reward information
    • van der Meer MAA, Redish AD, (2011) Theta phase precession in rat ventral striatum links place and reward information. The Journal of Neuroscience 31: 2843-2854.
    • (2011) The Journal of Neuroscience , vol.31 , pp. 2843-2854
    • van der Meer, M.A.A.1    Redish, A.D.2
  • 45
    • 33644688754
    • Dopamine neurons report an error in the temporal prediction of reward during learning
    • Hollerman J, Schultz W, (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neuroscience 1: 304-309.
    • (1998) Nature Neuroscience , vol.1 , pp. 304-309
    • Hollerman, J.1    Schultz, W.2
  • 46
    • 79958078227
    • An imperfect dopaminergic error signal can drive temporal-difference learning
    • Potjans W, Diesmann M, Morrison A, (2011) An imperfect dopaminergic error signal can drive temporal-difference learning. PLoS Comput Biol 7: e1001133.
    • (2011) PLoS Comput Biol , vol.7
    • Potjans, W.1    Diesmann, M.2    Morrison, A.3
  • 47
    • 38449104511
    • Differential regulation of fronto-executive function by the monoamines and acetylcholine
    • Robbins T, Roberts A, (2007) Differential regulation of fronto-executive function by the monoamines and acetylcholine. Cerebral Cortex 17: i151-i160.
    • (2007) Cerebral Cortex , vol.17
    • Robbins, T.1    Roberts, A.2
  • 48
    • 45549109997
    • Reward-dependent modulation of neuronal activity in the primate dorsal raphe nucleus
    • Nakamura K, Matsumoto M, Hikosaka O, (2008) Reward-dependent modulation of neuronal activity in the primate dorsal raphe nucleus. The Journal of Neuroscience 28: 5331-5343.
    • (2008) The Journal of Neuroscience , vol.28 , pp. 5331-5343
    • Nakamura, K.1    Matsumoto, M.2    Hikosaka, O.3
  • 49
    • 78651513632
    • Activation of dorsal raphe serotonin neurons underlies waiting for delayed rewards
    • Miyazaki K, Miyazaki KW, Doya K, (2011) Activation of dorsal raphe serotonin neurons underlies waiting for delayed rewards. The Journal of Neuroscience 31: 469-479.
    • (2011) The Journal of Neuroscience , vol.31 , pp. 469-479
    • Miyazaki, K.1    Miyazaki, K.W.2    Doya, K.3
  • 50
    • 84856431209
    • Neuron-type-specific signals for reward and punishment in the ventral tegmental area
    • Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N, (2012) Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482: 85-88.
    • (2012) Nature , vol.482 , pp. 85-88
    • Cohen, J.Y.1    Haesler, S.2    Vong, L.3    Lowell, B.B.4    Uchida, N.5
  • 53
    • 41849100338
    • Robustness of learning that is based on covariance-driven synaptic plasticity
    • Loewenstein Y, (2008) Robustness of learning that is based on covariance-driven synaptic plasticity. PLoS Comput Biol 4: e1000007.
    • (2008) PLoS Comput Biol , vol.4
    • Loewenstein, Y.1
  • 54
    • 34548552568
    • Neuromodulators control the polarity of spike-timing-dependent synaptic plasticity
    • Seol GH, Ziburkus J, Huang S, Song L, Kim IT, et al. (2007) Neuromodulators control the polarity of spike-timing-dependent synaptic plasticity. Neuron 55: 919-929.
    • (2007) Neuron , vol.55 , pp. 919-929
    • Seol, G.H.1    Ziburkus, J.2    Huang, S.3    Song, L.4    Kim, I.T.5
  • 56
    • 0004524808
    • Hierarchical model of memory and memory loss
    • Sutton JP, Beis JS, Trainor LEH, (1988) Hierarchical model of memory and memory loss. J Phys A 21: 4443-4454.
    • (1988) J Phys A , vol.21 , pp. 4443-4454
    • Sutton, J.P.1    Beis, J.S.2    Trainor, L.E.H.3
  • 57
    • 0031024891
    • Synaptic tagging and long-term potentiation
    • Frey U, Morris R, (1997) Synaptic tagging and long-term potentiation. Nature 385: 533-536.
    • (1997) Nature , vol.385 , pp. 533-536
    • Frey, U.1    Morris, R.2
  • 58
    • 58149165031
    • Tag-trigger-consolidation: A model of early and late long-term-potentiation and depression
    • Clopath C, Ziegler L, Vasilaki E, Buesing L, Gerstner W, (2008) Tag-trigger-consolidation: A model of early and late long-term-potentiation and depression. PLoS Comput Biol 4: e1000248.
    • (2008) PLoS Comput Biol , vol.4
    • Clopath, C.1    Ziegler, L.2    Vasilaki, E.3    Buesing, L.4    Gerstner, W.5
  • 59
    • 0001785024
    • Cellular models of reinforcement
    • In: Houk J, Davis J, Beiser DG, editors, Cambridge: MIT-Press
    • Wickens JR, Kotter R (1995) Cellular models of reinforcement. In: Houk J, Davis J, Beiser DG, editors, Models of information processing in basal ganglia, Cambridge: MIT-Press. pp. 187-214.
    • (1995) Models of information processing in basal ganglia , pp. 187-214
    • Wickens, J.R.1    Kotter, R.2
  • 61
    • 33745833056
    • Predicting spike timing of neocortical pyramidal neurons by simple threshold models
    • Jolivet R, Rauch A, Lüscher HR, Gerstner W, (2006) Predicting spike timing of neocortical pyramidal neurons by simple threshold models. J Computational Neuroscience 21: 35-49.
    • (2006) J Computational Neuroscience , vol.21 , pp. 35-49
    • Jolivet, R.1    Rauch, A.2    Lüscher, H.R.3    Gerstner, W.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.