Volume , Issue , 2009, Pages 385-392

Temporal difference based actor critic learning - Convergence and neural implementation

Author keywords

[No Author keywords available]

Indexed keywords

AMINES; BIOINFORMATICS; LEARNING ALGORITHMS; NEUROPHYSIOLOGY;

EID: 79959855306     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (14)

References (24)
  • 1
    • 34548049545
    • Reinforcement learning, spike time dependent plasticity and the BCM rule
    • D. Baras and R. Meir. Reinforcement learning, spike time dependent plasticity and the BCM rule. Neural Comput., 19(8):2245-2279, 2007.
    • (2007) Neural Comput. , vol.19 , Issue.8 , pp. 2245-2279
    • Baras, D.1    Meir, R.2
  • 2
    • 2542506169
    • (Technical rep.). Canberra: Research School of Information Sciences and Engineering Australian National University
    • J. Baxter and P.L. Bartlett. Hebbian synaptic modifications in spiking neurons that learn. (Technical rep.). Canberra: Research School of Information Sciences and Engineering, Australian National University, 1999.
    • (1999) Hebbian Synaptic Modifications in Spiking Neurons That Learn
    • Baxter, J.1    Bartlett, P.L.2
  • 5
    • 85162049326
    • Incremental natural actor-critic algorithms
    • J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors. MIT Press, Cambridge, MA
    • S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee. Incremental natural actor-critic algorithms. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 105-112. MIT Press, Cambridge, MA, 2008.
    • (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 105-112
    • Bhatnagar, S.1    Sutton, R.2    Ghavamzadeh, M.3    Lee, M.4
  • 7
    • 0031076413
    • Stochastic approximation with two time scales
    • V.S. Borkar. Stochastic approximation with two time scales. Syst. Control Lett., 29(5):291-294, 1997.
    • (1997) Syst. Control Lett. , vol.29 , Issue.5 , pp. 291-294
    • Borkar, V.S.1
  • 9
    • 34249708388
    • Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
    • R.V. Florian. Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation, 19:1468-1502, 2007.
    • (2007) Neural Computation , vol.19 , pp. 1468-1502
    • Florian, R.V.1
  • 12
    • 34948906745
    • Solving the distal reward problem through linkage of STDP and dopamine signaling
    • DOI 10.1093/cercor/bhl152
    • E.M. Izhikevich. Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling. Cerebral Cortex, 17(10):2443-52, 2007. (Pubitemid 47517479)
    • (2007) Cerebral Cortex , vol.17 , Issue.10 , pp. 2443-2452
    • Izhikevich, E.M.1
  • 15
    • 0035249254
    • Simulation-based optimization of Markov reward processes
    • P. Marbach and J. Tsitsiklis. Simulation-Based Optimization of Markov Reward Processes. IEEE. Trans. Auto. Cont., 46:191-209, 1998.
    • (1998) IEEE. Trans. Auto. Cont. , vol.46 , pp. 191-209
    • Marbach, P.1    Tsitsiklis, J.2
  • 16
    • 0029981543
    • A framework for mesencephalic dopamine systems based on predictive Hebbian learning
    • P.R. Montague, P. Dayan, and T.J. Sejnowski. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16:1936-1947, 1996.
    • (1996) Journal of Neuroscience , vol.16 , pp. 1936-1947
    • Montague, P.R.1    Dayan, P.2    Sejnowski, T.J.3
  • 17
    • 1942520195
    • Dissociable roles of ventral and dorsal striatum in instrumental conditioning
    • J. O'Doherty, P. Dayan, J. Schultz, R. Deichmann, K. Friston, and R.J. Dolan. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304:452-454, 2004.
    • (2004) Science , vol.304 , pp. 452-454
    • O'Doherty, J.1    Dayan, P.2    Schultz, J.3    Deichmann, R.4    Friston, K.5    Dolan, R.J.6
  • 18
    • 0036592025
    • Dopamine-dependent plasticity of corticostriatal synapses
    • J.N.J. Reynolds and J.R. Wickens. Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks, 15(4-6):507-521, 2002.
    • (2002) Neural Networks , vol.15 , Issue.4-6 , pp. 507-521
    • Reynolds, J.N.J.1    Wickens, J.R.2
  • 19
    • 0036121949
    • Development, learning and memory in large random networks of cortical neurons: Lessons beyond anatomy
    • S. Marom and G. Shahaf. Development, learning and memory in large random networks of cortical neurons: lessons beyond anatomy. Quarterly Reviews of Biophysics, 35:63-87, 2002.
    • (2002) Quarterly Reviews of Biophysics , vol.35 , pp. 63-87
    • Marom, S.1    Shahaf, G.2
  • 20
    • 0034576323
    • Multiple reward signals in the brain
    • Dec
    • W. Schultz. Multiple reward signals in the brain. Nature Reviews Neuroscience, 1:199-207, Dec. 2000.
    • (2000) Nature Reviews Neuroscience , vol.1 , pp. 199-207
    • Schultz, W.1
  • 21
    • 0032114627
    • Analytical mean squared error curves for temporal difference learning
    • S. Singh and P. Dayan. Analytical mean squared error curves for temporal difference learning. Machine Learning, 32:5-40, 1998.
    • (1998) Machine Learning , vol.32 , pp. 5-40
    • Singh, S.1    Dayan, P.2
  • 24
    • 1642534402
    • Modulation of caudate activity by action contingency
    • E.M. Tricomi, M.R. Delgado, and J.A. Fiez. Modulation of caudate activity by action contingency. Neuron, 41(2):281-292, 2004.
    • (2004) Neuron , vol.41 , Issue.2 , pp. 281-292
    • Tricomi, E.M.1    Delgado, M.R.2    Fiez, J.A.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.