Volume 21, Issue 4, 2009, Pages 1173-1202

On the asymptotic equivalence between differential Hebbian and temporal difference learning

Author keywords

[No Author keywords available]

Indexed keywords

ANIMAL; ARTICLE; BIOLOGICAL MODEL; BIOPHYSICS; COMPUTER SIMULATION; LEARNING; NERVE CELL; PHYSIOLOGY; REINFORCEMENT; TIME

EID: 65549116541     PISSN: 08997667     EISSN: 1530888X     Source Type: Journal    
DOI: 10.1162/neco.2008.04-08-750     Document Type: Letter
Times cited: 6

References (42)
  • 1
    • 0004370245
    • (Tech. Rep. WL-TR-93-1146). Ohio: Wright Laboratory, Wright-Patterson Air Force Base
    • Baird, L. (1993). Advantage updating (Tech. Rep. WL-TR-93-1146). Ohio: Wright Laboratory, Wright-Patterson Air Force Base.
    • (1993) Advantage updating
    • Baird, L.1
  • 4
    • 0028388685
    • TD(λ) converges with probability 1
    • Dayan, P., & Sejnowski, T. (1994). TD(λ) converges with probability 1. Mach. Learn., 14(3), 295-301.
    • (1994) Mach. Learn. , vol.14 , Issue.3 , pp. 295-301
    • Dayan, P.1    Sejnowski, T.2
  • 5
    • 85156231814
    • Temporal difference learning in continuous time and space
    • D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Cambridge, MA: MIT Press
    • Doya, K. (1996). Temporal difference learning in continuous time and space. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1073-1079). Cambridge, MA: MIT Press.
    • (1996) Advances in neural information processing systems , vol.8 , pp. 1073-1079
    • Doya, K.1
  • 6
    • 0034524427
    • Complementary roles of basal ganglia and cerebellum in learning and motor control
    • Doya, K. (2000). Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology, 10(6), 732-739.
    • (2000) Current Opinion in Neurobiology , vol.10 , Issue.6 , pp. 732-739
    • Doya, K.1
  • 7
    • 0026579349
    • Homosynaptic long-term depression in area CA1 of hippocampus and effects of N-methyl-D-aspartate receptor blockade
    • Dudek, S., & Bear, M. (1992). Homosynaptic long-term depression in area CA1 of hippocampus and effects of N-methyl-D-aspartate receptor blockade. Proceedings of the National Academy of Sciences, 89(10), 4363-4367.
    • (1992) Proceedings of the National Academy of Sciences , vol.89 , Issue.10 , pp. 4363-4367
    • Dudek, S.1    Bear, M.2
  • 8
    • 34249708388
    • Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
    • Florian, R. V. (2007). Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation, 19, 1468-1502.
    • (2007) Neural Computation , vol.19 , pp. 1468-1502
    • Florian, R.V.1
  • 9
    • 0029821128
    • A neuronal learning rule for sub-millisecond temporal coding
    • Gerstner, W., Kempter, R., van Hemmen, L., & Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature, 383, 76-78.
    • (1996) Nature , vol.383 , pp. 76-78
    • Gerstner, W.1    Kempter, R.2    van Hemmen, L.3    Wagner, H.4
  • 10
    • 0032123567
    • The basal ganglia and chunking of action repertoires
    • Graybiel, A. (1998). The basal ganglia and chunking of action repertoires. Neurobiol. Learn. Mem., 70(1-2), 119-136.
    • (1998) Neurobiol. Learn. Mem. , vol.70 , Issue.1-2 , pp. 119-136
    • Graybiel, A.1
  • 11
    • 0035015792
    • Influence of expectation of different rewards on behavior-related neuronal activity in the striatum
    • Hassani, O. K., Cromwell, H. C., & Schultz, W. (2001). Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J. Neurophysiol., 85(6), 2477-2489.
    • (2001) J. Neurophysiol. , vol.85 , Issue.6 , pp. 2477-2489
    • Hassani, O.K.1    Cromwell, H.C.2    Schultz, W.3
  • 12
    • 0020118274
    • Neural networks and physical systems with emergent collective computational abilities
    • Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.
    • (1982) Proceedings of the National Academy of Sciences , vol.79 , pp. 2554-2558
    • Hopfield, J.J.1
  • 13
    • 34948906745
    • Solving the distal reward problem through linkage of STDP and dopamine signaling
    • Izhikevich, E. (2007). Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex, 17, 2443-2452.
    • (2007) Cerebral Cortex , vol.17 , pp. 2443-2452
    • Izhikevich, E.1
  • 14
    • 0023878618
    • A neuronal model of classical conditioning
    • Klopf, A. H. (1988). A neuronal model of classical conditioning. Psychobiol., 16(2), 85-123.
    • (1988) Psychobiol. , vol.16 , Issue.2 , pp. 85-123
    • Klopf, A.H.1
  • 15
    • 40149107540
    • Mathematical properties of neuronal TD-rules and differential Hebbian learning: A comparison
    • Kolodziejski, C., Porr, B., & Wörgötter, F. (2008). Mathematical properties of neuronal TD-rules and differential Hebbian learning: A comparison. Biological Cybernetics, 98(3), 259-272.
    • (2008) Biological Cybernetics , vol.98 , Issue.3 , pp. 259-272
    • Kolodziejski, C.1    Porr, B.2    Wörgötter, F.3
  • 16
    • 0042276165
    • Differential Hebbian learning
    • J. S. Denker (Ed.), New York: American Institute of Physics
    • Kosko, B. (1986). Differential Hebbian learning. In J. S. Denker (Ed.), Neural networks for computing: AIP Conference Proceedings (Vol. 151). New York: American Institute of Physics.
    • (1986) Neural networks for computing: AIP Conference Proceedings , vol.151
    • Kosko, B.1
  • 18
    • 0023981750
    • Self-organisation in a perceptual network
    • Linsker, R. (1988). Self-organisation in a perceptual network. Computer, 21(3), 105-117.
    • (1988) Computer , vol.21 , Issue.3 , pp. 105-117
    • Linsker, R.1
  • 19
    • 0031012615
    • Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs
    • Markram, H., Lübke, J., Frotscher, M., & Sakmann, B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275, 213-215.
    • (1997) Science , vol.275 , pp. 213-215
    • Markram, H.1    Lübke, J.2    Frotscher, M.3    Sakmann, B.4
  • 20
    • 0029981543
    • A framework for mesencephalic dopamine systems based on predictive Hebbian learning
    • Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16(5), 1936-1947.
    • (1996) Journal of Neuroscience , vol.16 , Issue.5 , pp. 1936-1947
    • Montague, P.R.1    Dayan, P.2    Sejnowski, T.J.3
  • 21
    • 3242673464
    • Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons
    • Morris, G., Arkadir, D., Nevet, A., Vaadia, E., & Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron, 43(1), 133-143.
    • (2004) Neuron , vol.43 , Issue.1 , pp. 133-143
    • Morris, G.1    Arkadir, D.2    Nevet, A.3    Vaadia, E.4    Bergman, H.5
  • 22
    • 33747585633
    • Midbrain dopamine neurons encode decisions for future action
    • Morris, G., Nevet, A., Arkadir, D., Vaadia, E., & Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience, 9(8), 1057-1063.
    • (2006) Nature Neuroscience , vol.9 , Issue.8 , pp. 1057-1063
    • Morris, G.1    Nevet, A.2    Arkadir, D.3    Vaadia, E.4    Bergman, H.5
  • 23
    • 0020464111
    • A simplified neuron model as a principal component analyzer
    • Oja, E. (1982). A simplified neuron model as a principal component analyzer. J. Math. Biol., 15(3), 267-273.
    • (1982) J. Math. Biol. , vol.15 , Issue.3 , pp. 267-273
    • Oja, E.1
  • 24
    • 40449100017
    • Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity
    • Pawlak, V., & Kerr, J. N. D. (2008). Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity. J. Neurosci., 28(10), 2435-2446.
    • (2008) J. Neurosci. , vol.28 , Issue.10 , pp. 2435-2446
    • Pawlak, V.1    Kerr, J.N.D.2
  • 25
    • 0742301619
    • Isotropic-sequence-order learning in a closed-loop behavioural system
    • Porr, B., & Wörgötter, F. (2003). Isotropic-sequence-order learning in a closed-loop behavioural system. Phil. Trans. R. Soc. Lond. A, 361, 2225-2244.
    • (2003) Phil. Trans. R. Soc. Lond. A , vol.361 , pp. 2225-2244
    • Porr, B.1    Wörgötter, F.2
  • 26
    • 35549002871
    • Learning with "relevance": Using a third factor to stabilise Hebbian learning
    • Porr, B., & Wörgötter, F. (2007). Learning with "relevance": Using a third factor to stabilise Hebbian learning. Neural Comp., 19, 2694-2719.
    • (2007) Neural Comp. , vol.19 , pp. 2694-2719
    • Porr, B.1    Wörgötter, F.2
  • 27
    • 67650298948
    • A spiking neural network model of an actor-critic learning agent
    • Potjans, W., Morrison, A., & Diesmann, M. (2009). A spiking neural network model of an actor-critic learning agent. Neural Computation, 21(2), 301-339.
    • (2009) Neural Computation , vol.21 , Issue.2 , pp. 301-339
    • Potjans, W.1    Morrison, A.2    Diesmann, M.3
  • 28
    • 0035489925
    • Spike-timing-dependent Hebbian plasticity as temporal difference learning
    • Rao, R., & Sejnowski, T. (2001). Spike-timing-dependent Hebbian plasticity as temporal difference learning. Neural Computation, 13, 2221-2237.
    • (2001) Neural Computation , vol.13 , pp. 2221-2237
    • Rao, R.1    Sejnowski, T.2
  • 29
    • 33751184634
    • The short-latency dopamine signal: A role in discovering novel actions?
    • Redgrave, P., & Gurney, K. (2006). The short-latency dopamine signal: A role in discovering novel actions? Nature Reviews Neuroscience, 7, 967-975.
    • (2006) Nature Reviews Neuroscience , vol.7 , pp. 967-975
    • Redgrave, P.1    Gurney, K.2
  • 30
    • 0032696609
    • Computational consequences of temporally asymmetric learning rules: I. Differential Hebbian learning
    • Roberts, P. (1999). Computational consequences of temporally asymmetric learning rules: I. Differential Hebbian learning. J. Comput. Neurosci., 7(3), 235-246.
    • (1999) J. Comput. Neurosci. , vol.7 , Issue.3 , pp. 235-246
    • Roberts, P.1
  • 31
    • 57049100874
    • An implementation of reinforcement learning based on spike-timing dependent plasticity
    • Roberts, P., Santiago, R., & Lafferriere, G. (2008). An implementation of reinforcement learning based on spike-timing dependent plasticity. Biological Cybernetics, 99(6), 517-523.
    • (2008) Biological Cybernetics , vol.99 , Issue.6 , pp. 517-523
    • Roberts, P.1    Santiago, R.2    Lafferriere, G.3
  • 32
    • 0031867046
    • Predictive reward signal of dopamine neurons
    • Schultz, W. (1998). Predictive reward signal of dopamine neurons. J. Neurophysiol., 80, 1-27.
    • (1998) J. Neurophysiol. , vol.80 , pp. 1-27
    • Schultz, W.1
  • 33
    • 0026442752
    • Neuronal activity in monkey ventral striatum related to the expectation of reward
    • Schultz, W., Apicella, P., Scarnati, E., & Ljungberg, T. (1992). Neuronal activity in monkey ventral striatum related to the expectation of reward. J. Neurosci., 12(12), 4595-4610.
    • (1992) J. Neurosci. , vol.12 , Issue.12 , pp. 4595-4610
    • Schultz, W.1    Apicella, P.2    Scarnati, E.3    Ljungberg, T.4
  • 34
    • 0033901602
    • Convergence results for single-step on-policy reinforcement-learning algorithms
    • Singh, S. P., Jaakkola, T., Littman, M. L., & Szepesvári, C. (2000). Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3), 287-308.
    • (2000) Machine Learning , vol.38 , Issue.3 , pp. 287-308
    • Singh, S.P.1    Jaakkola, T.2    Littman, M.L.3    Szepesvári, C.4
  • 35
    • 33847202724
    • Learning to predict by the method of temporal differences
    • Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 36
    • 0019537951
    • Toward a modern theory of adaptive networks: Expectation and prediction
    • Sutton, R., & Barto, A. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychol. Review, 88, 135-170.
    • (1981) Psychol. Review , vol.88 , pp. 135-170
    • Sutton, R.1    Barto, A.2
  • 38
    • 53149129441
    • Path-finding in real and simulated rats: On the usefulness of forgetting and frustration for navigation learning
    • Tamosiunaite, M., Ainge, J., Kulvicius, T., Porr, B., Dudchenko, P., & Wörgötter, F. (2008). Path-finding in real and simulated rats: On the usefulness of forgetting and frustration for navigation learning. J. Comp. Neuroscience, 25, 562-582.
    • (2008) J. Comp. Neuroscience , vol.25 , pp. 562-582
    • Tamosiunaite, M.1    Ainge, J.2    Kulvicius, T.3    Porr, B.4    Dudchenko, P.5    Wörgötter, F.6
  • 39
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
    • (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 40
    • 34249833101
    • Technical note: Q-learning
    • Watkins, C., & Dayan, P. (1992). Technical note: Q-learning. Mach. Learn., 8, 279-292.
    • (1992) Mach. Learn. , vol.8 , pp. 279-292
    • Watkins, C.1    Dayan, P.2
  • 41
    • 0002278965
    • Adaptive switching circuits
    • New York: Institute of Radio Engineers
    • Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. In IRE WESCON Convention Record (pp. 96-104). New York: Institute of Radio Engineers.
    • (1960) IRE WESCON Convention Record , pp. 96-104
    • Widrow, B.1    Hoff, M.E.2
  • 42
    • 22944460232
    • Convergence and divergence in standard averaging reinforcement learning
    • J. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi (Eds.), Berlin: Springer-Verlag
    • Wiering, M. (2004). Convergence and divergence in standard averaging reinforcement learning. In J. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi (Eds.), Proceedings of the 15th European Conference on Machine learning ECML'04 (pp. 477-488). Berlin: Springer-Verlag.
    • (2004) Proceedings of the 15th European Conference on Machine learning ECML'04 , pp. 477-488
    • Wiering, M.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.