Baird, L. (1993). Advantage updating (Tech. Rep. WL-TR-93-1146). Ohio: Wright Laboratory, Wright-Patterson Air Force Base.
Baxter, J., Bartlett, P. L., & Weaver, L. (2001). Experiments with infinite-horizon, policy-gradient estimation. Journal of Artificial Intelligence Research, 15, 351-381.
Dayan, P., & Sejnowski, T. (1994). TD(λ) converges with probability 1. Machine Learning, 14(3), 295-301.
Doya, K. (1996). Temporal difference learning in continuous time and space. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1073-1079). Cambridge, MA: MIT Press.
Doya, K. (2000). Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology, 10(6), 732-739.
Dudek, S., & Bear, M. (1992). Homosynaptic long-term depression in area CA1 of hippocampus and effects of N-methyl-D-aspartate receptor blockade. Proceedings of the National Academy of Sciences, 89(10), 4363-4367.
Florian, R. V. (2007). Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation, 19, 1468-1502.
Gerstner, W., Kempter, R., van Hemmen, L., & Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature, 383, 76-78.
Graybiel, A. (1998). The basal ganglia and chunking of action repertoires. Neurobiol. Learn. Mem., 70(1-2), 119-136.
Hassani, O. K., Cromwell, H. C., & Schultz, W. (2001). Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J. Neurophysiol., 85(6), 2477-2489.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.
Izhikevich, E. (2007). Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex, 17, 2443-2452.
Klopf, A. H. (1988). A neuronal model of classical conditioning. Psychobiol., 16(2), 85-123.
Kolodziejski, C., Porr, B., & Wörgötter, F. (2008). Mathematical properties of neuronal TD-rules and differential Hebbian learning: A comparison. Biological Cybernetics, 98(3), 259-272.
Kosko, B. (1986). Differential Hebbian learning. In J. S. Denker (Ed.), Neural networks for computing: AIP Conference Proceedings (Vol. 151). New York: American Institute of Physics.
Linsker, R. (1988). Self-organisation in a perceptual network. Computer, 21(3), 105-117.
Markram, H., Lübke, J., Frotscher, M., & Sakmann, B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275, 213-215.
Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16(5), 1936-1947.
Morris, G., Arkadir, D., Nevet, A., Vaadia, E., & Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron, 43(1), 133-143.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., & Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience, 9(8), 1057-1063.
Oja, E. (1982). A simplified neuron model as a principal component analyzer. J. Math. Biol., 15(3), 267-273.
Pawlak, V., & Kerr, J. N. D. (2008). Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity. Journal of Neuroscience, 28(10), 2435-2446.
Porr, B., & Wörgötter, F. (2003). Isotropic-sequence-order learning in a closed-loop behavioural system. Phil. Trans. R. Soc. Lond. A, 361, 2225-2244.
Porr, B., & Wörgötter, F. (2007). Learning with "relevance": Using a third factor to stabilise Hebbian learning. Neural Computation, 19, 2694-2719.
Potjans, W., Morrison, A., & Diesmann, M. (2009). A spiking neural network model of an actor-critic learning agent. Neural Computation, 21(2), 301-339.
Rao, R., & Sejnowski, T. (2001). Spike-timing-dependent Hebbian plasticity as temporal difference learning. Neural Computation, 13, 2221-2237.
Redgrave, P., & Gurney, K. (2006). The short-latency dopamine signal: A role in discovering novel actions? Nature Reviews Neuroscience, 7, 967-975.
Roberts, P. (1999). Computational consequences of temporally asymmetric learning rules: I. Differential Hebbian learning. J. Comput. Neurosci., 7(3), 235-246.
Roberts, P., Santiago, R., & Lafferriere, G. (2008). An implementation of reinforcement learning based on spike-timing dependent plasticity. Biological Cybernetics, 99(6), 517-523.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. J. Neurophysiol., 80, 1-27.
Schultz, W., Apicella, P., Scarnati, E., & Ljungberg, T. (1992). Neuronal activity in monkey ventral striatum related to the expectation of reward. Journal of Neuroscience, 12(12), 4595-4610.
Singh, S. P., Jaakkola, T., Littman, M. L., & Szepesvári, C. (2000). Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3), 287-308.
Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
Sutton, R., & Barto, A. (1981). Towards a modern theory of adaptive networks: Expectation and prediction. Psychol. Review, 88, 135-170.
Tamosiunaite, M., Ainge, J., Kulvicius, T., Porr, B., Dudchenko, P., & Wörgötter, F. (2008). Path-finding in real and simulated rats: On the usefulness of forgetting and frustration for navigation learning. J. Comput. Neurosci., 25, 562-582.
Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
Watkins, C., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279-292.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. In IRE WESCON Convention Record (pp. 96-104). New York: Institute of Radio Engineers.
Wiering, M. (2004). Convergence and divergence in standard averaging reinforcement learning. In J. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi (Eds.), Proceedings of the 15th European Conference on Machine learning ECML'04 (pp. 477-488). Berlin: Springer-Verlag.