-
1
-
-
60749114870
-
Decision theory, reinforcement learning, and the brain
-
Dayan P, Daw ND, (2008) Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Ne 8: 429-453.
-
(2008)
Cogn Affect Behav Ne
, vol.8
, pp. 429-453
-
-
Dayan, P.1
Daw, N.D.2
-
3
-
-
2942617032
-
Temporal difference models describe higher-order learning in humans
-
Seymour B, O'Doherty JP, Dayan P, Koltzenburg M, Jones AK, et al. (2004) Temporal difference models describe higher-order learning in humans. Nature 429: 664-7.
-
(2004)
Nature
, vol.429
, pp. 664-667
-
-
Seymour, B.1
O'Doherty, J.P.2
Dayan, P.3
Koltzenburg, M.4
Jones, A.K.5
-
4
-
-
67650298948
-
A spiking neural network model of an actor-critic learning agent
-
Potjans W, Morrison A, Diesmann M, (2009) A spiking neural network model of an actor-critic learning agent. Neural Comput 21: 301-339.
-
(2009)
Neural Comput
, vol.21
, pp. 301-339
-
-
Potjans, W.1
Morrison, A.2
Diesmann, M.3
-
6
-
-
0015658957
-
The optimal control of partially observable Markov processes over a finite horizon
-
Smallwood RD, Sondik EJ, (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper Res 21: 1071-1088.
-
(1973)
Oper Res
, vol.21
, pp. 1071-1088
-
-
Smallwood, R.D.1
Sondik, E.J.2
-
7
-
-
79959853243
-
Spatio-temporal credit assignment in neuronal population learning
-
Friedrich J, Urbanczik R, Senn W, (2011) Spatio-temporal credit assignment in neuronal population learning. PLoS Comput Biol 7: e1002092.
-
(2011)
PLoS Comput Biol
, vol.7
-
-
Friedrich, J.1
Urbanczik, R.2
Senn, W.3
-
10
-
-
33947232811
-
Decision-making in blackjack: An electrophysiological analysis
-
Hewig J, Trippe R, Hecht H, Coles GH, Holroyd CB, et al. (2007) Decision-making in blackjack: An electrophysiological analysis. Cereb Cortex 17: 865-877.
-
(2007)
Cereb Cortex
, vol.17
, pp. 865-877
-
-
Hewig, J.1
Trippe, R.2
Hecht, H.3
Coles, G.H.4
Holroyd, C.B.5
-
11
-
-
5144223501
-
Activity in posterior parietal cortex is correlated with the relative subjective desirability of action
-
Dorris MC, Glimcher PW, (2004) Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron 44: 365-378.
-
(2004)
Neuron
, vol.44
, pp. 365-378
-
-
Dorris, M.C.1
Glimcher, P.W.2
-
12
-
-
0002621983
-
Animal Intelligence: An Experimental Study of the Associative Processes in Animals
-
Thorndike EL, (1898) Animal Intelligence: An Experimental Study of the Associative Processes in Animals. Psychol Monogr 2: 321-330.
-
(1898)
Psychol Monogr
, vol.2
, pp. 321-330
-
-
Thorndike, E.L.1
-
13
-
-
0002109138
-
A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement
-
In: Black AH, Prokasy WF, editors, New York: Appleton Century Crofts
-
Rescorla R, Wagner A (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: current research and theory. New York: Appleton Century Crofts. pp. 64-99.
-
(1972)
Classical Conditioning II: Current research and theory
, pp. 64-99
-
-
Rescorla, R.1
Wagner, A.2
-
15
-
-
33746652644
-
Gradient learning in spiking neural networks by dynamic perturbation of conductances
-
Fiete IR, Seung HS, (2006) Gradient learning in spiking neural networks by dynamic perturbation of conductances. Phys Rev Lett 97: 048104.
-
(2006)
Phys Rev Lett
, vol.97
, pp. 048104
-
-
Fiete, I.R.1
Seung, H.S.2
-
16
-
-
77957731196
-
Functional requirements for reward-modulated spike-timing-dependent plasticity
-
Frémaux N, Sprekeler H, Gerstner W, (2010) Functional requirements for reward-modulated spike-timing-dependent plasticity. J Neurosci 30: 13326-13337.
-
(2010)
J Neurosci
, vol.30
, pp. 13326-13337
-
-
Frémaux, N.1
Sprekeler, H.2
Gerstner, W.3
-
17
-
-
0038829878
-
Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria
-
Erev I, Roth AE, (1998) Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. Amer Econ Rev 88: 848-881.
-
(1998)
Amer Econ Rev
, vol.88
, pp. 848-881
-
-
Erev, I.1
Roth, A.E.2
-
18
-
-
5644290841
-
An integrated theory of the mind
-
Anderson JR, Bothell D, Byrne MD, Douglass S, Lebiere C, et al. (2004) An integrated theory of the mind. Psychol Rev 111: 1036-1060.
-
(2004)
Psychol Rev
, vol.111
, pp. 1036-1060
-
-
Anderson, J.R.1
Bothell, D.2
Byrne, M.D.3
Douglass, S.4
Lebiere, C.5
-
20
-
-
0030896968
-
A neural substrate of prediction and reward
-
Schultz W, Dayan P, Montague PR, (1997) A neural substrate of prediction and reward. Science 275: 1593-1599.
-
(1997)
Science
, vol.275
, pp. 1593-1599
-
-
Schultz, W.1
Dayan, P.2
Montague, P.R.3
-
21
-
-
53849125053
-
Decision making in recurrent neuronal circuits
-
Wang XJ, (2008) Decision making in recurrent neuronal circuits. Neuron 60: 215-234.
-
(2008)
Neuron
, vol.60
, pp. 215-234
-
-
Wang, X.J.1
-
22
-
-
60749100305
-
Reinforcement learning in populations of spiking neurons
-
Urbanczik R, Senn W, (2009) Reinforcement learning in populations of spiking neurons. Nat Neurosci 12: 250-252.
-
(2009)
Nat Neurosci
, vol.12
, pp. 250-252
-
-
Urbanczik, R.1
Senn, W.2
-
23
-
-
34948906745
-
Solving the distal reward problem through linkage of STDP and dopamine signaling
-
Izhikevich EM, (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17: 2443-2452.
-
(2007)
Cereb Cortex
, vol.17
, pp. 2443-2452
-
-
Izhikevich, E.M.1
-
24
-
-
33745726849
-
Neural correlations, population coding and computation
-
Averbeck B, Latham PE, Pouget A, (2006) Neural correlations, population coding and computation. Nat Rev Neurosci 7: 358-366.
-
(2006)
Nat Rev Neurosci
, vol.7
, pp. 358-366
-
-
Averbeck, B.1
Latham, P.E.2
Pouget, A.3
-
25
-
-
21244466146
-
Zur Theorie der Gesellschaftsspiele
-
Von Neumann J, (1928) Zur Theorie der Gesellschaftsspiele. Math Ann 100: 295-320.
-
(1928)
Math Ann
, vol.100
, pp. 295-320
-
-
von Neumann, J.1
-
26
-
-
50149108585
-
An electrophysiological analysis of coaching in blackjack
-
Hewig J, Trippe R, Hecht H, Coles GH, Holroyd CB, et al. (2008) An electrophysiological analysis of coaching in blackjack. Cortex 44: 1197-1205.
-
(2008)
Cortex
, vol.44
, pp. 1197-1205
-
-
Hewig, J.1
Trippe, R.2
Hecht, H.3
Coles, G.H.4
Holroyd, C.B.5
-
27
-
-
67649304665
-
Inspection games
-
In: Aumann RJ, Hart S, editors
-
Avenhaus R, Von Stengel B, Zamir S (2002) Inspection games. In: Aumann RJ, Hart S, editors. Handbook of Game Theory with Economic Applications. pp. 1947-1987.
-
(2002)
Handbook of Game Theory with Economic Applications
, pp. 1947-1987
-
-
Avenhaus, R.1
von Stengel, B.2
Zamir, S.3
-
29
-
-
33646801243
-
Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning
-
Pfister J, Toyoizumi T, Barber D, Gerstner W, (2006) Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning. Neural Comput 18: 1318-1348.
-
(2006)
Neural Comput
, vol.18
, pp. 1318-1348
-
-
Pfister, J.1
Toyoizumi, T.2
Barber, D.3
Gerstner, W.4
-
30
-
-
34249708388
-
Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
-
Florian RV, (2007) Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Comput 19: 1468-1502.
-
(2007)
Neural Comput
, vol.19
, pp. 1468-1502
-
-
Florian, R.V.1
-
31
-
-
77955988359
-
Learning spike-based population codes by reward and population feedback
-
Friedrich J, Urbanczik R, Senn W, (2010) Learning spike-based population codes by reward and population feedback. Neural Comput 22: 1698-1717.
-
(2010)
Neural Comput
, vol.22
, pp. 1698-1717
-
-
Friedrich, J.1
Urbanczik, R.2
Senn, W.3
-
32
-
-
0347362917
-
Learning in spiking neural networks by reinforcement of stochastic synaptic transmission
-
Seung HS, (2003) Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron 40: 1063-1073.
-
(2003)
Neuron
, vol.40
, pp. 1063-1073
-
-
Seung, H.S.1
-
33
-
-
27144462270
-
Learning curves for stochastic gradient descent in linear feedforward networks
-
Werfel J, Xie X, Seung HS, (2005) Learning curves for stochastic gradient descent in linear feedforward networks. Neural Comput 17: 2699-2718.
-
(2005)
Neural Comput
, vol.17
, pp. 2699-2718
-
-
Werfel, J.1
Xie, X.2
Seung, H.S.3
-
34
-
-
0002070953
-
Learning behavior and mixed-strategy Nash equilibria
-
Crawford VP, (1985) Learning behavior and mixed-strategy Nash equilibria. J Econ Behav Organ 6: 69-78.
-
(1985)
J Econ Behav Organ
, vol.6
, pp. 69-78
-
-
Crawford, V.P.1
-
35
-
-
38249030846
-
On the instability of mixed-strategy Nash equilibria
-
Stahl DO, (1988) On the instability of mixed-strategy Nash equilibria. J Econ Behav Organ 9: 59-69.
-
(1988)
J Econ Behav Organ
, vol.9
, pp. 59-69
-
-
Stahl, D.O.1
-
36
-
-
0013315245
-
A re-examination of probability matching and rational choice
-
Shanks DR, Tunney RJ, McCarthy JD, (2002) A re-examination of probability matching and rational choice. J Behav Decis Making 15: 233-250.
-
(2002)
J Behav Decis Making
, vol.15
, pp. 233-250
-
-
Shanks, D.R.1
Tunney, R.J.2
McCarthy, J.D.3
-
37
-
-
33750041626
-
Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity
-
Loewenstein Y, Seung HS, (2006) Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proc Natl Acad Sci U S A 103: 15224-15229.
-
(2006)
Proc Natl Acad Sci U S A
, vol.103
, pp. 15224-15229
-
-
Loewenstein, Y.1
Seung, H.S.2
-
38
-
-
70449718877
-
Operant matching as a Nash equilibrium of an intertemporal game
-
Loewenstein Y, Prelec D, Seung HS, (2009) Operant matching as a Nash equilibrium of an intertemporal game. Neural Comput 21: 2755-2773.
-
(2009)
Neural Comput
, vol.21
, pp. 2755-2773
-
-
Loewenstein, Y.1
Prelec, D.2
Seung, H.S.3
-
39
-
-
37749023538
-
The actor-critic learning is behind the matching law: matching versus optimal behaviors
-
Sakai Y, Fukai T, (2008) The actor-critic learning is behind the matching law: matching versus optimal behaviors. Neural Comput 20: 227-251.
-
(2008)
Neural Comput
, vol.20
, pp. 227-251
-
-
Sakai, Y.1
Fukai, T.2
-
40
-
-
27844539379
-
Relative and absolute strength of response as a function of frequency of reinforcement
-
Herrnstein RJ, (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4: 267-272.
-
(1961)
J Exp Anal Behav
, vol.4
, pp. 267-272
-
-
Herrnstein, R.J.1
-
41
-
-
84866930679
-
Synaptic theory of replicator-like melioration
-
Loewenstein Y, (2010) Synaptic theory of replicator-like melioration. Front Comput Neurosci 4: 17.
-
(2010)
Front Comput Neurosci
, vol.4
, pp. 17
-
-
Loewenstein, Y.1
-
42
-
-
33645566919
-
A biophysically based neural model of matching law behavior: melioration by stochastic synapses
-
Soltani A, Wang XJ, (2006) A biophysically based neural model of matching law behavior: melioration by stochastic synapses. J Neurosci 26: 3731-3744.
-
(2006)
J Neurosci
, vol.26
, pp. 3731-3744
-
-
Soltani, A.1
Wang, X.J.2
-
44
-
-
0001281582
-
Do people play Nash equilibrium? Lessons from evolutionary game theory
-
Mailath GJ, (1998) Do people play Nash equilibrium? Lessons from evolutionary game theory. J Econ Lit 36: 1347-1374.
-
(1998)
J Econ Lit
, vol.36
, pp. 1347-1374
-
-
Mailath, G.J.1
-
46
-
-
4644369748
-
Nash Q-learning for general-sum stochastic games
-
Hu J, Wellman MP, (2003) Nash Q-learning for general-sum stochastic games. J Mach Learn Res 4: 1039-1069.
-
(2003)
J Mach Learn Res
, vol.4
, pp. 1039-1069
-
-
Hu, J.1
Wellman, M.P.2
-
47
-
-
1642570323
-
The Nash equilibrium: a perspective
-
Holt CA, Roth AE, (2004) The Nash equilibrium: a perspective. Proc Natl Acad Sci U S A 101: 3999-4002.
-
(2004)
Proc Natl Acad Sci U S A
, vol.101
, pp. 3999-4002
-
-
Holt, C.A.1
Roth, A.E.2
-
50
-
-
0029800695
-
How the brain keeps the eyes still
-
Seung HS, (1996) How the brain keeps the eyes still. Proc Natl Acad Sci U S A 93: 13339-13344.
-
(1996)
Proc Natl Acad Sci U S A
, vol.93
, pp. 13339-13344
-
-
Seung, H.S.1
-
51
-
-
0000337576
-
Simple statistical gradient-following algorithms for connectionist reinforcement learning
-
Williams RJ, (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8: 229-256.
-
(1992)
Mach Learn
, vol.8
, pp. 229-256
-
-
Williams, R.J.1