Volume 8, Issue 9, 2012, Pages

Spike-based Decision Learning of Nash Equilibria in Two-Player Games

Author keywords

[No Author keywords available]

Indexed keywords

BEHAVIORAL RESEARCH; COMPUTATION THEORY; DECISION MAKING; GAME THEORY; MULTI AGENT SYSTEMS; NEURONS; POPULATION STATISTICS; STOCHASTIC SYSTEMS;

EID: 84866941777     PISSN: 1553734X     EISSN: 15537358     Source Type: Journal    
DOI: 10.1371/journal.pcbi.1002691     Document Type: Article
Times cited : (5)

References (51)
  • 1
    • 60749114870
    • Decision theory, reinforcement learning, and the brain
    • Dayan P, Daw ND, (2008) Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Ne 8: 429-453.
    • (2008) Cogn Affect Behav Ne , vol.8 , pp. 429-453
    • Dayan, P.1    Daw, N.D.2
  • 3
    • 2942617032
    • Temporal difference models describe higher-order learning in humans
    • Seymour B, O'Doherty JP, Dayan P, Koltzenburg M, Jones AK, et al. (2004) Temporal difference models describe higher-order learning in humans. Nature 429: 664-7.
    • (2004) Nature , vol.429 , pp. 664-667
    • Seymour, B.1    O'Doherty, J.P.2    Dayan, P.3    Koltzenburg, M.4    Jones, A.K.5
  • 4
    • 67650298948
    • A spiking neural network model of an actor-critic learning agent
    • Potjans W, Morrison A, Diesmann M, (2009) A spiking neural network model of an actor-critic learning agent. Neural Comput 21: 301-339.
    • (2009) Neural Comput , vol.21 , pp. 301-339
    • Potjans, W.1    Morrison, A.2    Diesmann, M.3
  • 6
    • 0015658957
    • The optimal control of partially observable Markov processes over a finite horizon
    • Smallwood RD, Sondik EJ, (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper Res 21: 1071-1088.
    • (1973) Oper Res , vol.21 , pp. 1071-1088
    • Smallwood, R.D.1    Sondik, E.J.2
  • 7
    • 79959853243
    • Spatio-temporal credit assignment in neuronal population learning
    • Friedrich J, Urbanczik R, Senn W, (2011) Spatio-temporal credit assignment in neuronal population learning. PLoS Comput Biol 7: e1002092.
    • (2011) PLoS Comput Biol , vol.7
    • Friedrich, J.1    Urbanczik, R.2    Senn, W.3
  • 10
    • 33947232811
    • Decision-making in blackjack: An electrophysiological analysis
    • Hewig J, Trippe R, Hecht H, Coles GH, Holroyd CB, et al. (2007) Decision-making in blackjack: An electrophysiological analysis. Cereb Cortex 17: 865-877.
    • (2007) Cereb Cortex , vol.17 , pp. 865-877
    • Hewig, J.1    Trippe, R.2    Hecht, H.3    Coles, G.H.4    Holroyd, C.B.5
  • 11
    • 5144223501
    • Activity in posterior parietal cortex is correlated with the relative subjective desirability of action
    • Dorris MC, Glimcher PW, (2004) Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron 44: 365-378.
    • (2004) Neuron , vol.44 , pp. 365-378
    • Dorris, M.C.1    Glimcher, P.W.2
  • 12
    • 0002621983
    • Animal Intelligence: An Experimental Study of the Associative Processes in Animals
    • Thorndike EL, (1898) Animal Intelligence: An Experimental Study of the Associative Processes in Animals. Psychol Monogr 2: 321-330.
    • (1898) Psychol Monogr , vol.2 , pp. 321-330
    • Thorndike, E.L.1
  • 13
    • 0002109138
    • A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement
    • In: Black AH, Prokasy WF, editors, New York: Appleton Century Crofts
    • Rescorla R, Wagner A (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: current research and theory. New York: Appleton Century Crofts. pp. 64-99.
    • (1972) Classical Conditioning II: Current research and theory , pp. 64-99
    • Rescorla, R.1    Wagner, A.2
  • 14
    • 84899031338
    • Statistical models of conditioning
    • Dayan P, Long T (1998) Statistical models of conditioning. Adv Neural Inf Process Syst 10. pp. 117-123.
    • (1998) Adv Neural Inf Process Syst , vol.10 , pp. 117-123
    • Dayan, P.1    Long, T.2
  • 15
    • 33746652644
    • Gradient learning in spiking neural networks by dynamic perturbation of conductances
    • Fiete IR, Seung HS, (2006) Gradient learning in spiking neural networks by dynamic perturbation of conductances. Phys Rev Lett 97: 048104.
    • (2006) Phys Rev Lett , vol.97 , pp. 048104
    • Fiete, I.R.1    Seung, H.S.2
  • 16
    • 77957731196
    • Functional requirements for reward-modulated spike-timing-dependent plasticity
    • Frémaux N, Sprekeler H, Gerstner W, (2010) Functional requirements for reward-modulated spike-timing-dependent plasticity. J Neurosci 30: 13326-13337.
    • (2010) J Neurosci , vol.30 , pp. 13326-13337
    • Frémaux, N.1    Sprekeler, H.2    Gerstner, W.3
  • 17
    • 0038829878
    • Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria
    • Erev I, Roth AE, (1998) Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. Amer Econ Rev 88: 848-881.
    • (1998) Amer Econ Rev , vol.88 , pp. 848-881
    • Erev, I.1    Roth, A.E.2
  • 20
    • 0030896968
    • A neural substrate of prediction and reward
    • Schultz W, Dayan P, Montague PR, (1997) A neural substrate of prediction and reward. Science 275: 1593-1599.
    • (1997) Science , vol.275 , pp. 1593-1599
    • Schultz, W.1    Dayan, P.2    Montague, P.R.3
  • 21
    • 53849125053
    • Decision making in recurrent neuronal circuits
    • Wang XJ, (2008) Decision making in recurrent neuronal circuits. Neuron 60: 215-234.
    • (2008) Neuron , vol.60 , pp. 215-234
    • Wang, X.J.1
  • 22
    • 60749100305
    • Reinforcement learning in populations of spiking neurons
    • Urbanczik R, Senn W, (2009) Reinforcement learning in populations of spiking neurons. Nat Neurosci 12: 250-252.
    • (2009) Nat Neurosci , vol.12 , pp. 250-252
    • Urbanczik, R.1    Senn, W.2
  • 23
    • 34948906745
    • Solving the distal reward problem through linkage of STDP and dopamine signaling
    • Izhikevich EM, (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17: 2443-2452.
    • (2007) Cereb Cortex , vol.17 , pp. 2443-2452
    • Izhikevich, E.M.1
  • 24
    • 33745726849
    • Neural correlations, population coding and computation
    • Averbeck B, Latham PE, Pouget A, (2006) Neural correlations, population coding and computation. Nat Rev Neurosci 7: 358-366.
    • (2006) Nat Rev Neurosci , vol.7 , pp. 358-366
    • Averbeck, B.1    Latham, P.E.2    Pouget, A.3
  • 25
    • 21244466146
    • Zur Theorie der Gesellschaftsspiele
    • Von Neumann J, (1928) Zur Theorie der Gesellschaftsspiele. Math Ann 100: 295-320.
    • (1928) Math Ann , vol.100 , pp. 295-320
    • von Neumann, J.1
  • 26
    • 50149108585
    • An electrophysiological analysis of coaching in blackjack
    • Hewig J, Trippe R, Hecht H, Coles GH, Holroyd CB, et al. (2008) An electrophysiological analysis of coaching in blackjack. Cortex 44: 1197-1205.
    • (2008) Cortex , vol.44 , pp. 1197-1205
    • Hewig, J.1    Trippe, R.2    Hecht, H.3    Coles, G.H.4    Holroyd, C.B.5
  • 29
    • 33646801243
    • Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning
    • Pfister J, Toyoizumi T, Barber D, Gerstner W, (2006) Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning. Neural Comput 18: 1318-1348.
    • (2006) Neural Comput , vol.18 , pp. 1318-1348
    • Pfister, J.1    Toyoizumi, T.2    Barber, D.3    Gerstner, W.4
  • 30
    • 34249708388
    • Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
    • Florian RV, (2007) Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Comput 19: 1468-1502.
    • (2007) Neural Comput , vol.19 , pp. 1468-1502
    • Florian, R.V.1
  • 31
    • 77955988359
    • Learning spike-based population codes by reward and population feedback
    • Friedrich J, Urbanczik R, Senn W, (2010) Learning spike-based population codes by reward and population feedback. Neural Comput 22: 1698-1717.
    • (2010) Neural Comput , vol.22 , pp. 1698-1717
    • Friedrich, J.1    Urbanczik, R.2    Senn, W.3
  • 32
    • 0347362917
    • Learning in spiking neural networks by reinforcement of stochastic synaptic transmission
    • Seung HS, (2003) Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron 40: 1063-1073.
    • (2003) Neuron , vol.40 , pp. 1063-1073
    • Seung, H.S.1
  • 33
    • 27144462270
    • Learning curves for stochastic gradient descent in linear feedforward networks
    • Werfel J, Xie X, Seung HS, (2005) Learning curves for stochastic gradient descent in linear feedforward networks. Neural Comput 17: 2699-2718.
    • (2005) Neural Comput , vol.17 , pp. 2699-2718
    • Werfel, J.1    Xie, X.2    Seung, H.S.3
  • 34
    • 0002070953
    • Learning behavior and mixed-strategy Nash equilibria
    • Crawford VP, (1985) Learning behavior and mixed-strategy Nash equilibria. J Econ Behav Organ 6: 69-78.
    • (1985) J Econ Behav Organ , vol.6 , pp. 69-78
    • Crawford, V.P.1
  • 35
    • 38249030846
    • On the instability of mixed-strategy Nash equilibria
    • Stahl DO, (1988) On the instability of mixed-strategy Nash equilibria. J Econ Behav Organ 9: 59-69.
    • (1988) J Econ Behav Organ , vol.9 , pp. 59-69
    • Stahl, D.O.1
  • 36
    • 0013315245
    • A re-examination of probability matching and rational choice
    • Shanks DR, Tunney RJ, McCarthy JD, (2002) A re-examination of probability matching and rational choice. J Behav Decis Making 15: 233-250.
    • (2002) J Behav Decis Making , vol.15 , pp. 233-250
    • Shanks, D.R.1    Tunney, R.J.2    McCarthy, J.D.3
  • 37
    • 33750041626
    • Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity
    • Loewenstein Y, Seung HS, (2006) Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proc Natl Acad Sci U S A 103: 15224-15229.
    • (2006) Proc Natl Acad Sci U S A , vol.103 , pp. 15224-15229
    • Loewenstein, Y.1    Seung, H.S.2
  • 38
    • 70449718877
    • Operant matching as a Nash equilibrium of an intertemporal game
    • Loewenstein Y, Prelec D, Seung HS, (2009) Operant matching as a Nash equilibrium of an intertemporal game. Neural Comput 21: 2755-2773.
    • (2009) Neural Comput , vol.21 , pp. 2755-2773
    • Loewenstein, Y.1    Prelec, D.2    Seung, H.S.3
  • 39
    • 37749023538
    • The actor-critic learning is behind the matching law: matching versus optimal behaviors
    • Sakai Y, Fukai T, (2008) The actor-critic learning is behind the matching law: matching versus optimal behaviors. Neural Comput 20: 227-251.
    • (2008) Neural Comput , vol.20 , pp. 227-251
    • Sakai, Y.1    Fukai, T.2
  • 40
    • 27844539379
    • Relative and absolute strength of response as a function of frequency of reinforcement
    • Herrnstein RJ, (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4: 267-272.
    • (1961) J Exp Anal Behav , vol.4 , pp. 267-272
    • Herrnstein, R.J.1
  • 41
    • 84866930679
    • Synaptic theory of replicator-like melioration
    • Loewenstein Y, (2010) Synaptic theory of replicator-like melioration. Front Comput Neurosci 4: 17.
    • (2010) Front Comput Neurosci , vol.4 , pp. 17
    • Loewenstein, Y.1
  • 42
    • 33645566919
    • A biophysically based neural model of matching law behavior: melioration by stochastic synapses
    • Soltani A, Wang XJ, (2006) A biophysically based neural model of matching law behavior: melioration by stochastic synapses. J Neurosci 26: 3731-3744.
    • (2006) J Neurosci , vol.26 , pp. 3731-3744
    • Soltani, A.1    Wang, X.J.2
  • 44
    • 0001281582
    • Do people play Nash equilibrium? Lessons from evolutionary game theory
    • Mailath GJ, (1998) Do people play Nash equilibrium? Lessons from evolutionary game theory. J Econ Lit 36: 1347-1374.
    • (1998) J Econ Lit , vol.36 , pp. 1347-1374
    • Mailath, G.J.1
  • 46
    • 4644369748
    • Nash Q-learning for general-sum stochastic games
    • Hu J, Wellman MP, (2003) Nash Q-learning for general-sum stochastic games. J Mach Learn Res 4: 1039-1069.
    • (2003) J Mach Learn Res , vol.4 , pp. 1039-1069
    • Hu, J.1    Wellman, M.P.2
  • 47
    • 1642570323
    • The Nash equilibrium: a perspective
    • Holt CA, Roth AE, (2004) The Nash equilibrium: a perspective. Proc Natl Acad Sci U S A 101: 3999-4002.
    • (2004) Proc Natl Acad Sci U S A , vol.101 , pp. 3999-4002
    • Holt, C.A.1    Roth, A.E.2
  • 50
    • 0029800695
    • How the brain keeps the eyes still
    • Seung HS, (1996) How the brain keeps the eyes still. Proc Natl Acad Sci U S A 93: 13339-13344.
    • (1996) Proc Natl Acad Sci U S A , vol.93 , pp. 13339-13344
    • Seung, H.S.1
  • 51
    • 0000337576
    • Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • Williams RJ, (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8: 229-256.
    • (1992) Mach Learn , vol.8 , pp. 229-256
    • Williams, R.J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.