Volume 44, Issue 2, 2006, Pages 495-514

Individual Q-learning in normal form games

Author keywords

Multi agent learning; Normal form games; Player dependent learning rates; Reinforcement learning; Stochastic approximation

Indexed keywords

ALGORITHMS; GAME THEORY; ITERATIVE METHODS; PROBLEM SOLVING;

EID: 33645029191     PISSN: 03630129     EISSN: None     Source Type: Journal    
DOI: 10.1137/S0363012903437976     Document Type: Article
Times cited: 137

References (38)
  • 2
    • 0002430114
    • Subjectivity and correlation in randomized strategies
    • R. J. AUMANN (1974), Subjectivity and correlation in randomized strategies, J. Math. Econom., 1, pp. 67-96.
    • (1974) J. Math. Econom. , vol.1 , pp. 67-96
    • Aumann, R.J.1
  • 3
    • 0038623721
    • On pseudo-games
    • A. BAÑOS (1968), On pseudo-games, Ann. Math. Statist., 39, pp. 1932-1945.
    • (1968) Ann. Math. Statist. , vol.39 , pp. 1932-1945
    • Baños, A.1
  • 4
    • 0001793657
    • Dynamics of stochastic approximation algorithms
    • Le Séminaire de Probabilités, Springer-Verlag, Berlin
    • M. BENAÏM (1999), Dynamics of stochastic approximation algorithms, in Le Séminaire de Probabilités, Lecture Notes in Math. 1709, Springer-Verlag, Berlin, pp. 1-68.
    • (1999) Lecture Notes in Math. , vol.1709 , pp. 1-68
    • Benaïm, M.1
  • 5
    • 0002277539
    • Mixed equilibria and dynamical systems arising from fictitious play in perturbed games
    • M. BENAÏM AND M. W. HIRSCH (1999), Mixed equilibria and dynamical systems arising from fictitious play in perturbed games, Games Econom. Behav., 29, pp. 36-72.
    • (1999) Games Econom. Behav. , vol.29 , pp. 36-72
    • Benaïm, M.1    Hirsch, M.W.2
  • 6
    • 12444269117
    • Fictitious play in 2×n games
    • U. BERGER (2005), Fictitious play in 2×n games, J. Econom. Theory, 120, pp. 139-154.
    • (2005) J. Econom. Theory , vol.120 , pp. 139-154
    • Berger, U.1
  • 7
    • 0031281590
    • Learning through reinforcement and replicator dynamics
    • T. BÖRGERS AND R. SARIN (1997), Learning through reinforcement and replicator dynamics, J. Econom. Theory, 77, pp. 1-14.
    • (1997) J. Econom. Theory , vol.77 , pp. 1-14
    • Börgers, T.1    Sarin, R.2
  • 9
    • 0036531878
    • Multiagent learning using a variable learning rate
    • M. BOWLING AND M. VELOSO (2002), Multiagent learning using a variable learning rate, Artificial Intelligence, 136, pp. 215-250.
    • (2002) Artificial Intelligence , vol.136 , pp. 215-250
    • Bowling, M.1    Veloso, M.2
  • 10
    • 0000719863
    • Packet routing in dynamically changing networks: A reinforcement learning approach
    • J. D. Cowan, G. Tesauro, and J. Alspector, eds., Morgan Kaufmann, San Francisco
    • J. A. BOYAN AND M. L. LITTMAN (1994), Packet routing in dynamically changing networks: A reinforcement learning approach, in Advances in Neural Information Processing Systems, Vol. 6, J. D. Cowan, G. Tesauro, and J. Alspector, eds., Morgan Kaufmann, San Francisco, pp. 671-678.
    • (1994) Advances in Neural Information Processing Systems , vol.6 , pp. 671-678
    • Boyan, J.A.1    Littman, M.L.2
  • 11
    • 0002672918
    • Iterative solution of games by fictitious play
    • T. C. Koopmans, ed., John Wiley & Sons, New York
    • G. W. BROWN (1951), Iterative solution of games by fictitious play, in Activity Analysis of Production and Allocation, T. C. Koopmans, ed., John Wiley & Sons, New York, pp. 374-376.
    • (1951) Activity Analysis of Production and Allocation , pp. 374-376
    • Brown, G.W.1
  • 12
    • 0011595015
    • Stochastic approximation procedures with randomly varying truncations
    • H.-F. CHEN AND Y.-M. ZHU (1986), Stochastic approximation procedures with randomly varying truncations, Sci. China Ser. A, 29, pp. 914-926.
    • (1986) Sci. China Ser. A , vol.29 , pp. 914-926
    • Chen, H.-F.1    Zhu, Y.-M.2
  • 14
    • 0032208335
    • Elevator group control using multiple reinforcement learning agents
    • R. H. CRITES AND A. G. BARTO (1998), Elevator group control using multiple reinforcement learning agents, Machine Learning, 33, pp. 235-262.
    • (1998) Machine Learning , vol.33 , pp. 235-262
    • Crites, R.H.1    Barto, A.G.2
  • 15
    • 0141838158
    • Learning, hypothesis testing, and Nash equilibrium
    • D. P. FOSTER AND H. P. YOUNG (2003), Learning, hypothesis testing, and Nash equilibrium, Games Econom. Behav., 45, pp. 73-96.
    • (2003) Games Econom. Behav. , vol.45 , pp. 73-96
    • Foster, D.P.1    Young, H.P.2
  • 18
    • 22944471739
    • Physiological utility theory and the neuroeconomics of choice
    • to appear
    • P. W. GLIMCHER, M. C. DORRIS, AND H. M. BAYER (2005), Physiological utility theory and the neuroeconomics of choice, Games Econom. Behav., to appear.
    • (2005) Games Econom. Behav.
    • Glimcher, P.W.1    Dorris, M.C.2    Bayer, H.M.3
  • 19
    • 0242570473
    • A short proof of Harsanyi's purification theorem
    • S. GOVINDAN, P. J. RENY, AND A. J. ROBSON (2003), A short proof of Harsanyi's purification theorem, Games Econom. Behav., 45, pp. 369-374.
    • (2003) Games Econom. Behav. , vol.45 , pp. 369-374
    • Govindan, S.1    Reny, P.J.2    Robson, A.J.3
  • 20
    • 0001976283
    • Approximation to Bayes risk in repeated play
    • Contributions to the Theory of Games, M. Dresher, A. W. Tucker, and P. Wolfe, eds., Princeton University Press, Princeton, NJ
    • J. HANNAN (1957), Approximation to Bayes risk in repeated play, in Contributions to the Theory of Games, Vol. 3, Ann. of Math. Stud. 39, M. Dresher, A. W. Tucker, and P. Wolfe, eds., Princeton University Press, Princeton, NJ, pp. 97-139.
    • (1957) Ann. of Math. Stud. , vol.39 , pp. 97-139
    • Hannan, J.1
  • 21
    • 0003161771
    • Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points
    • J. C. HARSANYI (1973), Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points, Internat. J. Game Theory, 2, pp. 1-23.
    • (1973) Internat. J. Game Theory , vol.2 , pp. 1-23
    • Harsanyi, J.C.1
  • 22
    • 0000908510
    • A simple adaptive procedure leading to correlated equilibrium
    • S. HART AND A. MAS-COLELL (2000), A simple adaptive procedure leading to correlated equilibrium, Econometrica, 68, pp. 1127-1150.
    • (2000) Econometrica , vol.68 , pp. 1127-1150
    • Hart, S.1    Mas-Colell, A.2
  • 23
    • 0242684983
    • A reinforcement procedure leading to correlated equilibrium
    • G. Debreu, W. Neuefeind, and W. Trockel, eds., Springer-Verlag, Berlin
    • S. HART AND A. MAS-COLELL (2001), A reinforcement procedure leading to correlated equilibrium, in Economic Essays: A Festschrift for Werner Hildenbrand, G. Debreu, W. Neuefeind, and W. Trockel, eds., Springer-Verlag, Berlin, pp. 181-200.
    • (2001) Economic Essays: A Festschrift for Werner Hildenbrand , pp. 181-200
    • Hart, S.1    Mas-Colell, A.2
  • 24
    • 2942744741
    • Uncoupled dynamics cannot lead to Nash equilibrium
    • S. HART AND A. MAS-COLELL (2003), Uncoupled dynamics cannot lead to Nash equilibrium, Amer. Econom. Rev., 93, pp. 1830-1836.
    • (2003) Amer. Econom. Rev. , vol.93 , pp. 1830-1836
    • Hart, S.1    Mas-Colell, A.2
  • 25
    • 20344390000
    • Learning in perturbed asymmetric games
    • J. HOFBAUER AND E. HOPKINS (2005), Learning in perturbed asymmetric games, Games Econom. Behav., 52, pp. 133-152.
    • (2005) Games Econom. Behav. , vol.52 , pp. 133-152
    • Hofbauer, J.1    Hopkins, E.2
  • 26
    • 0000415605
    • Three problems in learning mixed strategy equilibria
    • J. S. JORDAN (1993), Three problems in learning mixed strategy equilibria, Games Econom. Behav., 5, pp. 368-386.
    • (1993) Games Econom. Behav. , vol.5 , pp. 368-386
    • Jordan, J.S.1
  • 27
    • 0141503453
    • Multi-agent influence diagrams for representing and solving games
    • D. KOLLER AND B. MILCH (2003), Multi-agent influence diagrams for representing and solving games, Games Econom. Behav., 45, pp. 181-221.
    • (2003) Games Econom. Behav. , vol.45 , pp. 181-221
    • Koller, D.1    Milch, B.2
  • 28
    • 0345532155
    • Stochastic approximation algorithms and applications
    • Springer-Verlag, New York
    • H. J. KUSHNER AND G. G. YIN (1997), Stochastic Approximation Algorithms and Applications, Appl. Math. 35, Springer-Verlag, New York.
    • (1997) Appl. Math. , vol.35
    • Kushner, H.J.1    Yin, G.G.2
  • 29
    • 0346913265
    • Convergent multiple-timescales reinforcement learning algorithms in normal form games
    • D. S. LESLIE AND E. J. COLLINS (2003), Convergent multiple-timescales reinforcement learning algorithms in normal form games, Ann. Appl. Probab., 13, pp. 1231-1251.
    • (2003) Ann. Appl. Probab. , vol.13 , pp. 1231-1251
    • Leslie, D.S.1    Collins, E.J.2
  • 30
    • 0242635258
    • An efficient, exact algorithm for solving tree-structured graphical games
    • T. G. Dietterich, S. Becker, and Z. Ghahramani, eds., MIT Press, Cambridge, MA
    • M. L. LITTMAN, M. KEARNS, AND S. SINGH (2001), An efficient, exact algorithm for solving tree-structured graphical games, in Advances in Neural Information Processing Systems, Vol. 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, eds., MIT Press, Cambridge, MA.
    • (2001) Advances in Neural Information Processing Systems , vol.14
    • Littman, M.L.1    Kearns, M.2    Singh, S.3
  • 32
    • 0038675791
    • On repeated games with incomplete information played by non-Bayesian players
    • N. MEGIDDO (1980), On repeated games with incomplete information played by non-Bayesian players, Internat. J. Game Theory, 9, pp. 157-167.
    • (1980) Internat. J. Game Theory , vol.9 , pp. 157-167
    • Megiddo, N.1
  • 33
    • 0002021736
    • Equilibrium points in n-person games
    • J. NASH (1950), Equilibrium points in n-person games, Proc. Natl. Acad. Sci. USA, 36, pp. 48-49.
    • (1950) Proc. Natl. Acad. Sci. USA , vol.36 , pp. 48-49
    • Nash, J.1
  • 34
    • 0001000786
    • Nonconvergence to unstable points in urn models and stochastic approximations
    • R. PEMANTLE (1990), Nonconvergence to unstable points in urn models and stochastic approximations, Ann. Probab., 18, pp. 698-712.
    • (1990) Ann. Probab. , vol.18 , pp. 698-712
    • Pemantle, R.1
  • 35
    • 58149324992
    • Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term
    • A. E. ROTH AND I. EREV (1995), Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term, Games Econom. Behav., 8, pp. 164-212.
    • (1995) Games Econom. Behav. , vol.8 , pp. 164-212
    • Roth, A.E.1    Erev, I.2
  • 36
    • 0002623794
    • Some topics in two person games
    • M. Dresher, L. S. Shapley, and A. W. Tucker, eds., Princeton University Press, Princeton, NJ
    • L. S. SHAPLEY (1964), Some topics in two person games, in Advances in Game Theory, M. Dresher, L. S. Shapley, and A. W. Tucker, eds., Princeton University Press, Princeton, NJ, pp. 1-28.
    • (1964) Advances in Game Theory , pp. 1-28
    • Shapley, L.S.1
  • 37
    • 84898972974
    • Reinforcement learning for dynamic channel allocation in cellular telephone systems
    • M. C. Mozer, M. I. Jordan, and T. Petsche, eds., MIT Press, Cambridge, MA
    • S. SINGH AND D. BERTSEKAS (1997), Reinforcement learning for dynamic channel allocation in cellular telephone systems, in Advances in Neural Information Processing Systems, Vol. 9, M. C. Mozer, M. I. Jordan, and T. Petsche, eds., MIT Press, Cambridge, MA, p. 974.
    • (1997) Advances in Neural Information Processing Systems , vol.9 , p. 974
    • Singh, S.1    Bertsekas, D.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.