Volume 122, Issue 1, 2005, Pages 1-36

On the convergence of reinforcement learning

Author keywords

Games; Reinforcement learning

EID: 16244410118     PISSN: 00220531     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.jet.2004.03.008     Document Type: Article
Times cited: 153

References (39)
  • 1
    • 0001784118
    • On designing economic agents that behave like human agents
    • W.B. Arthur On designing economic agents that behave like human agents J. Evolutionary Econ. 3 1993 1-22
    • (1993) J. Evolutionary Econ. , vol.3 , pp. 1-22
    • Arthur, W.B.1
  • 2
    • 0042496192
    • Gambling in a rigged casino: The adversarial multi-armed bandit problem
    • Mimeo, AT&T laboratories
    • P. Auer, N. Cesa-Bianchi, Y. Freund, R. Schapire, Gambling in a rigged casino: The adversarial multi-armed bandit problem, Mimeo, AT&T laboratories, 1998.
    • (1998)
    • Auer, P.1    Cesa-Bianchi, N.2    Freund, Y.3    Schapire, R.4
  • 3
    • 0001793657
    • Dynamics of stochastic approximation algorithms
    • Séminaire de Probabilités, XXXIII, Springer, Berlin
    • M. Benaïm, Dynamics of stochastic approximation algorithms, in: Séminaire de Probabilités, XXXIII, Lecture Notes in Mathematics, vol. 1709, Springer, Berlin, 1999, pp. 1-68.
    • (1999) Lecture Notes in Mathematics , vol.1709 , pp. 1-68
    • Benaïm, M.1
  • 4
    • 0002277539
    • Mixed equilibria and dynamical systems arising from fictitious play in perturbed games
    • M. Benaïm M. Hirsch Mixed equilibria and dynamical systems arising from fictitious play in perturbed games Games Econ. Behav. 29 1999 36-72
    • (1999) Games Econ. Behav. , vol.29 , pp. 36-72
    • Benaïm, M.1    Hirsch, M.2
  • 6
    • 0003070025
    • Nash equilibrium and evolution by imitation
    • K. Arrow et al. (Eds.) Macmillan London
    • J. Bjornerstedt J. Weibull Nash equilibrium and evolution by imitation in: K. Arrow et al. (Eds.) The Rational Foundations of Economic Behaviour 1996 Macmillan London pp. 155-171
    • (1996) The Rational Foundations of Economic Behaviour , pp. 155-171
    • Bjornerstedt, J.1    Weibull, J.2
  • 7
    • 0031281590
    • Learning through reinforcement and replicator dynamics
    • T. Börgers R. Sarin Learning through reinforcement and replicator dynamics J. Econ. Theory 77 1997 1-14
    • (1997) J. Econ. Theory , vol.77 , pp. 1-14
    • Börgers, T.1    Sarin, R.2
  • 8
    • 0001052668
    • Les algorithmes stochastiques contournent-ils les pièges
    • O. Brandière M. Duflo Les algorithmes stochastiques contournent-ils les pièges Ann. Inst. Henri Poincaré 32 1996 395-427
    • (1996) Ann. Inst. Henri Poincaré , vol.32 , pp. 395-427
    • Brandière, O.1    Duflo, M.2
  • 9
    • 18644365144
    • Experience-weighted attraction learning in normal form games
    • C. Camerer T.-H. Ho Experience-weighted attraction learning in normal form games Econometrica 67 1999 827-874
    • (1999) Econometrica , vol.67 , pp. 827-874
    • Camerer, C.1    Ho, T.-H.2
  • 10
    • 0031647921
    • Convergence rate of stochastic algorithms in degenerate cases
    • H.-F. Chen Convergence rate of stochastic algorithms in degenerate cases SIAM J. Control Optim. 36 1998 100-114
    • (1998) SIAM J. Control Optim. , vol.36 , pp. 100-114
    • Chen, H.-F.1
  • 13
    • 0024731334
    • Stochastic approximations and large deviations: Upper bounds and w.p.1 convergence
    • P. Dupuis H. Kushner Stochastic approximations and large deviations : Upper bounds and w.p.1 convergence SIAM J. Control Optim. 27 1989 1108-1135
    • (1989) SIAM J. Control Optim. , vol.27 , pp. 1108-1135
    • Dupuis, P.1    Kushner, H.2
  • 14
    • 0038829878
    • Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria
    • I. Erev A. Roth Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria Amer. Econ. Rev. 88 1998 848-881
    • (1998) Amer. Econ. Rev. , vol.88 , pp. 848-881
    • Erev, I.1    Roth, A.2
  • 15
    • 0001370536
    • Bernard Friedman's urn
    • D. Freedman Bernard Friedman's urn Ann. Math. Statist. 36 1965 956-970
    • (1965) Ann. Math. Statist. , vol.36 , pp. 956-970
    • Freedman, D.1
  • 18
    • 0003983811
    • Huntington, New York: Robert E. Krieger Publishing Co
    • J. Hale Ordinary Differential Equations 1980 Robert E. Krieger Publishing Co. Huntington, New York
    • (1980) Ordinary Differential Equations
    • Hale, J.1
  • 20
    • 0019885790
    • Learning the evolutionarily stable strategy
    • C. Harley Learning the evolutionarily stable strategy J. Theoret. Biol. 89 1981 611-633
    • (1981) J. Theoret. Biol. , vol.89 , pp. 611-633
    • Harley, C.1
  • 21
    • 0000908510
    • A simple adaptive procedure leading to correlated equilibrium
    • S. Hart A. Mas-Colell A simple adaptive procedure leading to correlated equilibrium Econometrica 68 2000 1127-1150
    • (2000) Econometrica , vol.68 , pp. 1127-1150
    • Hart, S.1    Mas-Colell, A.2
  • 22
    • 0013327463
    • A general class of adaptive strategies
    • S. Hart A. Mas-Colell A general class of adaptive strategies J. Econ. Theory 98 2001 26-54
    • (2001) J. Econ. Theory , vol.98 , pp. 26-54
    • Hart, S.1    Mas-Colell, A.2
  • 23
    • 0242684983
    • A reinforcement procedure leading to correlated equilibrium
    • Mimeo, Hebrew University
    • S. Hart, A. Mas-Colell, A reinforcement procedure leading to correlated equilibrium, Mimeo, Hebrew University, 2001.
    • (2001)
    • Hart, S.1    Mas-Colell, A.2
  • 24
    • 0000559084
    • A strong law for some generalized urn processes
    • B. Hill D. Lane W. Sudderth A strong law for some generalized urn processes Ann. Probab. 8 1980 214-226
    • (1980) Ann. Probab. , vol.8 , pp. 214-226
    • Hill, B.1    Lane, D.2    Sudderth, W.3
  • 25
    • 0034348033
    • Sophisticated imitation in cyclic games
    • J. Hofbauer K. Schlag Sophisticated imitation in cyclic games J. Evolutionary Econ. 10 2000 523-543
    • (2000) J. Evolutionary Econ. , vol.10 , pp. 523-543
    • Hofbauer, J.1    Schlag, K.2
  • 27
    • 0036434064
    • Two competing models of how people learn in games
    • E. Hopkins Two competing models of how people learn in games Econometrica 70 2002 2141-2166
    • (2002) Econometrica , vol.70 , pp. 2141-2166
    • Hopkins, E.1
  • 28
    • 26844454413
    • Reinforcement learning and the power law of practice
    • Mimeo, University of Southampton
    • A. Ianni, Reinforcement learning and the power law of practice, Mimeo, University of Southampton, 2001.
    • (2001)
    • Ianni, A.1
  • 30
    • 0000199420
    • Adaptive learning with nonlinear dynamics driven by dependent processes
    • C.-M. Kuan H. White Adaptive learning with nonlinear dynamics driven by dependent processes Econometrica 62 1994 1087-1114
    • (1994) Econometrica , vol.62 , pp. 1087-1114
    • Kuan, C.-M.1    White, H.2
  • 34
    • 0001000786
    • Nonconvergence to unstable points in urn models and stochastic approximations
    • R. Pemantle Nonconvergence to unstable points in urn models and stochastic approximations Ann. Probab. 18 1990 698-712
    • (1990) Ann. Probab. , vol.18 , pp. 698-712
    • Pemantle, R.1
  • 35
    • 0033164053
    • Vertex-reinforced random walk on Z has finite range
    • R. Pemantle S. Volkov Vertex-reinforced random walk on Z has finite range Ann. Probab. 27 1999 1368-1388
    • (1999) Ann. Probab. , vol.27 , pp. 1368-1388
    • Pemantle, R.1    Volkov, S.2
  • 36
    • 0031287487
    • Cycling in a stochastic learning algorithm for normal form games
    • M. Posch Cycling in a stochastic learning algorithm for normal form games J. Evolutionary Econ. 7 1997 193-207
    • (1997) J. Evolutionary Econ. , vol.7 , pp. 193-207
    • Posch, M.1
  • 37
    • 58149324992
    • Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term
    • A. Roth I. Erev Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term Games Econ. Behav. 8 1995 164-212
    • (1995) Games Econ. Behav. , vol.8 , pp. 164-212
    • Roth, A.1    Erev, I.2
  • 38
    • 0001703679
    • Optimal properties of stimulus-response learning models
    • A. Rustichini Optimal properties of stimulus-response learning models Games Econ. Behav. 29 1999 244-273
    • (1999) Games Econ. Behav. , vol.29 , pp. 244-273
    • Rustichini, A.1
  • 39
    • 0034255271
    • A dynamic model of social network formation
    • B. Skyrms R. Pemantle A dynamic model of social network formation Proc. Nat. Acad. Sci. 97 2000 9340-9346
    • (2000) Proc. Nat. Acad. Sci. , vol.97 , pp. 9340-9346
    • Skyrms, B.1    Pemantle, R.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.