Volume 171, Issue 7, 2007, Pages 382-391

Perspectives on multiagent learning

Author keywords

Game theory; Learning in games; Multiagent learning; Reinforcement learning

Indexed keywords

ALGORITHMS; COMPUTATIONAL COMPLEXITY; GAME THEORY; REINFORCEMENT LEARNING

EID: 34249045960     PISSN: 00043702     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.artint.2007.02.004     Document Type: Article
Times cited: 35

References (74)
  • 1
    • T. Abbott, D. Kane, P. Valiant, On the complexity of two-player win-lose games, in: Symposium on Foundations of Computer Science, 2005
  • 3
    • B. Banerjee, J. Peng, Performance bounded reinforcement learning in strategic interactions, in: National Conf. on Artificial Intelligence, 2004
  • 4
    • B. Banerjee, S. Sen, J. Peng, Fast concurrent reinforcement learners, in: Internat. Joint Conf. on Artificial Intelligence, 2001
  • 5
    • M. Benisch, G. Davis, T. Sandholm, Algorithms for rationalizability and CURB sets, in: National Conf. on Artificial Intelligence, 2006
  • 6
    • A. Blum, E. Even-Dar, K. Ligett, Routing without regret: On convergence to Nash equilibria of regret-minimizing algorithms in routing games, in: ACM Symposium on Principles of Distributed Computing, 2006
  • 7
    • M. Bowling, Convergence and no-regret in multiagent learning, in: Conf. on Neural Information Processing Systems, 2005
  • 8
    • M. Bowling, M. Veloso, Multiagent learning using a variable learning rate, Artificial Intelligence 136 (2002) 215-250
  • 9
    • R. Brafman, M. Tennenholtz, A near-optimal polynomial time algorithm for learning in certain classes of stochastic games, Artificial Intelligence 121 (2000) 31-47
  • 11
    • R. Brafman, M. Tennenholtz, R-max - a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research 3 (2003) 213-231
  • 12
    • R. Brafman, M. Tennenholtz, Efficient learning equilibrium, Artificial Intelligence 159 (2004) 27-47. Earlier version in NIPS-02
  • 13
    • R. Brafman, M. Tennenholtz, Optimal efficient learning equilibrium: Imperfect monitoring in symmetric games, in: National Conf. on Artificial Intelligence, 2005
  • 14
    • Y.-H. Chang, T. Ho, L. Kaelbling, Mobilized ad-hoc networks: A reinforcement learning approach, in: Internat. Conf. on Autonomic Computing, 2004
  • 15
    • X. Chen, X. Deng, Settling the complexity of 2-player Nash equilibrium, in: Electronic Colloquium on Computational Complexity, Report No. 150, 2005
  • 16
    • C. Claus, C. Boutilier, The dynamics of reinforcement learning in cooperative multiagent systems, in: National Conf. on Artificial Intelligence, 1998
  • 17
    • V. Conitzer, T. Sandholm, BL-WoLF: A framework for loss-bounded learnability in zero-sum games, in: Internat. Conf. on Machine Learning, 2003
  • 18
    • V. Conitzer, T. Sandholm, Complexity results about Nash equilibria, in: Internat. Joint Conf. on Artificial Intelligence, 2003
  • 19
    • V. Conitzer, T. Sandholm, Communication complexity as a lower bound for learning in games, in: Internat. Conf. on Machine Learning, 2004
  • 20
    • V. Conitzer, T. Sandholm, Complexity of (iterated) dominance, in: ACM Conf. on Electronic Commerce, 2005
  • 21
    • V. Conitzer, T. Sandholm, A generalized strategy eliminability criterion and computational methods for applying it, in: National Conf. on Artificial Intelligence, 2005
  • 22
    • V. Conitzer, T. Sandholm, AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents, Machine Learning 67 (2007) 23-43. Special issue on Learning and Computational Game Theory; short version in ICML-03
  • 23
    • V. Conitzer, T. Sandholm, Computing the optimal strategy to commit to, in: ACM Conf. on Electronic Commerce, 2006
  • 24
    • A. Flaxman, A. Kalai, B. McMahan, Online convex optimization in the bandit setting: Gradient descent without a gradient, in: ACM-SIAM Symposium on Discrete Algorithms, 2005
  • 26
    • Y. Freund, R. Schapire, Adaptive game playing using multiplicative weights, Games and Economic Behavior 29 (1999) 79-103
  • 30
    • I. Gilboa, E. Zemel, Nash and correlated equilibria: Some complexity considerations, Games and Economic Behavior 1 (1989) 80-93
  • 31
    • A. Gilpin, S. Hoda, J. Peña, T. Sandholm, Gradient-based algorithms for finding Nash equilibria in extensive form games, Mimeo, 2007
  • 32
    • A. Gilpin, T. Sandholm, Finding equilibria in large sequential games of imperfect information, in: ACM Conf. on Electronic Commerce, 2006
  • 33
    • A. Greenwald, K. Hall, Correlated Q-learning, in: Internat. Conf. on Machine Learning, 2003
  • 34
    • A. Greenwald, A. Jafari, A general class of no-regret learning algorithms and game-theoretic equilibria, in: Conf. on Learning Theory, 2003
  • 35
    • S. Hart, Y. Mansour, The communication complexity of uncoupled Nash equilibrium procedures, Draft, 2006
  • 36
    • S. Hart, A. Mas-Colell, A simple adaptive procedure leading to correlated equilibrium, Econometrica 68 (2000) 1127-1150
  • 37
    • S. Hart, A. Mas-Colell, Uncoupled dynamics do not lead to Nash equilibrium, American Economic Review 93 (2003) 1830-1836
  • 38
  • 39
    • A. Jafari, A. Greenwald, D. Gondek, G. Ercal, On no-regret learning, fictitious play, and Nash equilibrium, in: Internat. Conf. on Machine Learning, 2001
  • 41
    • E. Kalai, E. Lehrer, Rational learning leads to Nash equilibrium, Econometrica 61 (5) (1993) 1019-1045
  • 42
    • M. Kearns, M. Littman, S. Singh, Graphical models for game theory, in: Conf. on Uncertainty in Artificial Intelligence, 2001
  • 46
    • M. Littman, Markov games as a framework for multi-agent reinforcement learning, in: Internat. Conf. on Machine Learning, 1994
  • 47
    • M. Littman, Friend or foe Q-learning in general-sum Markov games, in: Internat. Conf. on Machine Learning, 2001
  • 48
    • M. Littman, Value-function reinforcement learning in Markov games, Journal of Cognitive Systems Research 2 (2001) 55-66
  • 49
    • M. Littman, C. Szepesvári, A generalized reinforcement-learning model: Convergence and applications, in: Internat. Conf. on Machine Learning, 1996
  • 50
    • S. Mannor, N. Shimkin, The empirical Bayes envelope and regret minimization in competitive Markov decision processes, Mathematics of Operations Research 28 (2) (2003) 327-345
  • 51
    • P. McCracken, M. Bowling, Safe strategies for agent modelling in games, in: AAAI Fall Symposium on Artificial Multi-agent Learning, 2004
  • 52
    • B. McMahan, A. Blum, Online geometric optimization in the bandit setting against an adaptive adversary, in: Conf. on Learning Theory, 2004
  • 53
    • J. Nachbar, Prediction, optimization, and learning in games, Econometrica 65 (1997) 275-309
  • 54
    • J. Nachbar, Bayesian learning in repeated games of incomplete information, Social Choice and Welfare 18 (2001) 303-326
  • 56
    • C. Papadimitriou, T. Roughgarden, Computing equilibria in multi-player games, in: Symposium on Discrete Algorithms, 2005
  • 57
    • K. Pivazyan, Y. Shoham, Polynomial-time reinforcement learning of near-optimal policies, in: National Conf. on Artificial Intelligence, 2002
  • 58
    • R. Porter, E. Nudelman, Y. Shoham, Simple search methods for finding a Nash equilibrium, in: National Conf. on Artificial Intelligence, 2004
  • 59
    • R. Powers, Y. Shoham, Learning against opponents with bounded memory, in: Internat. Joint Conf. on Artificial Intelligence, 2005
  • 60
    • R. Powers, Y. Shoham, New criteria and a new algorithm for learning in multi-agent systems, in: Conf. on Neural Information Processing Systems, 2005
  • 61
    • T. Sandholm, R. Crites, Multiagent reinforcement learning in the iterated prisoner's dilemma, Biosystems 37 (1996) 147-166. Special issue on the Prisoner's Dilemma; early version in IJCAI-95 Workshop on Adaptation and Learning in Multiagent Systems
  • 62
    • T. Sandholm, A. Gilpin, V. Conitzer, Mixed-integer programming methods for finding Nash equilibria, in: National Conf. on Artificial Intelligence, 2005
  • 63
    • T. Sandholm, M.V. Nagendra Prasad, Learning pursuit strategies, Project for CmpSci 698 Machine Learning, Computer Science Department, University of Massachusetts at Amherst, Spring, 1993
  • 64
    • J. Schaeffer, Y. Björnsson, N. Burch, A. Kishimoto, M. Müller, R. Lake, P. Lu, S. Sutphen, Solving checkers, in: Internat. Joint Conf. on Artificial Intelligence, 2005
  • 65
    • S. Singh, M. Kearns, Y. Mansour, Nash convergence of gradient dynamics in general-sum games, in: Conf. on Uncertainty in Artificial Intelligence, 2000
  • 66
    • F. Southey, M. Bowling, B. Larson, C. Piccione, N. Burch, D. Billings, C. Rayner, Bayes' bluff: Opponent modelling in poker, in: Conf. on Uncertainty in Artificial Intelligence, 2005
  • 67
    • P. Stone, M. Veloso, Towards collaborative and adversarial learning: A case study in robotic soccer, International Journal of Human Computer Studies 48 (1998)
  • 68
    • M. Tan, Multi-agent reinforcement learning: Independent vs. cooperative agents, in: Internat. Conf. on Machine Learning, 1993
  • 69
    • G. Tesauro, Temporal difference learning and TD-Gammon, Communications of the ACM 38 (3) (1995)
  • 71
    • D. Vickrey, D. Koller, Multi-agent algorithms for solving graphical games, in: National Conf. on Artificial Intelligence, 2002
  • 72
    • X. Wang, T. Sandholm, Reinforcement learning to play an optimal Nash equilibrium in team Markov games, in: Conf. on Neural Information Processing Systems, 2002
  • 73
    • X. Wang, T. Sandholm, Learning near-Pareto-optimal conventions in polynomial time, in: Conf. on Neural Information Processing Systems, 2003
  • 74
    • M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, in: Internat. Conf. on Machine Learning, 2003


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.