IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), Volume 38, Issue 2, 2008, Pages 156-172

A comprehensive survey of multiagent reinforcement learning

Author keywords

Distributed control; Game theory; Multiagent systems; Reinforcement learning

Indexed keywords

ALGORITHMS; ARTIFICIAL INTELLIGENCE; GAME THEORY; REINFORCEMENT LEARNING

EID: 40949147745     PISSN: 1094-6977     EISSN: None     Source Type: Journal
DOI: 10.1109/TSMCC.2007.913919     Document Type: Review
Times cited: 2045

References (132)
  • 2
    • N. Vlassis, "A concise introduction to multiagent systems and distributed AI," Fac. Sci., Univ. Amsterdam, Amsterdam, The Netherlands, Tech. Rep., Sep. 2003. [Online]. Available: http://www.science.uva.nl/~vlassis/cimasdai/cimasdai.pdf
  • 4
    • P. Stone and M. Veloso, "Multiagent systems: A survey from the machine learning perspective," Auton. Robots, vol. 8, no. 3, pp. 345-383, 2000.
  • 5
    • R. H. Crites and A. G. Barto, "Elevator group control using multiple reinforcement learning agents," Mach. Learn., vol. 33, no. 2-3, pp. 235-262, 1998.
  • 13
    • M. Bowling and M. Veloso, "Multiagent learning using a variable learning rate," Artif. Intell., vol. 136, no. 2, pp. 215-250, 2002.
  • 14
    • Y. Shoham, R. Powers, and T. Grenager, "Multi-agent reinforcement learning: A critical survey," Comput. Sci. Dept., Stanford Univ., Stanford, CA, Tech. Rep., May 2003. [Online]. Available: http://multiagent.stanford.edu/papers/MALearning_ACriticalSurvey_2003_0516.pdf
  • 17
    • L. Panait and S. Luke, "Cooperative multi-agent learning: The state of the art," Auton. Agents Multi-Agent Syst., vol. 11, no. 3, pp. 387-434, Nov. 2005.
  • 18
    • M. A. Potter and K. A. De Jong, "A cooperative coevolutionary approach to function optimization," in Proc. 3rd Conf. Parallel Probl. Solving Nat. (PPSN-III), Jerusalem, Israel, Oct. 9-14, 1994, pp. 249-257.
  • 22
    • R. Salustowicz, M. Wiering, and J. Schmidhuber, "Learning team strategies: Soccer case studies," Mach. Learn., vol. 33, no. 2-3, pp. 263-282, 1998.
  • 23
    • T. Miconi, "When evolving populations is better than coevolving individuals: The blind mice problem," in Proc. 18th Int. Joint Conf. Artif. Intell. (IJCAI-03), Acapulco, Mexico, Aug. 9-15, 2003, pp. 647-652.
  • 24
    • V. Könönen, "Gradient based method for symmetric and asymmetric multiagent reinforcement learning," in Proc. 4th Int. Conf. Intell. Data Eng. Autom. Learn. (IDEAL-03), Hong Kong, China, Mar. 21-23, 2003, pp. 68-75.
  • 25
    • F. Ho and M. Kamel, "Learning coordination strategies for cooperative multiagent systems," Mach. Learn., vol. 33, no. 2-3, pp. 155-177, 1998.
  • 26
    • J. Schmidhuber, "A general method for incremental self-improvement and multi-agent learning," in Evolutionary Computation: Theory and Applications, X. Yao, Ed. Singapore: World Scientific, 1999, ch. 3, pp. 81-123.
  • 28
    • K. Tuyls and A. Nowé, "Evolutionary game theory and multi-agent reinforcement learning," Knowl. Eng. Rev., vol. 20, no. 1, pp. 63-90, 2005.
  • 33
    • J. Peng and R. J. Williams, "Incremental multi-step Q-learning," Mach. Learn., vol. 22, no. 1-3, pp. 283-290, 1996.
  • 34
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
  • 35
    • A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, no. 5, pp. 834-846, Sep./Oct. 1983.
  • 36
    • R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proc. 7th Int. Conf. Mach. Learn. (ICML-90), Austin, TX, Jun. 21-23, 1990, pp. 216-224.
  • 37
    • A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Mach. Learn., vol. 13, pp. 103-130, 1993.
  • 38
    • M. L. Littman, "Value-function reinforcement learning in Markov games," J. Cogn. Syst. Res., vol. 2, no. 1, pp. 55-66, 2001.
  • 39
    • M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Proc. 11th Int. Conf. Mach. Learn. (ICML-94), New Brunswick, NJ, Jul. 10-13, 1994, pp. 157-163.
  • 40
    • J. Hu and M. P. Wellman, "Multiagent reinforcement learning: Theoretical framework and an algorithm," in Proc. 15th Int. Conf. Mach. Learn. (ICML-98), Madison, WI, Jul. 24-27, 1998, pp. 242-250.
  • 41
    • M. Lauer and M. Riedmiller, "An algorithm for distributed reinforcement learning in cooperative multi-agent systems," in Proc. 17th Int. Conf. Mach. Learn. (ICML-00), Stanford Univ., Stanford, CA, Jun. 29-Jul. 2, 2000, pp. 535-542.
  • 43
    • T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Comput., vol. 6, no. 6, pp. 1185-1201, 1994.
  • 44
    • J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Mach. Learn., vol. 16, no. 1, pp. 185-202, 1994.
  • 46
    • J. R. Kok, M. T. J. Spaan, and N. Vlassis, "Non-communicative multi-robot coordination in dynamic environments," Robot. Auton. Syst., vol. 50, no. 2-3, pp. 99-114, 2005.
  • 48
    • J. R. Kok and N. Vlassis, "Using the max-plus algorithm for multiagent decision making in coordination graphs," in Robot Soccer World Cup IX (RoboCup 2005), Lecture Notes in Computer Science, vol. 4020, Osaka, Japan, Jul. 13-19, 2005.
  • 49
    • R. Fitch, B. Hengst, D. Šuc, G. Calbert, and J. B. Scholz, "Structural abstraction experiments in reinforcement learning," in Proc. 18th Aust. Joint Conf. Artif. Intell. (AI-05), Lecture Notes in Computer Science, vol. 3809, Sydney, Australia, Dec. 5-9, 2005, pp. 164-175.
  • 51
    • M. Tan, "Multi-agent reinforcement learning: Independent vs. cooperative agents," in Proc. 10th Int. Conf. Mach. Learn. (ICML-93), Amherst, MA, Jun. 27-29, 1993, pp. 330-337.
  • 53
    • B. Price and C. Boutilier, "Accelerating reinforcement learning through implicit imitation," J. Artif. Intell. Res., vol. 19, pp. 569-629, 2003.
  • 54
    • J. Hu and M. P. Wellman, "Nash Q-learning for general-sum stochastic games," J. Mach. Learn. Res., vol. 4, pp. 1039-1069, 2003.
  • 55
    • R. Powers and Y. Shoham, "New criteria and a new algorithm for learning in multi-agent systems," in Proc. Adv. Neural Inf. Process. Syst. (NIPS-04), Vancouver, BC, Canada, Dec. 13-18, 2004, vol. 17, pp. 1089-1096.
  • 56
    • M. Bowling and M. Veloso, "Rational and convergent learning in stochastic games," in Proc. 17th Int. Joint Conf. Artif. Intell. (IJCAI-01), Seattle, WA, Aug. 4-10, 2001, pp. 1021-1026.
  • 57
    • M. Bowling, "Convergence and no-regret in multiagent learning," in Proc. Adv. Neural Inf. Process. Syst. (NIPS-04), Vancouver, BC, Canada, Dec. 13-18, 2004, vol. 17, pp. 209-216.
  • 58
    • G. Chalkiadakis, "Multiagent reinforcement learning: Stochastic games with multiple learning players," Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Tech. Rep., Mar. 2003. [Online]. Available: http://www.cs.toronto.edu/~gehalk/DepthReport/DepthReport.ps
  • 60
    • V. Conitzer and T. Sandholm, "AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents," in Proc. 20th Int. Conf. Mach. Learn. (ICML-03), Washington, DC, Aug. 21-24, 2003, pp. 83-90.
  • 61
    • M. Bowling, "Multiagent learning in the presence of agents with limitations," Ph.D. dissertation, Dept. Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, May 2003.
  • 64
    • X. Wang and T. Sandholm, "Reinforcement learning to play an optimal Nash equilibrium in team Markov games," in Proc. Adv. Neural Inf. Process. Syst. (NIPS-02), Vancouver, BC, Canada, Dec. 9-14, 2002, vol. 15, pp. 1571-1578.
  • 65
    • G. W. Brown, "Iterative solution of games by fictitious play," in Activity Analysis of Production and Allocation, T. C. Koopmans, Ed. New York: Wiley, 1951, ch. XXIV, pp. 374-376.
  • 67
    • M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," in Proc. 20th Int. Conf. Mach. Learn. (ICML-03), Washington, DC, Aug. 21-24, 2003, pp. 928-936.
  • 68
    • G. Tesauro, "Extending Q-learning to general adaptive multi-agent systems," in Proc. Adv. Neural Inf. Process. Syst. (NIPS-03), Vancouver, BC, Canada, Dec. 8-13, 2003, vol. 16.
  • 70
    • M. J. Matarić, "Reinforcement learning in the multi-robot domain," Auton. Robots, vol. 4, no. 1, pp. 73-83, 1997.
  • 71
    • R. H. Crites and A. G. Barto, "Improving elevator performance using reinforcement learning," in Proc. Adv. Neural Inf. Process. Syst. (NIPS-95), Denver, CO, Nov. 27-30, 1995, vol. 8, pp. 1017-1023.
  • 76
    • D. Carmel and S. Markovitch, "Opponent modeling in multi-agent systems," in Adaptation and Learning in Multi-Agent Systems, G. Weiss and S. Sen, Eds. New York: Springer-Verlag, 1996, ch. 3, pp. 40-52.
  • 77
    • W. T. Uther and M. Veloso, "Adversarial reinforcement learning," School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep., Apr. 1997. [Online]. Available: http://www.cs.cmu.edu/afs/cs/user/will/www/papers/Uther97a.ps
  • 78
    • D. V. Pynadath and M. Tambe, "The communicative multiagent team decision problem: Analyzing teamwork theories and models," J. Artif. Intell. Res., vol. 16, pp. 389-423, 2002.
  • 79
    • M. T. J. Spaan, N. Vlassis, and F. C. A. Groen, "High level coordination of agents based on multiagent Markov decision processes with roles," in Proc. Workshop Coop. Robot., 2002 IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS-02), Lausanne, Switzerland, Oct. 1, 2002, pp. 66-73.
  • 80
    • C. Boutilier, "Planning, learning and coordination in multiagent decision processes," in Proc. 6th Conf. Theor. Aspects Rationality Knowl. (TARK-96), De Zeeuwse Stromen, The Netherlands, Mar. 17-20, 1996, pp. 195-210.
  • 82
    • M. V. Nagendra Prasad, V. R. Lesser, and S. E. Lander, "Learning organizational roles for negotiated search in a multiagent system," Int. J. Hum. Comput. Stud., vol. 48, no. 1, pp. 51-67, 1998.
  • 85
    • M. J. Matarić, "Reward functions for accelerated learning," in Proc. 11th Int. Conf. Mach. Learn. (ICML-94), New Brunswick, NJ, Jul. 10-13, 1994, pp. 181-189.
  • 86
    • M. J. Matarić, "Learning in multi-robot systems," in Adaptation and Learning in Multi-Agent Systems, G. Weiss and S. Sen, Eds. New York: Springer-Verlag, 1996, ch. 10, pp. 152-163.
  • 87
    • K. Tuyls, P. J. 't Hoen, and B. Vanschoenwinkel, "An evolutionary dynamical analysis of multi-agent learning in iterated games," Auton. Agents Multi-Agent Syst., vol. 12, no. 1, pp. 115-153, 2006.
  • 88
    • M. Bowling, "Convergence problems of general-sum multiagent reinforcement learning," in Proc. 17th Int. Conf. Mach. Learn. (ICML-00), Stanford Univ., Stanford, CA, Jun. 29-Jul. 2, 2000, pp. 89-94.
  • 91
    • M. Wiering, "Multi-agent reinforcement learning for traffic light control," in Proc. 17th Int. Conf. Mach. Learn. (ICML-00), Stanford Univ., Stanford, CA, Jun. 29-Jul. 2, 2000, pp. 1151-1158.
  • 93
    • M. A. Riedmiller, A. W. Moore, and J. G. Schneider, "Reinforcement learning for cooperating and communicating reactive agents in electrical power grids," in Balancing Reactivity and Social Deliberation in Multi-Agent Systems, M. Hannebauer, J. Wendler, and E. Pagello, Eds. New York: Springer, 2000, pp. 137-149.
  • 94
    • C. F. Touzet, "Robot awareness in cooperative mobile robot learning," Auton. Robots, vol. 8, no. 1, pp. 87-97, 2000.
  • 95
    • F. Fernández and L. E. Parker, "Learning in large cooperative multirobot systems," Int. J. Robot. Autom., vol. 16, no. 4, pp. 217-226, 2001.
  • 96
    • Y. Ishiwaka, T. Sato, and Y. Kakazu, "An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning," Robot. Auton. Syst., vol. 43, no. 4, pp. 245-256, 2003.
  • 97
    • P. Stone and M. Veloso, "Team-partitioned, opaque-transition reinforcement learning," in Proc. 3rd Int. Conf. Auton. Agents (Agents-99), Seattle, WA, May 1-5, 1999, pp. 206-212.
  • 98
    • M. Wiering, R. Salustowicz, and J. Schmidhuber, "Reinforcement learning soccer teams with incomplete world models," Auton. Robots, vol. 7, no. 1, pp. 77-88, 1999.
  • 99
    • K. Tuyls, S. Maes, and B. Manderick, "Q-learning in simulated robotic soccer - large state spaces and incomplete information," in Proc. 2002 Int. Conf. Mach. Learn. Appl. (ICMLA-02), Las Vegas, NV, Jun. 24-27, 2002, pp. 226-232.
  • 100
    • A. Merke and M. A. Riedmiller, "Karlsruhe brainstormers - A reinforcement learning approach to robotic soccer," in Robot Soccer World Cup V (RoboCup 2001), Lecture Notes in Computer Science, vol. 2377, Seattle, WA, Aug. 2-10, 2001, pp. 435-440.
  • 102
    • W.-T. Hsu and V.-W. Soo, "Market performance of adaptive trading agents in synchronous double auctions," in Proc. 4th Pacific Rim Int. Workshop Multi-Agents: Intell. Agents: Specification, Model., Appl. (PRIMA-01), Lecture Notes in Computer Science, vol. 2132, Taipei, Taiwan, R.O.C., Jul. 28-29, 2001, pp. 108-121.
  • 103
    • J. W. Lee and J. O, "A multi-agent Q-learning framework for optimizing stock trading systems," in Proc. 13th Int. Conf. Database Expert Syst. Appl. (DEXA-02), Lecture Notes in Computer Science, vol. 2453, Aix-en-Provence, France, Sep. 2-6, 2002, pp. 153-162.
  • 104
    • J. O, J. W. Lee, and B.-T. Zhang, "Stock trading system using reinforcement learning with cooperative agents," in Proc. 19th Int. Conf. Mach. Learn. (ICML-02), Sydney, Australia, Jul. 8-12, 2002, pp. 451-458.
  • 105
    • G. Tesauro and J. O. Kephart, "Pricing in agent economies using multiagent Q-learning," Auton. Agents Multi-Agent Syst., vol. 5, no. 3, pp. 289-304, 2002.
  • 106
    • C. Raju, Y. Narahari, and K. Ravikumar, "Reinforcement learning applications in dynamic pricing of retail markets," in Proc. 2003 IEEE Int. Conf. E-Commerce (CEC-03), Newport Beach, CA, Jun. 24-27, 2003, pp. 339-346.
  • 107
    • A. Schaerf, Y. Shoham, and M. Tennenholtz, "Adaptive load balancing: A study in multi-agent learning," J. Artif. Intell. Res., vol. 2, pp. 475-500, 1995.
  • 108
    • J. A. Boyan and M. L. Littman, "Packet routing in dynamically changing networks: A reinforcement learning approach," in Proc. Adv. Neural Inf. Process. Syst. (NIPS-93), Denver, CO, Nov. 29-Dec. 2, 1993, vol. 6, pp. 671-678.
  • 109
    • S. P. M. Choi and D.-Y. Yeung, "Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control," in Proc. Adv. Neural Inf. Process. Syst. (NIPS-95), Denver, CO, Nov. 27-30, 1995, vol. 8, pp. 945-951.
  • 110
    • P. Tillotson, Q. Wu, and P. Hughes, "Multi-agent learning for routing control within an Internet environment," Eng. Appl. Artif. Intell., vol. 17, no. 2, pp. 179-185, 2004.
  • 112
    • G. Gordon, "Stable function approximation in dynamic programming," in Proc. 12th Int. Conf. Mach. Learn. (ICML-95), Tahoe City, CA, Jul. 9-12, 1995, pp. 261-268.
  • 113
    • J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large scale dynamic programming," Mach. Learn., vol. 22, no. 1-3, pp. 59-94, 1996.
  • 114
    • R. Munos and A. Moore, "Variable-resolution discretization in optimal control," Mach. Learn., vol. 49, no. 2-3, pp. 291-323, 2002.
  • 115
    • R. Munos, "Performance bounds in L_p-norm for approximate value iteration," SIAM J. Control Optim., vol. 46, no. 2, pp. 546-561, 2007.
  • 116
    • C. Szepesvári and R. Munos, "Finite time bounds for sampling based fitted value iteration," in Proc. 22nd Int. Conf. Mach. Learn. (ICML-05), Bonn, Germany, Aug. 7-11, 2005, pp. 880-887.
  • 117
    • J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674-690, May 1997.
  • 118
    • D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Mach. Learn., vol. 49, no. 2-3, pp. 161-178, 2002.
  • 120
    • D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learn. Res., vol. 6, pp. 503-556, 2005.
  • 121
    • M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learn. Res., vol. 4, pp. 1107-1149, 2003.
  • 122
    • S. Džeroski, L. De Raedt, and K. Driessens, "Relational reinforcement learning," Mach. Learn., vol. 43, no. 1-2, pp. 7-52, 2001.
  • 123
    • O. Abul, F. Polat, and R. Alhajj, "Multiagent reinforcement learning using function approximation," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 30, no. 4, pp. 485-497, Nov. 2000.
  • 125
    • H. Tamakoshi and S. Ishii, "Multiagent reinforcement learning applied to a chase problem in a continuous world," Artif. Life Robot., vol. 5, no. 4, pp. 202-206, 2001.
  • 126
    • B. Price and C. Boutilier, "Implicit imitation in multiagent reinforcement learning," in Proc. 16th Int. Conf. Mach. Learn. (ICML-99), Bled, Slovenia, Jun. 27-30, 1999, pp. 325-334.
  • 127
    • O. Buffet, A. Dutech, and F. Charpillet, "Shaping multi-agent systems with gradient reinforcement learning," Auton. Agents Multi-Agent Syst., vol. 15, no. 2, pp. 197-220, 2007.
  • 129
    • W. S. Lovejoy, "Computationally feasible bounds for partially observed Markov decision processes," Oper. Res., vol. 39, no. 1, pp. 162-175, 1991.
  • 130
    • S. Ishii, H. Fujita, M. Mitsutake, T. Yamazaki, J. Matsuda, and Y. Matsuno, "A reinforcement learning scheme for a partially-observable multi-agent game," Mach. Learn., vol. 59, no. 1-2, pp. 31-54, 2005.
  • 132
    • J. M. Vidal, "Learning in multiagent systems: An introduction from a game-theoretic perspective," in Adaptive Agents, Lecture Notes in Artificial Intelligence, vol. 2636, E. Alonso, Ed. New York: Springer-Verlag, 2003, pp. 202-215.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.