Volume 310, 2010, Pages 183-221

Multi-agent reinforcement learning: An overview

Author keywords: [No author keywords available]
EID: 77956317028     PISSN: 1860-949X     EISSN: None     Source Type: Book Series
DOI: 10.1007/978-3-642-14435-6_7     Document Type: Article
Times cited: 679

References (144)
  • 7. Berenji, H.R., Vengerov, D.: A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters. IEEE Transactions on Fuzzy Systems 11(4), 478-485 (2003)
  • 10. Borkar, V.: An actor-critic algorithm for constrained Markov decision processes. Systems & Control Letters 54(3), 207-213 (2005)
  • 14. Bowling, M.: Convergence and no-regret in multiagent learning. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 209-216. MIT Press, Cambridge (2005)
  • 15. Bowling, M., Veloso, M.: An analysis of stochastic game theory for multiagent reinforcement learning. Tech. rep., Computer Science Dept., Carnegie Mellon University, Pittsburgh, US (2000), http://www.cs.ualberta.ca/~bowling/papers/00tr.pdf
  • 17. Bowling, M., Veloso, M.: Multiagent learning using a variable learning rate. Artificial Intelligence 136(2), 215-250 (2002)
  • 18. Boyan, J.A., Littman, M.L.: Packet routing in dynamically changing networks: A reinforcement learning approach. In: Moody, J. (ed.) Advances in Neural Information Processing Systems 6, pp. 671-678. Morgan Kaufmann, San Francisco (1994)
  • 19. Brown, G.W.: Iterative solutions of games by fictitious play. In: Koopmans, T.C. (ed.) Activity Analysis of Production and Allocation, ch. XXIV, pp. 374-376. Wiley, Chichester (1951)
  • 25. Carmel, D., Markovitch, S.: Opponent modeling in multi-agent systems. In: Weiß, G., Sen, S. (eds.) Adaptation and Learning in Multi-Agent Systems, ch. 3, pp. 40-52. Springer, Heidelberg (1996)
  • 26. Chalkiadakis, G.: Multiagent reinforcement learning: Stochastic games with multiple learning players. Tech. rep., Dept. of Computer Science, University of Toronto, Canada (2003), http://www.cs.toronto.edu/~gehalk/DepthReport/DepthReport.ps
  • 28. Choi, S.P.M., Yeung, D.Y.: Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. In: Touretzky, D.S., Mozer, M., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems 8, pp. 945-951. MIT Press, Cambridge (1995)
  • 31. Conitzer, V., Sandholm, T.: AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 83-90 (2003)
  • 32. Crites, R.H., Barto, A.G.: Improving elevator performance using reinforcement learning. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems 8, pp. 1017-1023. MIT Press, Cambridge (1996)
  • 33. Crites, R.H., Barto, A.G.: Elevator group control using multiple reinforcement learning agents. Machine Learning 33(2-3), 235-262 (1998)
  • 36. Ficici, S.G., Pollack, J.B.: A game-theoretic approach to the simple coevolutionary algorithm. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 467-476. Springer, Heidelberg (2000)
  • 38. Fitch, R., Hengst, B., Suc, D., Calbert, G., Scholz, J.B.: Structural abstraction experiments in reinforcement learning. In: Zhang, S., Jarvis, R.A. (eds.) AI 2005. LNCS (LNAI), vol. 3809, pp. 164-175. Springer, Heidelberg (2005)
  • 46. Ho, F., Kamel, M.: Learning coordination strategies for cooperative multiagent systems. Machine Learning 33(2-3), 155-177 (1998)
  • 48. Hsu, W.T., Soo, V.W.: Market performance of adaptive trading agents in synchronous double auctions. In: Yuan, S.-T., Yokoo, M. (eds.) PRIMA 2001. LNCS (LNAI), vol. 2132, pp. 108-121. Springer, Heidelberg (2001)
  • 50. [no indexed data]
  • 51. [no indexed data]
  • 52. Ishiwaka, Y., Sato, T., Kakazu, Y.: An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning. Robotics and Autonomous Systems 43(4), 245-256 (2003)
  • 53. Jaakkola, T., Jordan, M.I., Singh, S.P.: On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6(6), 1185-1201 (1994)
  • 61. Kok, J.R., Spaan, M.T.J., Vlassis, N.: Non-communicative multi-robot coordination in dynamic environment. Robotics and Autonomous Systems 50(2-3), 99-114 (2005)
  • 65. Könönen, V.: Gradient based method for symmetric and asymmetric multiagent reinforcement learning. In: Liu, J., Cheung, Y.-m., Yin, H. (eds.) IDEAL 2003. LNCS, vol. 2690, pp. 68-75. Springer, Heidelberg (2003)
  • 68. Lee, J.-W., Jang Min, O.: A multi-agent Q-learning framework for optimizing stock trading systems. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds.) DEXA 2002. LNCS, vol. 2453, pp. 153-162. Springer, Heidelberg (2002)
  • 70. Littman, M.L.: Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research 2(1), 55-66 (2001)
  • 71. Littman, M.L., Stone, P.: Implicit negotiation in repeated games. In: Meyer, J.-J.C., Tambe, M. (eds.) ATAL 2001. LNCS (LNAI), vol. 2333, pp. 96-105. Springer, Heidelberg (2002)
  • 72. Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Operations Research 39(1), 162-175 (1991)
  • 74. Matarić, M.J.: Learning in multi-robot systems. In: Weiß, G., Sen, S. (eds.) Adaptation and Learning in Multi-Agent Systems, ch. 10, pp. 152-163. Springer, Heidelberg (1996)
  • 75. Matarić, M.J.: Reinforcement learning in the multi-robot domain. Autonomous Robots 4(1), 73-83 (1997)
  • 77. Merke, A., Riedmiller, M.A.: Karlsruhe Brainstormers - A reinforcement learning approach to robotic soccer. In: Birk, A., Coradeschi, S., Tadokoro, S. (eds.) RoboCup 2001. LNCS (LNAI), vol. 2377, pp. 435-440. Springer, Heidelberg (2002)
  • 79. Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13, 103-130 (1993)
  • 85. Ormoneit, D., Sen, S.: Kernel-based reinforcement learning. Machine Learning 49(2-3), 161-178 (2002)
  • 86. Panait, L., Luke, S.: Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems 11(3), 387-434 (2005)
  • 89. Peng, J., Williams, R.J.: Incremental multi-step Q-learning. Machine Learning 22(1-3), 283-290 (1996)
  • 90. Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7-9), 1180-1190 (2008)
  • 91. Potter, M.A., Jong, K.A.D.: A cooperative coevolutionary approach to function optimization. In: Davidor, Y., Männer, R., Schwefel, H.-P. (eds.) PPSN 1994. LNCS, vol. 866, pp. 249-257. Springer, Heidelberg (1994)
  • 93. Powers, R., Shoham, Y.: New criteria and a new algorithm for learning in multi-agent systems. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 1089-1096. MIT Press, Cambridge (2005)
  • 97. Pynadath, D.V., Tambe, M.: The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research 16, 389-423 (2002)
  • 99. Riedmiller, M.: Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317-328. Springer, Heidelberg (2005)
  • 100. Riedmiller, M.A., Moore, A.W., Schneider, J.G.: Reinforcement learning for cooperating and communicating reactive agents in electrical power grids. In: Hannebauer, M., Wendler, J., Pagello, E. (eds.) Balancing Reactivity and Social Deliberation in Multi-Agent Systems, pp. 137-149. Springer, Heidelberg (2000)
  • 101. Salustowicz, R., Wiering, M., Schmidhuber, J.: Learning team strategies: Soccer case studies. Machine Learning 33(2-3), 263-282 (1998)
  • 103. Schmidhuber, J.: A general method for incremental self-improvement and multi-agent learning. In: Yao, X. (ed.) Evolutionary Computation: Theory and Applications, ch. 3, pp. 81-123. World Scientific, Singapore (1999)
  • 108. Shoham, Y., Powers, R., Grenager, T.: If multi-agent learning is the answer, what is the question? Artificial Intelligence 171(7), 365-377 (2007)
  • 110. Singh, S.P., Jaakkola, T., Jordan, M.I.: Reinforcement learning with soft state aggregation. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.) Advances in Neural Information Processing Systems 7, pp. 361-368. MIT Press, Cambridge (1995)
  • 115. Stone, P., Veloso, M.: Multiagent systems: A survey from the machine learning perspective. Autonomous Robots 8(3), 345-383 (2000)
  • 117. Sueyoshi, T., Tadiparthi, G.R.: An agent-based decision support system for wholesale electricity markets. Decision Support Systems 44, 425-446 (2008)
  • 118. Sutton, R.S.: Learning to predict by the method of temporal differences. Machine Learning 3, 9-44 (1988)
  • 119. Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings 7th International Conference on Machine Learning (ICML-1990), Austin, US, pp. 216-224 (1990)
  • 122. Tamakoshi, H., Ishii, S.: Multiagent reinforcement learning applied to a chase problem in a continuous world. Artificial Life and Robotics 5(4), 202-206 (2001)
  • 124. Tesauro, G.: Extending Q-learning to general adaptive multi-agent systems. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16. MIT Press, Cambridge (2004)
  • 127. Touzet, C.F.: Neural reinforcement learning for behaviour synthesis. Robotics and Autonomous Systems 22(3-4), 251-281 (1997)
  • 128. Touzet, C.F.: Robot awareness in cooperative mobile robot learning. Autonomous Robots 8(1), 87-97 (2000)
  • 129. Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-learning. Machine Learning 16(1), 185-202 (1994)
  • 132. Tuyls, K., Nowé, A.: Evolutionary game theory and multi-agent reinforcement learning. The Knowledge Engineering Review 20(1), 63-90 (2005)
  • 133. Uther, W.T., Veloso, M.: Adversarial reinforcement learning. Tech. rep., School of Computer Science, Carnegie Mellon University, Pittsburgh, US (1997), http://www.cs.cmu.edu/afs/cs/user/will/www/papers/Uther97a.ps
  • 134. Vidal, J.M.: Learning in multiagent systems: An introduction from a game-theoretic perspective. In: Alonso, E., Kudenko, D., Kazakov, D. (eds.) AAMAS 2000 and AAMAS 2002. LNCS (LNAI), vol. 2636, pp. 202-215. Springer, Heidelberg (2003)
  • 135. Vlassis, N.: A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Synthesis Lectures in Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers (2007)
  • 136. Wang, X., Sandholm, T.: Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 1571-1578. MIT Press, Cambridge (2003)
  • 142. Wiering, M., Salustowicz, R., Schmidhuber, J.: Reinforcement learning soccer teams with incomplete world models. Autonomous Robots 7(1), 77-88 (1999)
  • 143. Zapechelnyuk, A.: Limit behavior of no-regret dynamics. Discussion Papers 21, Kyiv School of Economics, Kyiv, Ukraine (2009)


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.