Volume , Issue , 2005, Pages 292-299

A survey on multiagent reinforcement learning towards multi-robot systems

Author keywords

[No Author keywords available]

Indexed keywords

MULTI-AGENT REINFORCEMENT LEARNING; MULTI-ROBOT SYSTEMS; SCALING-UP; THEORETICAL RESEARCH;

EID: 65149099581     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 16

References (63)
  • 1. Y. U. Cao, A. S. Fukunaga, and A. B. Kahng, "Cooperative mobile robotics: antecedents and directions," Auton. Robots, vol. 4, pp. 1-23, 1997.
  • 2. M. J. Matarić, "Reinforcement learning in the multi-robot domain," Auton. Robots, vol. 4, no. 1, pp. 73-83, 1997.
  • 3. F. Michaud and M. J. Matarić, "Learning from history for behavior-based mobile robots in non-stationary conditions," Auton. Robots, vol. 5, no. 3-4, pp. 335-354, 1998.
  • 4. T. Balch and R. C. Arkin, "Behavior-based formation control for multirobot teams," IEEE Trans. Robot. Automat., vol. 14, no. 6, pp. 926-939, 1998.
  • 5. M. Asada, E. Uchibe, and K. Hosoda, "Co-operative behaviour acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development," Artif. Intell., vol. 110, pp. 275-292, 1999.
  • 6. M. Wiering, R. Salustowicz, and J. Schmidhuber, "Reinforcement learning soccer teams with incomplete world models," Auton. Robots, vol. 7, pp. 77-88, 1999.
  • 8. C. F. Touzet, "Robot awareness in cooperative mobile robot learning," Auton. Robots, vol. 8, pp. 87-97, 2000.
  • 11. M. J. Matarić, "Learning in behavior-based multi-robot systems: policies, models, and other agents," Cogn. Syst. Res., vol. 2, no. 1, pp. 81-93, 2001.
  • 13. I. H. Elhajj, A. Goradia, N. Xi, et al., "Design and analysis of internet-based tele-coordinated multi-robot systems," Auton. Robots, vol. 15, pp. 237-254, 2003.
  • 14. L. Iocchi, D. Nardi, M. Piaggio, et al., "Distributed coordination in heterogeneous multi-robot systems," Auton. Robots, vol. 15, pp. 155-168, 2003.
  • 15. M. J. Matarić, G. S. Sukhatme, and E. H. Østergaard, "Multi-robot task allocation in uncertain environments," Auton. Robots, vol. 14, pp. 255-263, 2003.
  • 16. C. F. Touzet, "Distributed lazy Q-learning for cooperative mobile robots," Int. J. Adv. Robot. Syst., vol. 1, no. 1, pp. 5-13, 2004.
  • 17. M. L. Littman, "Markov games as a framework for multi-agent learning," in Proc. 11th Int. Conf. Machine Learning, San Francisco, 1994, pp. 157-163.
  • 18. M. L. Littman and C. Szepesvári, "A generalized reinforcement-learning model: convergence and applications," in Proc. 13th Int. Conf. Machine Learning, Bari, Italy, July 3-6, 1996, pp. 310-318.
  • 19. C. Szepesvári and M. L. Littman, "Generalized Markov decision processes: dynamic-programming and reinforcement-learning algorithms," Department of Computer Science, Brown University, Technical Report CS-96-11, 1996.
  • 20. C. Claus and C. Boutilier, "The dynamics of reinforcement learning in cooperative multiagent systems," in Proc. 15th National Conf. Artificial Intelligence, Madison, WI, 1998, pp. 746-752.
  • 22. R. Sun and D. Qi, "Rationality assumptions and optimality of co-learning," Lecture Notes in Computer Science, vol. 1881. Springer, 2000, pp. 61-75.
  • 23. C. Szepesvári and M. L. Littman, "A unified analysis of value-function-based reinforcement learning algorithms," Neural Comput., vol. 11, no. 8, pp. 2017-2059, 1999.
  • 24. P. Stone and M. Veloso, "Multiagent systems: a survey from a machine learning perspective," Auton. Robots, vol. 8, pp. 345-383, 2000.
  • 25. J. Hu and M. P. Wellman, "Learning about other agents in a dynamic multiagent system," Cogn. Syst. Res., vol. 2, no. 1, pp. 67-79, 2001.
  • 26. M. L. Littman, "Value-function reinforcement learning in Markov games," Cogn. Syst. Res., vol. 2, no. 1, pp. 55-66, 2001.
  • 28. M. L. Littman, "Friend-or-foe Q-learning in general-sum games," in Proc. 18th Int. Conf. Machine Learning, Morgan Kaufmann, 2001, pp. 322-328.
  • 29. F. A. Dahl, "The lagging anchor algorithm: reinforcement learning in two-player zero-sum games with imperfect information," Mach. Learn., vol. 49, no. 1, pp. 5-37, 2002.
  • 31. G. Chalkiadakis, "Multiagent reinforcement learning: stochastic games with multiple learning players," Department of Computer Science, University of Toronto, Technical Report, 2003.
  • 32. N. Suematsu and A. Hayashi, "A multiagent reinforcement learning algorithm using extended optimal response," in Proc. 1st Int. Joint Conf. Autonomous Agents & Multiagent Systems, Bologna, Italy, July 15-19, 2002, pp. 370-377.
  • 33. G. Chalkiadakis and C. Boutilier, "Multiagent reinforcement learning: theoretical framework and an algorithm," in Proc. 2nd Int. Joint Conf. Autonomous Agents & Multiagent Systems, Melbourne, Australia, July 14-18, 2003, pp. 709-716.
  • 34. J. Hu and M. P. Wellman, "Nash Q-learning for general-sum stochastic games," J. Mach. Learn. Res., vol. 4, pp. 1039-1069, 2003.
  • 43. J. Hu and M. P. Wellman, "Multiagent reinforcement learning: theoretical framework and an algorithm," in Proc. 15th Int. Conf. Machine Learning, San Francisco, CA, 1998, pp. 242-250.
  • 45. M. H. Bowling and M. M. Veloso, "Multiagent learning using a variable learning rate," Artif. Intell., vol. 136, no. 2, pp. 215-250, 2002.
  • 46. B. Banerjee and J. Peng, "Convergent gradient ascent in general-sum games," in Proc. 13th European Conf. Machine Learning, August 13-19, 2002, pp. 686-692.
  • 48. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Advances in Neural Information Processing Systems 12. MIT Press, 2000, pp. 1057-1063.
  • 49. E. F. Morales, "Scaling up reinforcement learning with a relational representation," in Workshop on Adaptability in Multi-Agent Systems, Sydney, 2003.
  • 53. K. Kostiadis and H. Hu, "KaBaGe-RL: Kanerva-based generalisation and reinforcement learning for possession football," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Hawaii, 2001.
  • 54. K.-H. Park, Y.-J. Kim, and J.-H. Kim, "Modular Q-learning based multi-agent cooperation for robot soccer," Robot. Auton. Syst., vol. 35, no. 2, pp. 109-122, 2001.
  • 55. H. R. Berenji and D. A. Vengerov, "Cooperation and coordination between fuzzy reinforcement learning agents in continuous-state partially observable Markov decision processes," in Proc. 8th IEEE Int. Conf. Fuzzy Systems, 2000.
  • 56. H. R. Berenji and D. A. Vengerov, "Advantages of cooperation between reinforcement learning agents in difficult stochastic problems," in Proc. 9th IEEE Int. Conf. Fuzzy Systems, 2000.
  • 58. I. Gültekin and A. Arslan, "Modular-fuzzy cooperative algorithm for multiagent systems," Lecture Notes in Computer Science, vol. 2457. Springer, 2002, pp. 255-263.
  • 59. A. Kilic and A. Arslan, "Minimax fuzzy Q-learning in cooperative multi-agent systems," Lecture Notes in Computer Science, vol. 2457. Springer, 2002, pp. 264-272.
  • 60. H. R. Berenji and D. Vengerov, "A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters," IEEE Trans. Fuzzy Syst., vol. 11, no. 4, pp. 478-485, August 2003.
  • 61. R. Munos, "A study of reinforcement learning in the continuous case by the means of viscosity solutions," Mach. Learn., vol. 40, pp. 265-299, 2000.
  • 63. V. Tadić, "On the convergence of temporal-difference learning with linear function approximation," Mach. Learn., vol. 42, no. 3, pp. 241-267, 2001.


* This record was analyzed and extracted by KISTI from Elsevier's SCOPUS database.