Artificial Intelligence Review, Volume 45, Issue 3, 2016, Pages 299–332

Exponential moving average based multiagent reinforcement learning algorithms

Author keywords

Markov decision processes; Multi agent learning systems; Nash equilibrium; Reinforcement learning

Indexed keywords

ALGORITHMS; COMPUTATION THEORY; GAME THEORY; ITERATIVE METHODS; MARKOV PROCESSES; MULTI AGENT SYSTEMS; REINFORCEMENT LEARNING; SOFTWARE AGENTS; STOCHASTIC SYSTEMS; TELECOMMUNICATION NETWORKS;

EID: 84957842278     PISSN: 0269-2821     EISSN: 1573-7462     Source Type: Journal
DOI: 10.1007/s10462-015-9447-5     Document Type: Article
Times cited: 15
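The record carries no abstract, but the title and keywords point to the Q-learning family of update rules. The snippet below is a minimal, purely illustrative sketch (not the algorithm of the indexed article, which is not reproduced in this record): it combines a standard tabular Q-learning update (Watkins and Dayan 1992, reference 43) with an exponential-moving-average style policy update in the spirit of EMA Q-learning (reference 2). The table sizes and the rates alpha, gamma, and eta are assumed placeholder values.

```python
import numpy as np

# Illustrative sketch only: a standard tabular Q-learning step followed by an
# exponential-moving-average (EMA) policy update. This is NOT the algorithm
# from the indexed article; alpha, gamma, and eta are assumed placeholders.

n_states, n_actions = 5, 2
alpha, gamma, eta = 0.1, 0.95, 0.05   # step size, discount factor, EMA rate (assumed)

Q = np.zeros((n_states, n_actions))                     # action-value table
pi = np.full((n_states, n_actions), 1.0 / n_actions)    # stochastic policy

def step(s, a, r, s_next):
    """One Q-learning update followed by an EMA-style policy update."""
    # Standard Q-learning target (Watkins and Dayan 1992).
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

    # EMA policy update: move the policy toward the greedy action indicator.
    greedy = np.zeros(n_actions)
    greedy[Q[s].argmax()] = 1.0
    pi[s] = (1.0 - eta) * pi[s] + eta * greedy   # exponential moving average
    pi[s] /= pi[s].sum()                         # keep it a valid distribution

# Example transition: state 0, action 1, reward 1.0, next state 2.
step(0, 1, 1.0, 2)
```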

References (48)
  • 1
    • Abdallah S, Lesser V (2008) A multiagent reinforcement learning algorithm with non-linear dynamics. J Artif Intell Res 33:521–549
  • 2
    • Awheda MD, Schwartz HM (2013) Exponential moving average Q-learning algorithm. In: Adaptive dynamic programming and reinforcement learning (ADPRL), 2013 IEEE symposium on, IEEE, pp 31–38
  • 3
    • Awheda MD, Schwartz HM (2015) The residual gradient FACL algorithm for differential games. In: Electrical and computer engineering (CCECE), 2015 IEEE 28th Canadian conference on, IEEE, pp 1006–1011
  • 4
    • Banerjee B, Peng J (2007) Generalized multiagent learning with performance bound. Auton Agents Multi-Agent Syst 15(3):281–312
  • 5
    • Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
  • 6
    • Bowling M (2005) Convergence and no-regret in multiagent learning. Adv Neural Inf Process Syst 17:209–216
  • 7
    • Bowling M, Veloso M (2001a) Convergence of gradient dynamics with a variable learning rate. In: ICML, pp 27–34
  • 8
    • Bowling M, Veloso M (2001b) Rational and convergent learning in stochastic games. In: International joint conference on artificial intelligence, vol. 17. Lawrence Erlbaum Associates Ltd, pp 1021–1026
  • 9
    • Bowling M, Veloso M (2002) Multiagent learning using a variable learning rate. Artif Intell 136(2):215–250
  • 10
    • Burkov A, Chaib-draa B (2009) Effective learning in the presence of adaptive counterparts. J Algorithms 64(4):127–138
  • 11
    • Busoniu L, Babuska R, De Schutter B (2006) Multi-agent reinforcement learning: a survey. In: Control, automation, robotics and vision (ICARCV'06), 9th international conference on, IEEE, pp 1–6
  • 13
    • Claus C, Boutilier C (1998) The dynamics of reinforcement learning in cooperative multiagent systems. In: AAAI/IAAI, pp 746–752
  • 14
    • Conitzer V, Sandholm T (2007) AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Mach Learn 67(1–2):23–43
  • 15
    • Dai X, Li C-K, Rad AB (2005) An approach to tune fuzzy controllers based on reinforcement learning for autonomous vehicle control. IEEE Trans Intell Transp Syst 6(3):285–293
  • 18
    • Dixon W (2014) Optimal adaptive control and differential games by reinforcement learning principles. J Guid Control Dyn 37(3):1048–1049
  • 19
    • Fulda N, Ventura D (2007) Predicting and preventing coordination problems in cooperative Q-learning systems. In: IJCAI, pp 780–785
  • 20
    • Gutnisky DA, Zanutto BS (2004) Learning obstacle avoidance with an operant behavior model. Artif Life 10(1):65–81
  • 21
    • Hinojosa W, Nefti S, Kaymak U (2011) Systems control with generalized probabilistic fuzzy-reinforcement learning. IEEE Trans Fuzzy Syst 19(1):51–64
  • 23
    • Hu J, Wellman MP (2003) Nash Q-learning for general-sum stochastic games. J Mach Learn Res 4:1039–1069
  • 24
    • Hu J, Wellman MP (1998) Multiagent reinforcement learning: theoretical framework and an algorithm. In: ICML, vol. 98, pp 242–250
  • 26
    • Kondo T, Ito K (2004) A reinforcement learning with evolutionary state recruitment strategy for autonomous mobile robots control. Robot Auton Syst 46(2):111–124
  • 27
    • Luo B, Wu H-N, Li H-X (2014a) Data-based suboptimal neuro-control design with reinforcement learning for dissipative spatially distributed processes. Ind Eng Chem Res 53(19):8106–8119
  • 28
    • Luo B, Wu H-N, Huang T, Liu D (2014b) Data-based approximate policy iteration for nonlinear continuous-time optimal control design. Automatica 50(12):3281–3290
  • 29
    • Luo B, Wu H-N, Huang T (2015a) Off-policy reinforcement learning for H∞ control design. IEEE Trans Cybern 45(1):65–76
  • 30
    • Luo B, Wu H-N, Li H-X (2015b) Adaptive optimal control of highly dissipative nonlinear spatially distributed processes with neuro-dynamic programming. IEEE Trans Neural Netw Learn Syst 26(4):684–696
  • 31
    • Luo B, Huang T, Wu H-N, Yang X (2015c) Data-driven H∞ control for nonlinear distributed parameter systems. IEEE Trans Neural Netw Learn Syst 26(11):2949–2961
  • 32
    • Luo B, Wu H-N, Huang T, Liu D (2015d) Reinforcement learning solution for HJB equation arising in constrained optimal control problem. Neural Netw 71:150–158
  • 33
    • Modares H, Lewis FL, Naghibi-Sistani M-B (2014) Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1):193–202
  • 36
    • Sen S, Sekaran M, Hale J (1994) Learning to coordinate without sharing information. In: AAAI, pp 426–431
  • 37
    • Singh S, Kearns M, Mansour Y (2000) Nash convergence of gradient dynamics in general-sum games. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., pp 541–548
  • 38
    • Smart WD, Kaelbling LP (2002) Effective reinforcement learning for mobile robots. In: Robotics and automation (ICRA'02), IEEE international conference on, vol. 4, IEEE, pp 3404–3410
  • 40
    • Tan M (1993) Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the tenth international conference on machine learning, pp 330–337
  • 41
    • Tesauro G (2004) Extending Q-learning to general adaptive multi-agent systems. In: Advances in neural information processing systems, vol. 16. MIT Press, pp 871–878
  • 43
    • Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
  • 45
    • Weiss G (1999) Multiagent systems: a modern approach to distributed artificial intelligence. MIT Press
  • 46
    • Wu H-N, Luo B (2012) Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear control. IEEE Trans Neural Netw Learn Syst 23(12):1884–1895
  • 47
    • Ye C, Yung NH, Wang D (2003) A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance. IEEE Trans Syst Man Cybern Part B Cybern 33(1):17–27
  • 48
    • Zhang C, Lesser VR (2010) Multi-agent learning with policy prediction. In: AAAI


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.