SCOPUS Information Retrieval Platform

Journal of Machine Learning Research

Volume 7, 2006, Pages 1789-1828

Collaborative multiagent reinforcement learning by payoff propagation

(2) Kok, Jelle R.ᵃ Vlassis, Nikosᵃ

ᵃ University of Amsterdam (Netherlands)

Author keywords

Collaborative multiagent system; Coordination graph; Q-learning; Belief propagation; Reinforcement learning

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; DECISION MAKING; GRAPH THEORY; MULTI AGENT SYSTEMS;

COLLABORATIVE MULTIAGENT SYSTEM; COORDINATION GRAPH; Q-LEARNING; BELIEF PROPAGATION; REINFORCEMENT LEARNING;

LEARNING SYSTEMS;

EID: 33748543203 PISSN: 15337928 EISSN: 15337928 Source Type: Journal
DOI: None Document Type: Article

Times cited : (316)

References (57)

1
- 33644800166
- Preprocessing techniques for accelerating the DCOP algorithm ADOPT
- Utrecht, The Netherlands
- S. Muhammad Ali, S. Koenig, and M. Tambe. Preprocessing techniques for accelerating the DCOP algorithm ADOPT. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1041-1048, Utrecht, The Netherlands, 2005.
- (2005) Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) , pp. 1041-1048
- Ali, S.M.¹ Koenig, S.² Tambe, M.³

2
- 0036817725
- Editorial: Advances in multi-robot systems
- T. Arai, E. Pagello, and L. E. Parker. Editorial: Advances in multi-robot systems. IEEE Transactions on Robotics and Automation, 18(5):655-661, 2002.
- (2002) IEEE Transactions on Robotics and Automation , vol.18 , Issue.5 , pp. 655-661
- Arai, T.¹ Pagello, E.² Parker, L.E.³

3
- 1142293055
- Transition-independent decentralized Markov decision processes
- Melbourne, Australia
- R. Becker, S. Zilberstein, V. Lesser, and C. V. Goldman. Transition-independent decentralized Markov decision processes. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), Melbourne, Australia, 2003.
- (2003) Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS)
- Becker, R.¹ Zilberstein, S.² Lesser, V.³ Goldman, C.V.⁴

4
- 0141965747
- The complexity of decentralized control of Markov decision processes
- Stanford, CA
- D. S. Bernstein, S. Zilberstein, and N. Immerman. The complexity of decentralized control of Markov decision processes. In Proceedings of Uncertainty in Artificial Intelligence (UAI), Stanford, CA, 2000.
- (2000) Proceedings of Uncertainty in Artificial Intelligence (UAI)
- Bernstein, D.S.¹ Zilberstein, S.² Immerman, N.³

5
- 0004089406
- Academic Press
- U. Bertelé and F. Brioschi. Nonserial dynamic programming. Academic Press, 1972.
- (1972) Nonserial Dynamic Programming
- Bertelé, U.¹ Brioschi, F.²

6
- 0003487482
- Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-dynamic programming. Athena Scientific, 1996.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

7
- 0002500351
- Planning, learning and coordination in multiagent decision processes
- C. Boutilier. Planning, learning and coordination in multiagent decision processes. In Proceedings of the Conference on Theoretical Aspects of Rationality and Knowledge, 1996.
- (1996) Proceedings of the Conference on Theoretical Aspects of Rationality and Knowledge
- Boutilier, C.¹

8
- 0000719863
- Packet routing in dynamically changing networks: A reinforcement learning approach
- Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors. Morgan Kaufmann Publishers, Inc.
- J. A. Boyan and M. L. Littman. Packet routing in dynamically changing networks: A reinforcement learning approach. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems (NIPS) 6, pages 671-678. Morgan Kaufmann Publishers, Inc., 1994.
- (1994) Advances in Neural Information Processing Systems (NIPS) , vol.6 , pp. 671-678
- Boyan, J.A.¹ Littman, M.L.²

9
- 0033692328
- Collaborative multi-robot exploration
- W. Burgard, M. Moors, D. Fox, R. Simmons, and S. Thrun. Collaborative multi-robot exploration. In Proceedings of the IEEE International Conference on Robotics and Automation, 2000.
- (2000) Proceedings of the IEEE International Conference on Robotics and Automation
- Burgard, W.¹ Moors, M.² Fox, D.³ Simmons, R.⁴ Thrun, S.⁵

10
- 1142280924
- Coordination in multiagent reinforcement learning: A Bayesian approach
- Melbourne, Australia. ACM Press
- G. Chalkiadakis and C. Boutilier. Coordination in multiagent reinforcement learning: A Bayesian approach. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 709-716, Melbourne, Australia, 2003. ACM Press.
- (2003) Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) , pp. 709-716
- Chalkiadakis, G.¹ Boutilier, C.²

11
- 0031630561
- The dynamics of reinforcement learning in cooperative multiagent systems
- Madison, WI
- C. Claus and C. Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Madison, WI, 1998.
- (1998) Proceedings of the National Conference on Artificial Intelligence (AAAI)
- Claus, C.¹ Boutilier, C.²

12
- 33846241688
- Loopy belief propagation as a basis for communication in sensor networks
- C. Crick and A. Pfeffer. Loopy belief propagation as a basis for communication in sensor networks. In Proceedings of Uncertainty in Artificial Intelligence (UAI), 2003.
- (2003) Proceedings of Uncertainty in Artificial Intelligence (UAI)
- Crick, C.¹ Pfeffer, A.²

13
- 85156187730
- Improving elevator performance using reinforcement learning
- MIT Press
- R. Crites and A. Barto. Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems (NIPS) 8, pages 1017-1023. MIT Press, 1996.
- (1996) Advances in Neural Information Processing Systems (NIPS) , vol.8 , pp. 1017-1023
- Crites, R.¹ Barto, A.²

14
- 57349118422
- Morgan Kaufmann
- R. Dechter. Constraint Processing. Morgan Kaufmann, 2003.
- (2003) Constraint Processing
- Dechter, R.¹

15
- 0001899541
- A scheme for approximating probabilistic inference
- R. Dechter and I. Rish. A scheme for approximating probabilistic inference. In Proceedings of Uncertainty in Artificial Intelligence (UAI), pages 132-141, 1997.
- (1997) Proceedings of Uncertainty in Artificial Intelligence (UAI) , pp. 132-141
- Dechter, R.¹ Rish, I.²

16
- 0035395660
- Scaling up agent coordination strategies
- July
- E. H. Durfee. Scaling up agent coordination strategies. IEEE Computer, 34(7):39-46, July 2001.
- (2001) IEEE Computer , vol.34 , Issue.7 , pp. 39-46
- Durfee, E.H.¹

17
- 31144432283
- Cooperative information sharing to improve distributed learning in multi-agent systems
- P. S. Dutta, N. R. Jennings, and L. Moreau. Cooperative information sharing to improve distributed learning in multi-agent systems. Journal of Artificial Intelligence Research, 24:407-463, 2005.
- (2005) Journal of Artificial Intelligence Research , vol.24 , pp. 407-463
- Dutta, P.S.¹ Jennings, N.R.² Moreau, L.³

18
- 27344449757
- Decentralized control of cooperative systems: Categorization and complexity analysis
- November
- C. Goldman and S. Zilberstein. Decentralized control of cooperative systems: Categorization and complexity analysis. Journal of Artificial Intelligence Research, 22:143-174, November 2004.
- (2004) Journal of Artificial Intelligence Research , vol.22 , pp. 143-174
- Goldman, C.¹ Zilberstein, S.²

19
- 1142293050
- Optimizing information exchange in cooperative multi-agent systems
- New York, NY, USA. ACM Press
- C. V. Goldman and S. Zilberstein. Optimizing information exchange in cooperative multi-agent systems. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 137-144, New York, NY, USA, 2003. ACM Press.
- (2003) Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) , pp. 137-144
- Goldman, C.V.¹ Zilberstein, S.²

20
- 84899028010
- Multiagent planning with factored MDPs
- The MIT Press
- C. Guestrin, D. Koller, and R. Parr. Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems (NIPS) 14. The MIT Press, 2002a.
- (2002) Advances in Neural Information Processing Systems (NIPS) , vol.14
- Guestrin, C.¹ Koller, D.² Parr, R.³

21
- 14344256227
- PhD thesis, Computer Science Department, Stanford University, August
- C. Guestrin. Planning under uncertainty in complex structured environments. PhD thesis, Computer Science Department, Stanford University, August 2003.
- (2003) Planning under Uncertainty in Complex Structured Environments
- Guestrin, C.¹

22
- 4544236179
- Coordinated reinforcement learning
- Sydney, Australia, July
- C. Guestrin, M. Lagoudakis, and R. Parr. Coordinated reinforcement learning. In International Conference on Machine Learning (ICML), Sydney, Australia, July 2002b.
- (2002) International Conference on Machine Learning (ICML)
- Guestrin, C.¹ Lagoudakis, M.² Parr, R.³

23
- 0036923118
- Context-specific multiagent coordination and planning with factored MDPs
- Edmonton, Canada, July
- C. Guestrin, S. Venkataraman, and D. Koller. Context-specific multiagent coordination and planning with factored MDPs. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Edmonton, Canada, July 2002c.
- (2002) Proceedings of the National Conference on Artificial Intelligence (AAAI)
- Guestrin, C.¹ Venkataraman, S.² Koller, D.³

24
- 9444233318
- Dynamic programming for partially observable stochastic games
- San Jose, CA
- E. A. Hansen, D. S. Bernstein, and S. Zilberstein. Dynamic programming for partially observable stochastic games. In Proceedings of the National Conference on Artificial Intelligence (AAAI), San Jose, CA, 2004.
- (2004) Proceedings of the National Conference on Artificial Intelligence (AAAI)
- Hansen, E.A.¹ Bernstein, D.S.² Zilberstein, S.³

25
- 0004491880
- RoboCup: The robot world cup initiative
- H. Kitano, M. Asada, Y. Kuniyoshi, I. Noda, and E. Osawa. RoboCup: The robot world cup initiative. In Proceedings of the IJCAI-95 Workshop on Entertainment and AI/Alife, 1995.
- (1995) Proceedings of the IJCAI-95 Workshop on Entertainment and AI/Alife
- Kitano, H.¹ Asada, M.² Kuniyoshi, Y.³ Noda, I.⁴ Osawa, E.⁵

26
- 14344250637
- Sparse cooperative Q-learning
- Russ Greiner and Dale Schuurmans, editors, Banff, Canada, July. ACM
- J. R. Kok and N. Vlassis. Sparse cooperative Q-learning. In Russ Greiner and Dale Schuurmans, editors, Proceedings of the International Conference on Machine Learning, pages 481-488, Banff, Canada, July 2004. ACM.
- (2004) Proceedings of the International Conference on Machine Learning , pp. 481-488
- Kok, J.R.¹ Vlassis, N.²

27
- 33748562008
- Using the max-plus algorithm for multiagent decision making in coordination graphs
- Osaka, Japan, July
- J. R. Kok and N. Vlassis. Using the max-plus algorithm for multiagent decision making in coordination graphs. In RoboCup-2005: Robot Soccer World Cup IX, Osaka, Japan, July 2005.
- (2005) RoboCup-2005: Robot Soccer World Cup IX
- Kok, J.R.¹ Vlassis, N.²

28
- 12244304892
- Non-communicative multi-robot coordination in dynamic environments
- February
- J. R. Kok, M. T. J. Spaan, and N. Vlassis. Non-communicative multi-robot coordination in dynamic environments. Robotics and Autonomous Systems, 50(2-3):99-114, February 2005.
- (2005) Robotics and Autonomous Systems , vol.50 , Issue.2-3 , pp. 99-114
- Kok, J.R.¹ Spaan, M.T.J.² Vlassis, N.³

29
- 33748533364
- PhD thesis, Faculty of Science, University of Amsterdam
- J. R. Kok. Coordination and Learning in Cooperative Multiagent Systems. PhD thesis, Faculty of Science, University of Amsterdam, 2006.
- (2006) Coordination and Learning in Cooperative Multiagent Systems
- Kok, J.R.¹

30
- 0035246564
- Factor graphs and the sum-product algorithm
- F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47:498-519, 2001.
- (2001) IEEE Transactions on Information Theory , vol.47 , pp. 498-519
- Kschischang, F.R.¹ Frey, B.J.² Loeliger, H.-A.³

31
- 3042527480
- Kluwer Academic Publishers
- V. Lesser, C. Ortiz, and M. Tambe. Distributed sensor nets: A multiagent perspective. Kluwer Academic Publishers, 2003.
- (2003) Distributed Sensor Nets: A Multiagent Perspective
- Lesser, V.¹ Ortiz, C.² Tambe, M.³

32
- 85032780651
- An introduction to factor graphs
- January
- H.-A. Loeliger. An introduction to factor graphs. IEEE Signal Processing Magazine, pages 28-41, January 2004.
- (2004) IEEE Signal Processing Magazine , pp. 28-41
- Loeliger, H.-A.¹

33
- 33746360402
- Distributed optimization in adaptive networks
- S. Thrun, L. Saul, and B. Schölkopf, editors. MIT Press, Cambridge, MA
- C. C. Moallemi and B. Van Roy. Distributed optimization in adaptive networks. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems (NIPS) 16. MIT Press, Cambridge, MA, 2004.
- (2004) Advances in Neural Information Processing Systems (NIPS) , vol.16
- Moallemi, C.C.¹ Van Roy, B.²

34
- 10044277219
- ADOPT: Asynchronous distributed constraint optimization with quality guarantees
- P. Jay Modi, W-M. Shen, M. Tambe, and M. Yokoo. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1-2):149-180, 2005.
- (2005) Artificial Intelligence , vol.161 , Issue.1-2 , pp. 149-180
- Modi, P.J.¹ Shen, W.-M.² Tambe, M.³ Yokoo, M.⁴

35
- 0002425879
- Loopy belief propagation for approximate inference: An empirical study
- Stockholm, Sweden
- K. Murphy, Y. Weiss, and M. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of Uncertainty in Artificial Intelligence (UAI), Stockholm, Sweden, 1999.
- (1999) Proceedings of Uncertainty in Artificial Intelligence (UAI)
- Murphy, K.¹ Weiss, Y.² Jordan, M.³

36
- 84898980684
- Autonomous helicopter flight via reinforcement learning
- A. Y. Ng, H. Jin Kim, M. Jordan, and S. Sastry. Autonomous helicopter flight via reinforcement learning. In Advances in Neural Information Processing Systems (NIPS) 16, 2004.
- (2004) Advances in Neural Information Processing Systems (NIPS) , vol.16
- Ng, A.Y.¹ Kim, H.J.² Jordan, M.³ Sastry, S.⁴

37
- 0036573011
- Distributed algorithms for multi-robot observation of multiple moving targets
- L. E. Parker. Distributed algorithms for multi-robot observation of multiple moving targets. Autonomous Robots, 12(3):231-255, 2002.
- (2002) Autonomous Robots , vol.12 , Issue.3 , pp. 231-255
- Parker, L.E.¹

38
- 0003391330
- Morgan Kaufmann, San Mateo
- J. Pearl. Probabilistic reasoning in intelligent systems. Morgan Kaufmann, San Mateo, 1988.
- (1988) Probabilistic Reasoning in Intelligent Systems
- Pearl, J.¹

39
- 0012646255
- Learning to cooperate via policy search
- Morgan Kaufmann Publishers
- L. Peshkin, K.-E. Kim, N. Meuleau, and L. P. Kaelbling. Learning to cooperate via policy search. In Proceedings of Uncertainty in Artificial Intelligence (UAI), pages 489-496. Morgan Kaufmann Publishers, 2000.
- (2000) Proceedings of Uncertainty in Artificial Intelligence (UAI) , pp. 489-496
- Peshkin, L.¹ Kim, K.-E.² Meuleau, N.³ Kaelbling, L.P.⁴

40
- 85102627959
- Wiley, New York
- M. L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. Wiley, New York, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

41
- 1142292938
- The communicative multiagent team decision problem: Analyzing teamwork theories and models
- D. V. Pynadath and M. Tambe. The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research, 16:389-423, 2002.
- (2002) Journal of Artificial Intelligence Research , vol.16 , pp. 389-423
- Pynadath, D.V.¹ Tambe, M.²

42
- 0001395498
- Distributed value functions
- Bled, Slovenia
- J. Schneider, W.-K. Wong, A. Moore, and M. Riedmiller. Distributed value functions. In International Conference on Machine Learning (ICML), Bled, Slovenia, 1999.
- (1999) International Conference on Machine Learning (ICML)
- Schneider, J.¹ Wong, W.-K.² Moore, A.³ Riedmiller, M.⁴

43
- 0028555752
- Learning to coordinate without sharing information
- Seattle, WA
- S. Sen, M. Sekaran, and J. Hale. Learning to coordinate without sharing information. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Seattle, WA, 1994.
- (1994) Proceedings of the National Conference on Artificial Intelligence (AAAI)
- Sen, S.¹ Sekaran, M.² Hale, J.³

44
- 0000392613
- Stochastic games
- L. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39:1095-1100, 1953.
- (1953) Proceedings of the National Academy of Sciences , vol.39 , pp. 1095-1100
- Shapley, L.¹

45
- 27544506565
- Reinforcement learning for RoboCup-soccer keepaway
- P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup-soccer keepaway. Adaptive Behavior, 13(3):165-188, 2005.
- (2005) Adaptive Behavior , vol.13 , Issue.3 , pp. 165-188
- Stone, P.¹ Sutton, R.S.² Kuhlmann, G.³

46
- 0004102479
- MIT Press, Cambridge, MA
- R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

47
- 0032096675
- Multiagent systems
- K. Sycara. Multiagent systems. AI Magazine, 19(2):79-92, 1998.
- (1998) AI Magazine , vol.19 , Issue.2 , pp. 79-92
- Sycara, K.¹

48
- 85152198941
- Multi-agent reinforcement learning: Independent vs. cooperative agents
- Amherst, MA
- M. Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In International Conference on Machine Learning (ICML), Amherst, MA, 1993.
- (1993) International Conference on Machine Learning (ICML)
- Tan, M.¹

49
- 0029276036
- Temporal difference learning and TD-Gammon
- March
- G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), March 1995.
- (1995) Communications of the ACM , vol.38 , Issue.3
- Tesauro, G.¹

50
- 12244314056
- Informatics Institute, University of Amsterdam, September
- N. Vlassis. A concise introduction to multiagent systems and distributed AI. Informatics Institute, University of Amsterdam, September 2003.
- (2003) A Concise Introduction to Multiagent Systems and Distributed AI
- Vlassis, N.¹

51
- 15744395091
- Anytime algorithms for multiagent decision making using coordination graphs
- The Hague, The Netherlands, October
- N. Vlassis, R. Elhorst, and J. R. Kok. Anytime algorithms for multiagent decision making using coordination graphs. In Proceedings of the International Conference on Systems, Man, and Cybernetics (SMC), The Hague, The Netherlands, October 2004.
- (2004) Proceedings of the International Conference on Systems, Man, and Cybernetics (SMC)
- Vlassis, N.¹ Elhorst, R.² Kok, J.R.³

52
- 3943084089
- Tree consistency and bounds on the performance of the max-product algorithm and its generalizations
- April
- M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. Tree consistency and bounds on the performance of the max-product algorithm and its generalizations. Statistics and Computing, 14:143-166, April 2004.
- (2004) Statistics and Computing , vol.14 , pp. 143-166
- Wainwright, M.J.¹ Jaakkola, T.S.² Willsky, A.S.³

53
- 34249833101
- Technical note: Q-learning
- C. Watkins and P. Dayan. Technical note: Q-learning. Machine Learning, 8(3-4):279-292, 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 279-292
- Watkins, C.¹ Dayan, P.²

54
- 0003744207
- G. Weiss, editor. MIT Press
- G. Weiss, editor. Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.
- (1999) Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence

55
- 0141695638
- Understanding belief propagation and its generalizations
- chapter 8. Morgan Kaufmann Publishers Inc., January
- J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In Exploring Artificial Intelligence in the New Millennium, chapter 8, pages 239-269. Morgan Kaufmann Publishers Inc., January 2003.
- (2003) Exploring Artificial Intelligence in the New Millennium , pp. 239-269
- Yedidia, J.S.¹ Freeman, W.T.² Weiss, Y.³

56
- 1142305807
- Distributed constraint optimization as a formal model of partially adversarial cooperation
- University of Michigan, Ann Arbor, MI 48109
- M. Yokoo and E. H. Durfee. Distributed constraint optimization as a formal model of partially adversarial cooperation. Technical Report CSE-TR-101-91, University of Michigan, Ann Arbor, MI 48109, 1991.
- (1991) Technical Report , vol.CSE-TR-101-91
- Yokoo, M.¹ Durfee, E.H.²

57
- 0000049635
- Exploiting causal independence in Bayesian network inference
- N. Lianwen Zhang and D. Poole. Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5:301-328, 1996.
- (1996) Journal of Artificial Intelligence Research , vol.5 , pp. 301-328
- Zhang, N.L.¹ Poole, D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.