SCOPUS Information Search Platform

Neurocomputing

Volume 190, 2016, Pages 82-94

Multi-agent reinforcement learning as a rehearsal for decentralized planning

(2) Kraemer, Landon (a); Banerjee, Bikramjit (a)

(a) University of Southern Mississippi (United States)

Author keywords

Decentralized planning; Multi agent reinforcement learning

Indexed keywords

BEHAVIORAL RESEARCH; BENCHMARKING; DECISION MAKING; FERTILIZERS; MARKOV PROCESSES; MULTI AGENT SYSTEMS; UNCERTAINTY ANALYSIS;

CENTRALIZED COMPUTATION; DECENTRALIZED PLANNING; DECISION MAKING UNDER UNCERTAINTY; DISTRIBUTED SOLUTIONS; MULTI-AGENT PLANNING; MULTI-AGENT REINFORCEMENT LEARNING; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; SOLUTION TECHNIQUES;

REINFORCEMENT LEARNING;

ALGORITHM; ARTICLE; CONCEPTUAL FRAMEWORK; DECENTRALIZED PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; MACHINE LEARNING; MATHEMATICAL COMPUTING; MATHEMATICAL MODEL; MATHEMATICAL PARAMETERS; MULTI AGENT REINFORCEMENT LEARNING; PRIORITY JOURNAL; PROBABILITY;

EID: 84962082047 PISSN: 09252312 EISSN: 18728286 Source Type: Journal
DOI: 10.1016/j.neucom.2016.01.031 Document Type: Article

Times cited : (404)

References (30)

1
- 77958561050
- Incremental policy generation for finite-horizon Dec-POMDPs
- Thessaloniki, Greece, September 19-23, AAAI
- Christopher Amato, Jilles Steeve Dibangoye, Shlomo Zilberstein, Incremental policy generation for finite-horizon Dec-POMDPs, in: Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, September 19-23, AAAI, 2009.
- (2009) Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS)
- Amato, C.¹ Dibangoye, J.S.² Zilberstein, S.³

2
- 77952736651
- An investigation into mathematical programming for finite horizon decentralized POMDPs
- Aras Raghav, Dutech Alain An investigation into mathematical programming for finite horizon decentralized POMDPs. J. Artif. Intell. Res. 2010, 37:329-396.
- (2010) J. Artif. Intell. Res. , vol.37 , pp. 329-396
- Aras, R.¹ Dutech, A.²

3
- 0041966002
- Using confidence bounds for exploitation-exploration trade-offs
- Auer Peter Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 2002, 3:397-422.
- (2002) J. Mach. Learn. Res. , vol.3 , pp. 397-422
- Auer, P.¹

4
- 84868275593
- Sample bounded distributed reinforcement learning for decentralized POMDPs
- Toronto, Canada, July
- Bikramjit Banerjee, Jeremy Lyle, Landon Kraemer, Rajesh Yellamraju. Sample bounded distributed reinforcement learning for decentralized POMDPs, in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12), Toronto, Canada, July 2012, pp. 1256-1262.
- (2012) Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12) , pp. 1256-1262
- Banerjee, B.¹ Lyle, J.² Kraemer, L.³ Yellamraju, R.⁴

5
- 84880904080
- General game learning using knowledge transfer
- Hyderabad, India
- Bikramjit Banerjee, Peter Stone, General game learning using knowledge transfer, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), Hyderabad, India, 2007, pp. 672-677.
- (2007) Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07) , pp. 672-677
- Banerjee, B.¹ Stone, P.²

6
- 0036874366
- The complexity of decentralized control of Markov decision processes
- Bernstein Daniel S., Givan Robert, Immerman Neil, Zilberstein Shlomo The complexity of decentralized control of Markov decision processes. Math. Oper. Res. 2002, 27:819-840.
- (2002) Math. Oper. Res. , vol.27 , pp. 819-840
- Bernstein, D.S.¹ Givan, R.² Immerman, N.³ Zilberstein, S.⁴

7
- 0002500351
- Planning, learning and coordination in multiagent decision processes
- Craig Boutilier, Planning, learning and coordination in multiagent decision processes, in: Proceedings of 6th Conference on Theoretical Aspects of Rationality and Knowledge, 1996, pp. 195-210.
- (1996) Proceedings of 6th Conference on Theoretical Aspects of Rationality and Knowledge , pp. 195-210
- Boutilier, C.¹

8
- 84962119252
- The MARL Toolbox version 1.3
- Lucian Busoniu, The MARL Toolbox version 1.3, 2010. http://busoniu.net/repository.php.
- (2010)
- Busoniu, L.¹

9
- 40949147745
- A comprehensive survey of multiagent reinforcement learning
- Busoniu Lucian, Babuska Robert, De Schutter Bart A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Syst. Man Cybern. Part C 2008, 38(2):156-172.
- (2008) IEEE Trans. Syst. Man Cybern. Part C , vol.38 , Issue.2 , pp. 156-172
- Busoniu, L.¹ Babuska, R.² De Schutter, B.³

10
- 84899853392
- Point-based incremental pruning heuristic for solving finite-horizon Dec-POMDPs
- Budapest, Hungary
- Jilles S. Dibangoye, Abdel-Illah Mouaddib, Brahim Chaib-draa, Point-based incremental pruning heuristic for solving finite-horizon Dec-POMDPs, in: Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-09), Budapest, Hungary, 2009, pp. 569-576.
- (2009) Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-09) , pp. 569-576
- Dibangoye, J.S.¹ Mouaddib, A.-I.² Chaib-draa, B.³

11
- 85162479771
- Action-gap phenomenon in reinforcement learning
- Farahmand Amir Massoud Action-gap phenomenon in reinforcement learning. Advances in Neural Information Processing Systems 2011, vol. 24:172-180.
- (2011) Advances in Neural Information Processing Systems , vol.24 , pp. 172-180
- Farahmand, A.M.¹

12
- 84962119248
- Mobile robotics: Kilobots
- K-Team, Mobile robotics: Kilobots. http://www.k-team.com/mobile-robotics-products/kilobot.

13
- 84884357468
- Combining manual feedback with subsequent MDP reward signals for reinforcement learning
- May
- W. Bradley Knox, Peter Stone, Combining manual feedback with subsequent MDP reward signals for reinforcement learning, in: Proceedings of 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), May 2010.
- (2010) Proceedings of 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010)
- Knox, W.B.¹ Stone, P.²

14
- 84876899100
- Informed initial policies for learning in Dec-POMDPs
- Valencia, Spain, June
- Landon Kraemer, Bikramjit Banerjee, Informed initial policies for learning in Dec-POMDPs, in: Proceedings of the AAMAS-12 Workshop on Adaptive Learning Agents (ALA-12), Valencia, Spain, June 2012, pp. 135-143.
- (2012) Proceedings of the AAMAS-12 Workshop on Adaptive Learning Agents (ALA-12) , pp. 135-143
- Kraemer, L.¹ Banerjee, B.²

15
- 84899441707
- Concurrent reinforcement learning as a rehearsal for decentralized planning under uncertainty (extended abstract)
- St. Paul, MN, May
- Landon Kraemer, Bikramjit Banerjee, Concurrent reinforcement learning as a rehearsal for decentralized planning under uncertainty (extended abstract), in: Proceedings of the 12th International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-13), St. Paul, MN, May 2013, pp. 1291-1292.
- (2013) Proceedings of the 12th International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-13) , pp. 1291-1292
- Kraemer, L.¹ Banerjee, B.²

16
- 33750742257
- Value-function-based transfer for reinforcement learning using structure mapping
- July
- Yaxin Liu, Peter Stone, Value-function-based transfer for reinforcement learning using structure mapping, in: Proceedings of the Twenty-First National Conference on Artificial Intelligence, July 2006, pp. 415-420.
- (2006) Proceedings of the Twenty-First National Conference on Artificial Intelligence , pp. 415-420
- Liu, Y.¹ Stone, P.²

17
- 0030647149
- Reinforcement learning in the multi-robot domain
- Mataric Maja J. Reinforcement learning in the multi-robot domain. Auton. Robots 1997, 4:73-83.
- (1997) Auton. Robots , vol.4 , pp. 73-83
- Mataric, M.J.¹

18
- 0141596576
- Policy invariance under reward transformations: theory and application to reward shaping
- Morgan Kaufmann, Bled, Slovenia
- Andrew Y. Ng, Daishi Harada, Stuart Russell, Policy invariance under reward transformations: theory and application to reward shaping, in: Proceedings of 16th International Conference on Machine Learning, Morgan Kaufmann, Bled, Slovenia, 1999, pp. 278-287.
- (1999) Proceedings of 16th International Conference on Machine Learning , pp. 278-287
- Ng, A.Y.¹ Harada, D.² Russell, S.³

19
- 84868289680
- Heuristic search for identical payoff Bayesian games
- Toronto, Canada
- Frans A. Oliehoek, Matthijs T.J. Spaan, Jilles S. Dibangoye, Christopher Amato, Heuristic search for identical payoff Bayesian games, in: Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-10), Toronto, Canada, 2010, pp. 1115-1122.
- (2010) Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-10) , pp. 1115-1122
- Oliehoek, F.A.¹ Spaan, M.T.J.² Dibangoye, J.S.³ Amato, C.⁴

20
- 52249098423
- Optimal and approximate Q-value functions for decentralized POMDPs
- Oliehoek Frans A., Spaan Matthijs T.J., Vlassis Nikos Optimal and approximate Q-value functions for decentralized POMDPs. J. Artif. Intell. Res. 2008, 32:289-353.
- (2008) J. Artif. Intell. Res. , vol.32 , pp. 289-353
- Oliehoek, F.A.¹ Spaan, M.T.J.² Vlassis, N.³

21
- 37348998542
- Q-value heuristics for approximate solutions of Dec-POMDPs
- March
- Frans A. Oliehoek, Nikos Vlassis, Q-value heuristics for approximate solutions of Dec-POMDPs, in: Proceedings of the AAAI Spring Symposium on Game Theoretic and Decision Theoretic Agents, March 2007, pp. 31-37.
- (2007) Proceedings of the AAAI Spring Symposium on Game Theoretic and Decision Theoretic Agents , pp. 31-37
- Oliehoek, F.A.¹ Vlassis, N.²

22
- 84899811776
- Lossless clustering of histories in decentralized POMDPs
- Budapest, Hungary
- Frans A. Oliehoek, Shimon Whiteson, Matthijs T.J. Spaan, Lossless clustering of histories in decentralized POMDPs, in: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-09), Budapest, Hungary, 2009, pp. 577-584.
- (2009) Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-09) , pp. 577-584
- Oliehoek, F.A.¹ Whiteson, S.² Spaan, M.T.J.³

23
- 27344432348
- Accelerating reinforcement learning through implicit imitation
- Price Bob, Boutilier Craig Accelerating reinforcement learning through implicit imitation. J. Artif. Intell. Res. 2003, 19:569-629.
- (2003) J. Artif. Intell. Res. , vol.19 , pp. 569-629
- Price, B.¹ Boutilier, C.²

24
- 84880856384
- Memory-bounded dynamic programming for Dec-POMDPs
- Hyderabad, India
- Sven Seuken, Shlomo Zilberstein, Memory-bounded dynamic programming for Dec-POMDPs, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), Hyderabad, India, 2007, pp. 2009-2015.
- (2007) Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07) , pp. 2009-2015
- Seuken, S.¹ Zilberstein, S.²

25
- 84962095723
- Dec-POMDP problem domains and format
- Matthijs Spaan, Dec-POMDP problem domains and format. http://masplan.org/.
- Spaan, M.¹

26
- 84868299292
- Scaling up optimal heuristic search in Dec-POMDPs via incremental expansion
- Barcelona, Spain
- Matthijs T.J. Spaan, Frans A. Oliehoek, Christopher Amato, Scaling up optimal heuristic search in Dec-POMDPs via incremental expansion, in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI-11), Barcelona, Spain, 2011, pp. 2027-2032.
- (2011) Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI-11) , pp. 2027-2032
- Spaan, M.T.J.¹ Oliehoek, F.A.² Amato, C.³

27
- 0004102479
- Reinforcement Learning: An Introduction
- MIT Press
- Sutton Richard S., Barto Andrew G. Reinforcement Learning: An Introduction 1998, MIT Press.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

28
- 33750691009
- Point-based dynamic programming for Dec-POMDPs
- Boston, MA
- Daniel Szer, François Charpillet, Point-based dynamic programming for Dec-POMDPs, in: Proceedings of the 21st National Conference on Artificial Intelligence, Boston, MA, 2006, pp. 1233-1238.
- (2006) Proceedings of the 21st National Conference on Artificial Intelligence , pp. 1233-1238
- Szer, D.¹ Charpillet, F.²

29
- 80053153738
- Rollout sampling policy iteration for decentralized POMDPs
- Feng Wu, Shlomo Zilberstein, Xiaoping Chen, Rollout sampling policy iteration for decentralized POMDPs, in: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI-10), 2010, pp. 666-673.
- (2010) Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI-10) , pp. 666-673
- Wu, F.¹ Zilberstein, S.² Chen, X.³

30
- 80055062322
- Coordinated multi-agent reinforcement learning in networked distributed POMDPs
- San Francisco, CA
- Chongjie Zhang, Victor Lesser, Coordinated multi-agent reinforcement learning in networked distributed POMDPs, in: Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-11), San Francisco, CA, 2011.
- (2011) Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-11)
- Zhang, C.¹ Lesser, V.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.