



Autonomous Agents and Multi-Agent Systems, Volume 15, Issue 2, 2007, Pages 197-220

Shaping multi-agent systems with gradient reinforcement learning

Author keywords

Multi agent systems; Partially observable Markov decision processes; Policy gradient; Reinforcement learning; Shaping

EID: 34548099216     PISSN: 1387-2532     EISSN: 1573-7454     Source Type: Journal
DOI: 10.1007/s10458-006-9010-5     Document Type: Article
Times cited: 36

References (53)
  • 1
    • Asada, M., Noda, S., Tawaratsumida, S., & Hosoda, K. (1996). Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23(2-3), 279-303.
  • 2
    • Bartlett, P., & Baxter, J. (1999). Hebbian synaptic modifications in spiking neurons that learn. Technical report, Australian National University.
  • 9
    • Buffet, O., & Aberdeen, D. (2006). The factored policy gradient planner (IPC-06 version). In A. Gerevini, B. Bonet, & B. Givan (Eds.), Proceedings of the fifth international planning competition (IPC-5) (pp. 69-71). Winner, probabilistic track of the 5th International Planning Competition.
  • 11
    • Buffet, O., Dutech, A., & Charpillet, F. (2005). Développement autonome des comportements de base d'un agent [Autonomous development of an agent's basic behaviors]. Revue d'Intelligence Artificielle, 19(4-5), 603-632.
  • 12
    • Carmel, D., & Markovitch, S. (1996). Opponent modeling in multi-agent systems. In Adaption and learning in multi-agent systems, Lecture Notes in Artificial Intelligence, Vol. 1042 (pp. 40-52). Springer-Verlag.
  • 14
    • Dorigo, M., & Di Caro, G. (1999). Ant colony optimization: A new meta-heuristic. In P. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, & A. Zalzala (Eds.), Proceedings of the congress on evolutionary computation (CEC-99) (pp. 1470-1477).
  • 17
    • Gerkey, B., & Matarić, M. (2004). A formal analysis and taxonomy of task allocation in multi-robot systems. International Journal of Robotics Research, 23(9), 939-954.
  • 22
    • Jaakkola, T., Jordan, M., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1186-1201.
  • 23
    • Jong, E. D. (2000). Attractors in the development of communication. In J.-A. Meyer, A. Berthoz, D. Floreano, H. L. Roitblat, & S. W. Wilson (Eds.), From animals to animats 6: Proceedings of the sixth international conference on simulation of adaptive behavior (SAB-00).
  • 27
    • Matarić, M. (1997). Reinforcement learning in the multi-robot domain. Autonomous Robots, 4(1), 73-83.
  • 31
    • Peters, J., Vijayakumar, S., & Schaal, S. (2005). Natural actor-critic. In J. Gama, R. Camacho, P. Brazdil, A. Jorge, & L. Torgo (Eds.), Proceedings of the sixteenth European conference on machine learning (ECML'05), Lecture Notes in Computer Science, Vol. 3720.
  • 33
    • Pynadath, D., & Tambe, M. (2002). The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research, 16, 389-423.
  • 38
    • Shoham, Y., Powers, R., & Grenager, T. (2003). Multi-agent reinforcement learning: A critical survey. Technical report, Stanford.
  • 39
    • Singh, S., Jaakkola, T., & Jordan, M. (1994). Learning without state estimation in partially observable Markovian decision processes. In W. W. Cohen & H. Hirsh (Eds.), Proceedings of the eleventh international conference on machine learning (ICML'94).
  • 42
    • Stone, P., & Veloso, M. (2000a). Layered learning. In R. L. de Mántaras & E. Plaza (Eds.), Proceedings of the eleventh European conference on machine learning (ECML'00), Lecture Notes in Computer Science, Vol. 1810.
  • 43
    • Stone, P., & Veloso, M. (2000b). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3).
  • 46
    • Sutton, R., McAllester, D., Singh, S., & Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.), Advances in neural information processing systems 12 (NIPS'99) (pp. 1057-1063).
  • 51
    • Wolpert, D., & Tumer, K. (1999). An introduction to collective intelligence. Technical Report NASA-ARC-IC-99-63, NASA Ames Research Center.
  • 53
    • Xuan, P., Lesser, V., & Zilberstein, S. (2000). Communication in multi-agent Markov decision processes. In S. Parsons & P. Gmytrasiewicz (Eds.), Proceedings of ICMAS workshop on game theoretic and decision theoretic agents.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.