



Autonomous Agents and Multi-Agent Systems, Volume 15, Issue 2, 2007, Pages 197-220

Shaping multi-agent systems with gradient reinforcement learning

Author keywords

Multi agent systems; Partially observable Markov decision processes; Policy gradient; Reinforcement learning; Shaping

EID: 34548099216     PISSN: 1387-2532     EISSN: 1573-7454     Source Type: Journal
DOI: 10.1007/s10458-006-9010-5     Document Type: Article
Times cited: 36

References (53)
  • 1
    • Asada, M., Noda, S., Tawaratsumida, S., & Hosoda, K. (1996). Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23(2-3), 279-303.
  • 2
    • Bartlett, P., & Baxter, J. (1999). Hebbian synaptic modifications in spiking neurons that learn. Technical report, Australian National University.
  • 9
    • Buffet, O., & Aberdeen, D. (2006). The factored policy gradient planner (IPC-06 version). In A. Gerevini, B. Bonet, & B. Givan (Eds.), Proceedings of the fifth international planning competition (IPC-5) (pp. 69-71). Winner, probabilistic track of the 5th International Planning Competition.
  • 11
    • Buffet, O., Dutech, A., & Charpillet, F. (2005). Développement autonome des comportements de base d'un agent [Autonomous development of an agent's basic behaviors]. Revue d'Intelligence Artificielle, 19(4-5), 603-632.
  • 12
    • Carmel, D., & Markovitch, S. (1996). Opponent modeling in multi-agent systems. In Adaption and learning in multi-agent systems, Lecture Notes in Artificial Intelligence, Vol. 1042 (pp. 40-52). Springer-Verlag.
  • 14
    • Dorigo, M., & Di Caro, G. (1999). Ant colony optimization: A new meta-heuristic. In P. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, & A. Zalzala (Eds.), Proceedings of the congress on evolutionary computation (CEC-99) (pp. 1470-1477).
  • 17
    • Gerkey, B., & Matarić, M. (2004). A formal analysis and taxonomy of task allocation in multi-robot systems. International Journal of Robotics Research, 23(9), 939-954.
  • 22
    • Jaakkola, T., Jordan, M., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1186-1201.
  • 23
    • Jong, E. D. (2000). Attractors in the development of communication. In J.-A. Meyer, A. Berthoz, D. Floreano, H. L. Roitblat, & S. W. Wilson (Eds.), From animals to animats 6: Proceedings of the sixth international conference on simulation of adaptive behavior (SAB-00).
  • 27
    • Matarić, M. (1997). Reinforcement learning in the multi-robot domain. Autonomous Robots, 4(1), 73-83.
  • 31
    • Peters, J., Vijayakumar, S., & Schaal, S. (2005). Natural actor-critic. In J. Gama, R. Camacho, P. Brazdil, A. Jorge, & L. Torgo (Eds.), Proceedings of the sixteenth European conference on machine learning (ECML'05), Lecture Notes in Computer Science, Vol. 3720.
  • 33
    • Pynadath, D., & Tambe, M. (2002). The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research, 16, 389-423.
  • 38
    • Shoham, Y., Powers, R., & Grenager, T. (2003). Multi-agent reinforcement learning: A critical survey. Technical report, Stanford.
  • 39
    • Singh, S., Jaakkola, T., & Jordan, M. (1994). Learning without state estimation in partially observable Markovian decision processes. In W. W. Cohen & H. Hirsh (Eds.), Proceedings of the eleventh international conference on machine learning (ICML'94).
  • 42
    • Stone, P., & Veloso, M. (2000a). Layered learning. In R. L. de Mántaras & E. Plaza (Eds.), Proceedings of the eleventh European conference on machine learning (ECML'00), Lecture Notes in Computer Science, Vol. 1810.
  • 43
    • Stone, P., & Veloso, M. (2000b). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3).
  • 46
    • Sutton, R., McAllester, D., Singh, S., & Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.), Advances in neural information processing systems 12 (NIPS'99) (pp. 1057-1063).
  • 51
    • Wolpert, D., & Tumer, K. (1999). An introduction to collective intelligence. Technical Report NASA-ARC-IC-99-63, NASA Ames Research Center.
  • 53
    • Xuan, P., Lesser, V., & Zilberstein, S. (2000). Communication in multi-agent Markov decision processes. In S. Parsons & P. Gmytrasiewicz (Eds.), Proceedings of ICMAS workshop on game theoretic and decision theoretic agents.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.