SCOPUS Information Retrieval Platform

BioSystems, Volume 37, Issues 1-2, 1996, Pages 147-166

Multiagent reinforcement learning in the Iterated Prisoner's Dilemma

Sandholm, Tuomas W.ᵃ; Crites, Robert H.ᵃ

ᵃ The Manning College of Information and Computer Sciences (United States)

Author keywords

Exploration; Machine learning; Multiagent learning; Prisoner's Dilemma; Recurrent neural network; Reinforcement learning

Indexed keywords

ALGORITHM; ARTICLE; COMPUTER PROGRAM; GAME; HUMAN; PRISON; PRISONER; REINFORCEMENT

EID: 0030050933 PISSN: 03032647 EISSN: None Source Type: Journal
DOI: 10.1016/0303-2647(95)01551-5 Document Type: Article

Times cited: 274

References (40)

1. Ashlock, D., Smucker, M.D., Stanley, E.A. and Tesfatsion, L., 1995, Preferential partner selection in an evolutionary study of prisoner's dilemma. This Issue.

2. Axelrod, R., 1984, The Evolution of Cooperation (Basic Books, New York).

3. Barto, A.G., Sutton, R. and Anderson, C.W., 1983, Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Trans. Sys., Man Cybern. 13, 834-846.

4. Barto, A., 1989, From chemotaxis to cooperativity: Abstracted exercises in neuronal learning strategies, in: The Computing Neuron, R. Durbin, C. Miall and G. Mitchison (eds.) (Addison-Wesley) pp. 73-98.

5. Barto, A.G., Bradtke, S.J. and Singh, S.P., 1995, Learning to act using real-time dynamic programming. Artif. Intell. 72, 81-138.

6. Barto, A.G. and Anandan, P., 1985, Pattern recognizing stochastic learning automata. IEEE Trans. Sys., Man Cybern. 15, 360-375.

7. Barto, A.G., 1985, Learning by statistical cooperation of self-interested neuron-like adaptive elements. Hum. Neurobiol. 4, 229-256.

8. Bertsekas, D.P. and Tsitsiklis, J.N., 1989, Parallel and Distributed Computation: Numerical Methods (Prentice-Hall, Englewood Cliffs, NJ).

9. Bradtke, S.J., 1993, Distributed Adaptive Optimal Control of Flexible Structures. Computer Science Department, University of Massachusetts at Amherst. Unpublished draft.

10. Crites, R., 1994, Multi-Agent Reinforcement Learning. PhD dissertation proposal. Computer Science Department, University of Massachusetts at Amherst.

11. Elman, J., 1990, Finding structure in time. Cognit. Sci. 14, 179-211.

12. Fudenberg, D. and Tirole, J., 1991, Game Theory (MIT Press, Cambridge, MA).

13. Haykin, S., 1994, Neural Networks: A Comprehensive Foundation (Macmillan, New York).

14. Hecht-Nielsen, R., 1991, Neurocomputing (Addison-Wesley, Reading, MA).

15. Kinney, M. and Tsatsoulis, C., 1993, Learning Communication Strategies in Distributed Agent Environments. Working paper WP-93-4. Intelligent Design Laboratory, University of Kansas.

16. Kreps, D., 1990, A Course in Microeconomic Theory (Princeton University Press, Princeton, NJ).

17. Lin, L.-J., 1993, Reinforcement Learning for Robots Using Neural Networks. Ph.D. dissertation, School of Computer Science, Carnegie Mellon University.

18. Littman, M., 1993, Markov games as a framework for multi-agent reinforcement learning, in: Machine Learning, Proceedings of the Eleventh International Conference (Rutgers University, NJ) pp. 157-163.

19. Littman, M. and Boyan, J., 1993, A Distributed Reinforcement Learning Scheme for Network Routing. Technical Report CMU-CS-93-165, Carnegie Mellon University.

20. Luce, D. and Raiffa, H., 1957, Games and Decisions. Reprint: 1989 (Dover Publications, New York).

21. Markey, K.L., 1993, Efficient learning of multiple degree-of-freedom control problems with quasi-independent Q-agents, in: Proceedings of the 1993 Connectionist Models Summer School (Erlbaum Associates, Hillsdale, NJ).

22. Narendra, K.S. and Thathachar, M.A.L., 1989, Learning Automata: An Introduction (Prentice-Hall, Englewood Cliffs, NJ).

23. Nowak, M. and Sigmund, K., 1993, A strategy of win-stay, lose-shift that outperforms Tit-For-Tat in the Prisoner's Dilemma game. Nature 364, 56-58.

24. Rumelhart, D.E., Hinton, G.E. and Williams, R.J., 1986, Learning internal representations by error propagation, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition 1, D.E. Rumelhart and J.L. McClelland (eds.) (MIT Press, Cambridge, MA) pp. 318-362.

25. Samuel, A.L., 1959, Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, pp. 210-229. Reprinted: 1963, in: Computers and Thought, E.A. Feigenbaum and J. Feldman (eds.) (McGraw-Hill, New York).

26. Sandholm, T., 1993, An implementation of the contract net protocol based on marginal cost calculations, in: Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI-93) (Washington DC) pp. 256-262.

27. Sandholm, T. and Lesser, V., 1994, Utility-based termination of anytime algorithms, in: Proceedings of the European Conference on Artificial Intelligence (ECAI-94) Workshop on Decision Theory for Distributed Artificial Intelligence Applications, pp. 88-99, Amsterdam, The Netherlands. Extended version: University of Massachusetts at Amherst, Computer Science Technical Report 94-54.

28. Sandholm, T. and Lesser, V., 1995a, Coalition formation among bounded rational agents, in: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95) (Montreal, Canada).

29. Sandholm, T. and Lesser, V., 1995b, Issues in automated negotiation and electronic commerce: extending the contract net framework, in: Proceedings of the First International Conference on Multiagent Systems (ICMAS-95) (San Francisco, CA).

30. Sandholm, T. and Nagendraprasad, M., 1993, Learning Pursuit Strategies. Class project for CmpSci 689 Machine Learning. Computer Science Department, University of Massachusetts at Amherst, Spring 1993.

31. Sen, S., Sekaran, M. and Hale, J., 1994, Learning to coordinate without sharing information, in: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94) (Seattle, WA) pp. 426-431.

32. Shoham, Y. and Tennenholtz, M., 1993, Co-Learning and the Evolution of Coordinated Multi-Agent Activity.

33. Sugawara, T. and Lesser, V., 1993, On-Line Learning of Coordination Plans. Computer Science Technical Report 93-27 (University of Massachusetts, Amherst).

34. Sutton, R.S., 1988, Learning to predict by the methods of temporal differences. Mach. Learning 3, 9-44.

35. Tan, M., 1993, Multi-agent reinforcement learning: independent vs. cooperative agents, in: Machine Learning, Proceedings of the Tenth International Conference (University of Massachusetts, Amherst), pp. 330-337.

36. Tesauro, G.J., 1992, Practical issues in temporal difference learning. Mach. Learning 8, 257-277.

37. Tsetlin, M.L., 1973, Automaton Theory and Modeling of Biological Systems (Academic Press, New York).

38. Watkins, C., 1989, Learning from delayed rewards. PhD Thesis, University of Cambridge, England.

39. Weiß, G., 1993, Learning to coordinate actions in multi-agent systems, in: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93) (Chambery, France) pp. 311-316.

40. Williams, R.J. and Zipser, D., 1989, A learning algorithm for continually running fully recurrent neural networks. Neural Computat. 1, 270-280.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.