Volume 37, Issue 1-2, 1996, Pages 147-166

Multiagent reinforcement learning in the Iterated Prisoner's Dilemma

Author keywords

Exploration; Machine learning; Multiagent learning; Prisoner's Dilemma; Recurrent neural network; Reinforcement learning

Indexed keywords

ALGORITHM; ARTICLE; COMPUTER PROGRAM; GAME; HUMAN; PRISON; PRISONER; REINFORCEMENT

EID: 0030050933     PISSN: 03032647     EISSN: None     Source Type: Journal    
DOI: 10.1016/0303-2647(95)01551-5     Document Type: Article
Times cited: 274

References (40)
  • 3
    • 0020970738
    • Neuron-like adaptive elements that can solve difficult learning control problems
    • Barto, A.G., Sutton, R. and Anderson, C.W., 1983, Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Trans. Sys., Man Cybern. 13, 834-846.
    • (1983) IEEE Trans. Sys., Man Cybern. , vol.13 , pp. 834-846
    • Barto, A.G.1    Sutton, R.2    Anderson, C.W.3
  • 4
    • 0010367132
    • From chemotaxis to cooperativity: Abstracted exercises in neuronal learning strategies
    • R. Durbin, C. Miall and G. Mitchison (eds.) (Addison-Wesley)
    • Barto, A., 1989, From chemotaxis to cooperativity: Abstracted exercises in neuronal learning strategies, in: The Computing Neuron, R. Durbin, C. Miall and G. Mitchison (eds.) (Addison-Wesley) pp. 73-98.
    • (1989) The Computing Neuron , pp. 73-98
    • Barto, A.1
  • 5
    • 0029210635
    • Learning to act using real-time dynamic programming
    • Barto, A.G., Bradtke, S.J. and Singh, S.P., 1995, Learning to act using real-time dynamic programming. Artif. Intell. 72, 81-138.
    • (1995) Artif. Intell. , vol.72 , pp. 81-138
    • Barto, A.G.1    Bradtke, S.J.2    Singh, S.P.3
  • 6
    • 84927461265
    • Pattern recognizing stochastic learning automata
    • Barto, A.G. and Anandan, P., 1985, Pattern recognizing stochastic learning automata. IEEE Trans. Sys., Man Cybern. 15, 360-375.
    • (1985) IEEE Trans. Sys., Man Cybern. , vol.15 , pp. 360-375
    • Barto, A.G.1    Anandan, P.2
  • 7
    • 0022213383
    • Learning by statistical cooperation of self-interested neuron-like adaptive elements
    • Barto, A.G., 1985, Learning by statistical cooperation of self-interested neuron-like adaptive elements. Hum. Neurobiol. 4, 229-256.
    • (1985) Hum. Neurobiol. , vol.4 , pp. 229-256
    • Barto, A.G.1
  • 10
    • 85194555832
    • PhD dissertation proposal. Computer Science Department, University of Massachusetts at Amherst
    • Crites, R., 1994, Multi-Agent Reinforcement Learning. PhD dissertation proposal. Computer Science Department, University of Massachusetts at Amherst.
    • (1994) Multi-agent Reinforcement Learning
    • Crites, R.1
  • 11
    • 26444565569
    • Finding structure in time
    • Elman, J., 1990, Finding structure in time. Cognit. Sci. 14, 179-211.
    • (1990) Cognit. Sci. , vol.14 , pp. 179-211
    • Elman, J.1
  • 18
    • 85149834820
    • Markov games as a framework for multi-agent reinforcement learning
    • Machine Learning, (Rutgers University, NJ)
    • Littman, M., 1993, Markov games as a framework for multi-agent reinforcement learning, in: Machine Learning, Proceedings of the Eleventh International Conference (Rutgers University, NJ) pp. 157-163.
    • (1993) Proceedings of the Eleventh International Conference , pp. 157-163
    • Littman, M.1
  • 19
    • 0343048727
    • A distributed reinforcement learning scheme for network routing
    • Carnegie Mellon University
    • Littman, M. and Boyan, J., 1993, A Distributed Reinforcement Learning Scheme for Network Routing. Technical Report CMU-CS-93-165, Carnegie Mellon University.
    • (1993) Technical Report CMU-CS-93-165
    • Littman, M.1    Boyan, J.2
  • 20
    • 0004145762
    • Reprint: 1989 (Dover Publications, New York)
    • Luce, D. and Raiffa, H., 1957, Games and Decisions. Reprint: 1989 (Dover Publications, New York).
    • (1957) Games and Decisions
    • Luce, D.1    Raiffa, H.2
  • 21
    • 0343920388
    • Efficient learning of multiple degree-of-freedom control problems with quasi-independent Q-agents
    • Erlbaum Associates, Hillsdale, NJ
    • Markey, K.L., 1993, Efficient learning of multiple degree-of-freedom control problems with quasi-independent Q-agents, in: Proceedings of the 1993 Connectionist Models Summer School (Erlbaum Associates, Hillsdale, NJ).
    • (1993) Proceedings of the 1993 Connectionist Models Summer School
    • Markey, K.L.1
  • 23
    • 0027336968
    • A strategy of win-stay, lose-shift that outperforms Tit-For-Tat in the Prisoner's Dilemma game
    • Nowak, M. and Sigmund, K., 1993, A strategy of win-stay, lose-shift that outperforms Tit-For-Tat in the Prisoner's Dilemma game. Nature 364, 56-58.
    • (1993) Nature , vol.364 , pp. 56-58
    • Nowak, M.1    Sigmund, K.2
  • 25
    • 0001201756
    • Some studies in machine learning using the game of checkers
    • Reprinted: 1963, in: Computers and Thought, E.A. Feigenbaum and J. Feldman (eds.) (McGraw-Hill, New York)
    • Samuel, A.L., 1959, Some studies in machine learning using the game of checkers, IBM Journal on Research and Development, Reprinted: 1963, in: Computers and Thought, E.A. Feigenbaum and J. Feldman (eds.) (McGraw-Hill, New York) pp. 210-229.
    • (1959) IBM Journal on Research and Development , pp. 210-229
    • Samuel, A.L.1
  • 30
    • 85194558779
    • Learning pursuit strategies
    • Computer Science Department, University of Massachusetts at Amherst, Spring 1993
    • Sandholm, T. and Nagendraprasad, M., 1993, Learning Pursuit Strategies. Class project for CmpSci 689 Machine Learning. Computer Science Department, University of Massachusetts at Amherst, Spring 1993.
    • (1993) Class Project for CmpSci 689 Machine Learning
    • Sandholm, T.1    Nagendraprasad, M.2
  • 33
    • 0042049192
    • On-line learning of coordination plans
    • University of Massachusetts, Amherst
    • Sugawara, T. and Lesser, V., 1993, On-Line Learning of Coordination Plans. Computer Science Technical Report 93-27 (University of Massachusetts, Amherst).
    • (1993) Computer Science Technical Report 93-27
    • Sugawara, T.1    Lesser, V.2
  • 34
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • Sutton, R.S., 1988, Learning to predict by the methods of temporal differences. Mach. Learning 3, 9-44.
    • (1988) Mach. Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 35
    • 85152198941
    • Multi-agent reinforcement learning: Independent vs. cooperative agents
    • Machine Learning, (University of Massachusetts, Amherst)
    • Tan, M., 1993, Multi-agent reinforcement learning: independent vs. cooperative agents, in: Machine Learning, Proceedings of the Tenth International Conference (University of Massachusetts, Amherst), pp. 330-337.
    • (1993) Proceedings of the Tenth International Conference , pp. 330-337
    • Tan, M.1
  • 36
    • 0001046225
    • Practical issues in temporal difference learning
    • Tesauro, G.J., 1992, Practical issues in temporal difference learning. Mach. Learning 8, 257-277.
    • (1992) Mach. Learning , vol.8 , pp. 257-277
    • Tesauro, G.J.1
  • 40
    • 0001202594
    • A learning algorithm for continually running fully recurrent neural networks
    • Williams, R.J. and Zipser, D., 1989, A learning algorithm for continually running fully recurrent neural networks. Neural Computat. 1, 270-280.
    • (1989) Neural Computat. , vol.1 , pp. 270-280
    • Williams, R.J.1    Zipser, D.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.