SCOPUS Information Retrieval Platform

NIPS 2002: Proceedings of the 15th International Conference on Neural Information Processing Systems

Volume: None, Issue: None, 2002, Pages 1571-1578

Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

(2) Wang, Xiaofeng a Sandholm, Tuomas b

a CARNEGIE MELLON UNIVERSITY (United States)

b CARNEGIE MELLON UNIVERSITY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATION THEORY; GAME THEORY; MULTI AGENT SYSTEMS;

ADAPTIVE LEARNING; COORDINATION POLICIES; LEARN+; LEARNING TO PLAY; MARKOV GAMES; MULTI AGENT; MULTI-AGENT LEARNING; NASH EQUILIBRIA; OPTIMAL COORDINATION; REINFORCEMENT LEARNINGS;

REINFORCEMENT LEARNING;

EID: 67649405225 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (149)

References (18)

1
- 0002500351
- Planning, learning and coordination in multi-agent decision processes
- C.Boutilier. Planning, learning and coordination in multi-agent decision processes. In TARK, 1996.
- (1996) TARK
- Boutilier, C.¹

2
- 0010247544
- The dynamics of reinforcement learning in cooperative multi-agent systems
- C.Claus and C.Boutilier. The dynamics of reinforcement learning in cooperative multi-agent systems. In AAAI, 1998.
- (1998) AAAI
- Claus, C.¹ Boutilier, C.²

3
- 0004247096
- MIT Press
- D.Fudenberg and D.K.Levine. The theory of learning in games. MIT Press, 1998.
- (1998) The theory of learning in games
- Fudenberg, D.¹ Levine, D.K.²

4
- 0003499462
- John Wiley and Sons, Inc
- D.L.Isaacson and R.W.Madsen. Markov chain: theory and applications. John Wiley and Sons, Inc, 1976.
- (1976) Markov chain: theory and applications
- Isaacson, D.L.¹ Madsen, R.W.²

5
- 0001473356
- Learning to coordinate actions in multi-agent systems
- G.Weiß. Learning to coordinate actions in multi-agent systems. In IJCAI, 1993.
- (1993) IJCAI
- Weiß, G.¹

6
- 0000929496
- Multiagent reinforcement learning: theoretical framework and an algorithm
- J.Hu and M.P.Wellman. Multiagent reinforcement learning: theoretical framework and an algorithm. In ICML, 1998.
- (1998) ICML
- Hu, J.¹ Wellman, M.P.²

7
- 0002730095
- Learning, mutation, and long run equilibria in games
- M.Kandori, G.J.Mailath, and R.Rob. Learning, mutation, and long run equilibria in games. Econometrica, 61(1):29-56, 1993.
- (1993) Econometrica , vol.61 , Issue.1 , pp. 29-56
- Kandori, M.¹ Mailath, G.J.² Rob, R.³

8
- 0242466944
- Friend-or-Foe Q-learning in general sum game
- M.Littman. Friend-or-Foe Q-learning in general sum game. In ICML, 2001.
- (2001) ICML
- Littman, M.¹

9
- 0001547175
- Value-function reinforcement learning in markov games
- M.L.Littman. Value-function reinforcement learning in markov games. J. of Cognitive System Research, 2:55-66, 2000.
- (2000) J. of Cognitive System Research , vol.2 , pp. 55-66
- Littman, M.L.¹

10
- 0003998452
- John Wiley
- M.L.Puterman. Markov decision processes-discrete stochastic dynamic programming. John Wiley, 1994.
- (1994) Markov decision processes-discrete stochastic dynamic programming
- Puterman, M.L.¹

11
- 85152198941
- Multi-agent reinforcement learning: independent vs. cooperative agents
- M.Tan. Multi-agent reinforcement learning: independent vs. cooperative agents. In ICML, 1993.
- (1993) ICML
- Tan, M.¹

12
- 0003644124
- MIT Press
- R.A.Howard. Dynamic programming and Markov processes. MIT Press, 1960.
- (1960) Dynamic programming and Markov processes
- Howard, R.A.¹

13
- 0001181267
- Spieltheoretische behandlung eines oligopolmodells mit nachfrageträgheit
- R. Selten. Spieltheoretische behandlung eines oligopolmodells mit nachfrageträgheit. Zeitschrift für die gesamte Staatswissenschaft, 121:301-324, 1965.
- (1965) Zeitschrift für die gesamte Staatswissenschaft , vol.121 , pp. 301-324
- Selten, R.¹

14
- 0033901602
- Convergence results for single-step on-policy reinforcement learning algorithms
- S. Singh, T.Jaakkola, M.L.Littman, and C.Szepesvari. Convergence results for single-step on-policy reinforcement learning algorithms. Machine Learning, 2000.
- (2000) Machine Learning
- Singh, S.¹ Jaakkola, T.² Littman, M.L.³ Szepesvari, C.⁴

15
- 0002626229
- Learning to coordinate without sharing information
- S.Sen, M.Sekaran, and J. Hale. Learning to coordinate without sharing information. In AAAI, 1994.
- (1994) AAAI
- Sen, S.¹ Sekaran, M.² Hale, J.³

16
- 0141909347
- Centrum voor Wiskunde en Informatica
- F. Thuijsman. Optimality and equilibrium in stochastic games. Centrum voor Wiskunde en Informatica, 1992.
- (1992) Optimality and equilibrium in stochastic games
- Thuijsman, F.¹

17
- 0030050933
- Learning in the iterated prisoner's dilemma
- T.Sandholm and R.Crites. Learning in the iterated prisoner's dilemma. Biosystems, 37:147-166, 1995.
- (1995) Biosystems , vol.37 , pp. 147-166
- Sandholm, T.¹ Crites, R.²

18
- 0001944917
- The evolution of conventions
- H. Young. The evolution of conventions. Econometrica, 61(1):57-84, 1993.
- (1993) Econometrica , vol.61 , Issue.1 , pp. 57-84
- Young, H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.