SCOPUS Information Search Platform

IEEE Transactions on Automatic Control

Volume 55, Issue 2, 2010, Pages 463-468

Adaptive adversarial multi-armed bandit approach to two-person zero-sum Markov games

(4) Chang, Hyeong Soo (a); Hu, Jiaqiao (b); Fu, Michael C. (c); Marcus, Steven I. (d)

a Sogang University (South Korea)

b Stony Brook University (United States)

c University of Maryland (United States)

d University of Maryland (United States)

Author keywords

Multi armed bandit; Sample average approximation; Sampling; Two person zero sum Markov game (MG)

Indexed keywords

ASYMPTOTIC CONVERGENCE; EQUILIBRIUM VALUE; FINITE HORIZONS; ITERATION BOUND; MARKOV GAMES; MULTI ARMED BANDIT; MULTI-ARMED BANDIT PROBLEM; SAMPLE AVERAGE APPROXIMATION; SAMPLING-BASED ALGORITHMS; STATE SPACE; TECHNICAL NOTES; TIME AND SPACE;

EID: 76949084015 PISSN: 00189286 EISSN: None Source Type: Journal
DOI: 10.1109/TAC.2009.2036333 Document Type: Article

Times cited: 26

References (13)

1
- 0000611954
- Zero-sum Markov games and worst-case optimal control of queueing systems
- E. Altman, "Zero-sum Markov games and worst-case optimal control of queueing systems," Queueing Syst., Theory Appl., vol.21, pp. 415-447, 1995.
- (1995) Queueing Syst., Theory Appl. , vol.21 , pp. 415-447
- Altman, E.¹

2
- 0037709910
- The nonstochastic multiarmed bandit problem
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, "The nonstochastic multiarmed bandit problem," SIAM J. Comput., vol.32, no.1, pp. 48-77, 2002.
- (2002) SIAM J. Comput. , vol.32 , Issue.1 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

3
- 84926078662
- Cambridge, U.K.: Cambridge Univ. Press
- N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge, U.K.: Cambridge Univ. Press, 2006.
- (2006) Prediction, Learning, and Games
- Cesa-Bianchi, N.¹ Lugosi, G.²

4
- 0344395590
- Two-person zero-sum Markov games: Receding horizon approach
- Nov.
- H. S. Chang and S. I. Marcus, "Two-person zero-sum Markov games: Receding horizon approach," IEEE Trans. Autom. Control, vol.48, no.11, pp. 1951-1961, Nov. 2003.
- (2003) IEEE Trans. Autom. Control , vol.48 , Issue.11 , pp. 1951-1961
- Chang, H.S.¹ Marcus, S.I.²

5
- 0003989209
- New York: Springer-Verlag
- J. Filar and K. Vrieze, Competitive Markov Decision Processes. New York: Springer-Verlag, 1996.
- (1996) Competitive Markov Decision Processes
- Filar, J.¹ Vrieze, K.²

6
- 9444295723
- Fast planning in stochastic games
- M. Kearns, Y. Mansour, and S. Singh, "Fast planning in stochastic games," in Proc. 16th Conf. Uncertainty Artif. Intell., 2000, pp. 309-316.
- (2000) Proc. 16th Conf. Uncertainty Artif. Intell. , pp. 309-316
- Kearns, M.¹ Mansour, Y.² Singh, S.³

7
- 0036013019
- The sample average approximation method for stochastic discrete optimization
- A. J. Kleywegt, A. Shapiro, and T. Homem-De-Mello, "The sample average approximation method for stochastic discrete optimization," SIAM J. Optim., vol.12, no.2, pp. 479-502, 2001.
- (2001) SIAM J. Optim. , vol.12 , Issue.2 , pp. 479-502
- Kleywegt, A.J.¹ Shapiro, A.² Homem-De-Mello, T.³

8
- 0000268071
- Learning algorithms for two-person zero-sum stochastic games with incomplete information
- S. Lakshmivarahan and K. S. Narendra, "Learning algorithms for two-person zero-sum stochastic games with incomplete information," Math. Oper. Res., vol.6, pp. 379-386, 1981.
- (1981) Math. Oper. Res. , vol.6 , pp. 379-386
- Lakshmivarahan, S.¹ Narendra, K.S.²

9
- 0020159814
- Learning algorithms for two-person zero-sum stochastic games with incomplete information: A unified approach
- S. Lakshmivarahan and K. S. Narendra, "Learning algorithms for two-person zero-sum stochastic games with incomplete information: A unified approach," SIAM J. Control Optim., vol.20, pp. 541-552, 1982.
- (1982) SIAM J. Control Optim. , vol.20 , pp. 541-552
- Lakshmivarahan, S.¹ Narendra, K.S.²

10
- 0030212543
- Finite time analysis of the pursuit algorithm for learning automata
- Aug.
- K. Rajaraman and P. S. Sastry, "Finite time analysis of the pursuit algorithm for learning automata," IEEE Trans. Syst., Man, Cybern. B, vol.26, no.4, pp. 590-598, Aug. 1996.
- (1996) IEEE Trans. Syst., Man, Cybern. B , vol.26 , Issue.4 , pp. 590-598
- Rajaraman, K.¹ Sastry, P.S.²

11
- 84966203785
- Some aspects of the sequential design of experiments
- H. Robbins, "Some aspects of the sequential design of experiments," Bull. Amer. Math. Soc., vol.58, pp. 527-535, 1952.
- (1952) Bull. Amer. Math. Soc. , vol.58 , pp. 527-535
- Robbins, H.¹

12
- 0028423534
- Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information
- May
- P. S. Sastry, V. V. Phansalkar, and M. A. L. Thathachar, "Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information," IEEE Trans. Syst., Man, Cybern., vol.24, no.5, pp. 769-777, May 1994.
- (1994) IEEE Trans. Syst., Man, Cybern. , vol.24 , Issue.5 , pp. 769-777
- Sastry, P.S.¹ Phansalkar, V.V.² Thathachar, M.A.L.³

13
- 0141824325
- Ph.D. dissertation, Department of Mathematics, Technische Hogeschool Eindhoven, Eindhoven, The Netherlands
- J. Van Der Wal, "Stochastic Dynamic Programming: Successive Approximations and Nearly Optimal Strategies for Markov Decision Processes and Markov Games," Ph.D. dissertation, Department of Mathematics, Technische Hogeschool Eindhoven, Eindhoven, The Netherlands, 1980.
- (1980) Stochastic Dynamic Programming: Successive Approximations and Nearly Optimal Strategies for Markov Decision Processes and Markov Games
- Van Der Wal, J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.