Volume 55, Issue 2, 2010, Pages 463-468

Adaptive adversarial multi-armed bandit approach to two-person zero-sum Markov games

Author keywords

Multi-armed bandit; Sample average approximation; Sampling; Two-person zero-sum Markov game (MG)

Indexed keywords

ASYMPTOTIC CONVERGENCE; EQUILIBRIUM VALUE; FINITE HORIZONS; ITERATION BOUND; MARKOV GAMES; MULTI ARMED BANDIT; MULTI-ARMED BANDIT PROBLEM; SAMPLE AVERAGE APPROXIMATION; SAMPLING-BASED ALGORITHMS; STATE SPACE; TECHNICAL NOTES; TIME AND SPACE

EID: 76949084015; PISSN: 00189286; EISSN: None; Source Type: Journal
DOI: 10.1109/TAC.2009.2036333; Document Type: Article
Times cited: 26

References (13)
  • [1] E. Altman, "Zero-sum Markov games and worst-case optimal control of queueing systems," Queueing Syst., Theory Appl., vol. 21, pp. 415-447, 1995.
  • [4] H. S. Chang and S. I. Marcus, "Two-person zero-sum Markov games: Receding horizon approach," IEEE Trans. Autom. Control, vol. 48, no. 11, pp. 1951-1961, Nov. 2003.
  • [7] A. J. Kleywegt, A. Shapiro, and T. Homem-de-Mello, "The sample average approximation method for stochastic discrete optimization," SIAM J. Optim., vol. 12, no. 2, pp. 479-502, 2001.
  • [8] S. Lakshmivarahan and K. S. Narendra, "Learning algorithms for two-person zero-sum stochastic games with incomplete information," Math. Oper. Res., vol. 6, pp. 379-386, 1981.
  • [9] S. Lakshmivarahan and K. S. Narendra, "Learning algorithms for two-person zero-sum stochastic games with incomplete information: A unified approach," SIAM J. Control Optim., vol. 20, pp. 541-552, 1982.
  • [10] K. Rajaraman and P. S. Sastry, "Finite time analysis of the pursuit algorithm for learning automata," IEEE Trans. Syst., Man, Cybern. B, vol. 26, no. 4, pp. 590-598, Aug. 1996.
  • [11] H. Robbins, "Some aspects of the sequential design of experiments," Bull. Amer. Math. Soc., vol. 55, pp. 527-535, 1952.
  • [12] P. S. Sastry, V. V. Phansalkar, and M. A. L. Thathachar, "Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information," IEEE Trans. Syst., Man, Cybern., vol. 24, no. 5, pp. 769-777, May 1994.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.