Volume 1, Issue January, 2014, Pages 199-207

Stochastic multi-armed-bandit problem with non-stationary rewards

Author keywords

[No Author keywords available]

Indexed keywords

ECONOMIC AND SOCIAL EFFECTS; STOCHASTIC SYSTEMS;

EID: 84937906754     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 436

References (32)
  • 1
    • 0001395850
    • On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
    • W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285-294, 1933.
    • (1933) Biometrika , vol.25 , pp. 285-294
    • Thompson, W.R.1
  • 3
    • 34247981226
    • Play the winner rule and the controlled clinical trials
    • M. Zelen. Play the winner rule and the controlled clinical trials. Journal of the American Statistical Association, 64:131-146, 1969.
    • (1969) Journal of the American Statistical Association , vol.64 , pp. 131-146
    • Zelen, M.1
  • 4
    • 0030352286
    • Learning and strategic pricing
    • D. Bergemann and J. Valimaki. Learning and strategic pricing. Econometrica, 64:1125-1149, 1996.
    • (1996) Econometrica , vol.64 , pp. 1125-1149
    • Bergemann, D.1    Valimaki, J.2
  • 5
    • 33744719690
    • The financing of innovation: Learning and stopping
    • D. Bergemann and U. Hege. The financing of innovation: Learning and stopping. RAND Journal of Economics, 36(4):719-752, 2005.
    • (2005) RAND Journal of Economics , vol.36 , Issue.4 , pp. 719-752
    • Bergemann, D.1    Hege, U.2
  • 8
    • 33847255926
    • Dynamic assortment with demand learning for seasonal consumer goods
    • F. Caro and J. Gallien. Dynamic assortment with demand learning for seasonal consumer goods. Management Science, 53:276-292, 2007.
    • (2007) Management Science , vol.53 , pp. 276-292
    • Caro, F.1    Gallien, J.2
  • 10
    • 85050365667
    • Bandit problems: Sequential allocation of experiments
    • D. A. Berry and B. Fristedt. Bandit problems: sequential allocation of experiments. Chapman and Hall, 1985.
    • (1985) Chapman and Hall
    • Berry, D.A.1    Fristedt, B.2
  • 13
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 15
    • 0000169010
    • Bandit processes and dynamic allocation indices (with discussion)
    • Series B
    • J. C. Gittins. Bandit processes and dynamic allocation indices (with discussion). Journal of the Royal Statistical Society, Series B, 41:148-177, 1979.
    • (1979) Journal of the Royal Statistical Society , vol.41 , pp. 148-177
    • Gittins, J.C.1
  • 17
    • 0001043843
    • Restless bandits: Activity allocation in a changing world
    • P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25A:287-298, 1988.
    • (1988) Journal of Applied Probability , vol.25 A , pp. 287-298
    • Whittle, P.1
  • 19
    • 0343441515
    • Restless bandits, linear programming relaxations, and primal dual index heuristic
    • D. Bertsimas and J. Nino-Mora. Restless bandits, linear programming relaxations, and primal dual index heuristic. Operations Research, 48(1):80-90, 2000.
    • (2000) Operations Research , vol.48 , Issue.1 , pp. 80-90
    • Bertsimas, D.1    Nino-Mora, J.2
  • 21
    • 84867856114
    • Regret bounds for restless Markov bandits
    • Springer Berlin Heidelberg
    • R. Ortner, D. Ryabko, P. Auer, and R. Munos. Regret bounds for restless Markov bandits. In Algorithmic Learning Theory, pages 214-228. Springer Berlin Heidelberg, 2012.
    • (2012) Algorithmic Learning Theory , pp. 214-228
    • Ortner, R.1    Ryabko, D.2    Auer, P.3    Munos, R.4
  • 23
    • 84972545864
    • An analog of the minimax theorem for vector payoffs
    • D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1-8, 1956.
    • (1956) Pacific Journal of Mathematics , vol.6 , pp. 1-8
    • Blackwell, D.1
  • 27
    • 80054097465
    • On upper-confidence bound policies for switching bandit problems
    • Springer Berlin Heidelberg
    • A. Garivier and E. Moulines. On upper-confidence bound policies for switching bandit problems. In Algorithmic Learning Theory, pages 174-188. Springer Berlin Heidelberg, 2011.
    • (2011) Algorithmic Learning Theory , pp. 174-188
    • Garivier, A.1    Moulines, E.2
  • 29
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235-256, 2002.
    • (2002) Machine Learning , vol.47 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 30
    • 0031211090
    • A decision-theoretic generalization of on-line learning and an application to boosting
    • Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci., 55:119-139, 1997.
    • (1997) J. Comput. System Sci. , vol.55 , pp. 119-139
    • Freund, Y.1    Schapire, R.E.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.