Volume 4, 2011, Pages 1363-1372

The value of information in multi-armed bandits with exponentially distributed rewards

Author keywords

Exponential rewards; Knowledge gradient; Multi armed bandit; Optimal learning

Indexed keywords

BAYESIAN MODEL; EXPONENTIAL DISTRIBUTIONS; EXPONENTIAL REWARDS; GAMMA PRIOR; GENERAL APPROACH; KNOWLEDGE GRADIENT; LEARNING MODELS; LEARNING PROBLEM; MAKING DECISION; MULTI ARMED BANDIT; MULTI-ARMED BANDIT PROBLEM; OPTIMAL LEARNING; VALUE OF INFORMATION;

EID: 79958262308     PISSN: None     EISSN: 18770509     Source Type: Conference Proceeding    
DOI: 10.1016/j.procs.2011.04.147     Document Type: Conference Paper
Times cited : (8)

References (31)
  • 1
    • 0018709825
    • A dynamic allocation index for the discounted multiarmed bandit problem
    • DOI 10.2307/2335176
    • J. C. Gittins, D. M. Jones, A dynamic allocation index for the discounted multiarmed bandit problem, Biometrika 66 (3) (1979) 561-565. (Pubitemid 10218405)
    • (1979) Biometrika , vol.66 , Issue.3 , pp. 561-565
    • Gittins, J.C.1    Jones, D.M.2
  • 2
    • 0022409707
    • Optimal designs for clinical trials with dichotomous responses
    • D. Berry, L. Pearson, Optimal designs for clinical trials with dichotomous responses, Statistics in Medicine 4 (4) (1985) 497-508. (Pubitemid 16176681)
    • (1985) Statistics in Medicine , vol.4 , Issue.4 , pp. 497-508
    • Berry, D.A.1    Pearson, L.M.2
  • 3
    • 0002955623
    • A dynamic allocation index for the sequential design of experiments
    • J. Gani (Ed)
    • J. C. Gittins, D. M. Jones, A dynamic allocation index for the sequential design of experiments, in: J. Gani (Ed.), Progress in Statistics, 1974, pp. 241-266.
    • (1974) Progress in Statistics , pp. 241-266
    • Gittins, J.C.1    Jones, D.M.2
  • 5
    • 0036334330
    • Optimal learning and experimentation in bandit problems
    • M. Brezzi, T. Lai, Optimal learning and experimentation in bandit problems, Journal of Economic Dynamics and Control 27 (1) (2002) 87-108.
    • (2002) Journal of Economic Dynamics and Control , vol.27 , Issue.1 , pp. 87-108
    • Brezzi, M.1    Lai, T.2
  • 6
    • 67650362301
    • Some results on the Gittins index for a normal reward process
    • H. Ho, C. Ing, T. Lai (Eds.), Institute of Mathematical Statistics, Beachwood, OH, USA
    • Y. Yao, Some results on the Gittins index for a normal reward process, in: H. Ho, C. Ing, T. Lai (Eds.), Time Series and Related Topics: In Memory of Ching-Zong Wei, Institute of Mathematical Statistics, Beachwood, OH, USA, 2006, pp. 284-294.
    • (2006) Time Series and Related Topics: In Memory of Ching-Zong Wei , pp. 284-294
    • Yao, Y.1
  • 7
    • 67649990621
    • Economic analysis of simulation selection options
    • S. Chick, N. Gans, Economic analysis of simulation selection options, Management Science 55 (3) (2009) 421-437.
    • (2009) Management Science , vol.55 , Issue.3 , pp. 421-437
    • Chick, S.1    Gans, N.2
  • 8
    • 0000854435
    • Adaptive treatment allocation and the multi-armed bandit problem
    • T. Lai, Adaptive treatment allocation and the multi-armed bandit problem, The Annals of Statistics 15 (3) (1987) 1091-1114.
    • (1987) The Annals of Statistics , vol.15 , Issue.3 , pp. 1091-1114
    • Lai, T.1
  • 9
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • DOI 10.1023/A:1013689704352, Computational Learning Theory
    • P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning 47 (2-3) (2002) 235-256. (Pubitemid 34126111)
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 10
    • 21144463800
    • The learning component of dynamic allocation indices
    • J. Gittins, Y. Wang, The learning component of dynamic allocation indices, The Annals of Statistics 20 (3) (1992) 1625-1636.
    • (1992) The Annals of Statistics , vol.20 , Issue.3 , pp. 1625-1636
    • Gittins, J.1    Wang, Y.2
  • 11
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. L. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1985) 4-22.
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 12
    • 0000616723
    • Sample mean based index policies with O (log n) regret for the multi-armed bandit problem
    • R. Agrawal, Sample mean based index policies with O (log n) regret for the multi-armed bandit problem, Advances in Applied Probability 27 (4) (1995) 1054-1078.
    • (1995) Advances in Applied Probability , vol.27 , Issue.4 , pp. 1054-1078
    • Agrawal, R.1
  • 13
    • 79953827701
    • Distributed learning in multi-armed bandit with multiple players
    • K. Liu, Q. Zhao, Distributed Learning in Multi-Armed Bandit with Multiple Players, IEEE Transactions on Signal Processing 58 (11) (2010) 5667-5681.
    • (2010) IEEE Transactions on Signal Processing , vol.58 , Issue.11 , pp. 5667-5681
    • Liu, K.1    Zhao, Q.2
  • 16
    • 0032640507
    • Stalking information: Bayesian inventory management with unobserved lost sales
    • M. A. Lariviere, E. Porteus, Stalking information: Bayesian inventory management with unobserved lost sales, Management Science 45 (3) (1999) 346-363.
    • (1999) Management Science , vol.45 , Issue.3 , pp. 346-363
    • Lariviere, M.A.1    Porteus, E.2
  • 17
    • 77249163740
    • Dynamic pricing with a prior on market response
    • V. Farias, B. Van Roy, Dynamic pricing with a prior on market response, Operations Research 58 (1) (2010) 16-29.
    • (2010) Operations Research , vol.58 , Issue.1 , pp. 16-29
    • Farias, V.1    Van Roy, B.2
  • 18
    • 0000511415
    • Bayesian look ahead one stage sampling allocations for selecting the largest normal mean
    • S. Gupta, K. Miescke, Bayesian look ahead one stage sampling allocations for selecting the largest normal mean, Statistical Papers 35 (1994) 169-177.
    • (1994) Statistical Papers , vol.35 , pp. 169-177
    • Gupta, S.1    Miescke, K.2
  • 19
    • 0030590294
    • Bayesian look ahead one-stage sampling allocations for selection of the best population
    • DOI 10.1016/0378-3758(95)00169-7
    • S. Gupta, K. Miescke, Bayesian look ahead one-stage sampling allocations for selection of the best population, Journal of Statistical Planning and Inference 54 (2) (1996) 229-244. (Pubitemid 126161097)
    • (1996) Journal of Statistical Planning and Inference , vol.54 , Issue.2 , pp. 229-244
    • Gupta, S.S.1    Miescke, K.J.2
  • 21
    • 70449498873
    • The knowledge-gradient policy for correlated normal rewards
    • P. I. Frazier, W. B. Powell, S. Dayanik, The knowledge-gradient policy for correlated normal rewards, INFORMS J. on Computing 21 (4) (2009) 599-613.
    • (2009) INFORMS J. on Computing , vol.21 , Issue.4 , pp. 599-613
    • Frazier, P.I.1    Powell, W.B.2    Dayanik, S.3
  • 22
    • 77949359798
    • Sequential sampling to myopically maximize the expected value of information
    • S. Chick, J. Branke, C. Schmidt, Sequential Sampling to Myopically Maximize the Expected Value of Information, INFORMS J. on Computing 22 (1) (2010) 71-80.
    • (2010) INFORMS J. on Computing , vol.22 , Issue.1 , pp. 71-80
    • Chick, S.1    Branke, J.2    Schmidt, C.3
  • 23
    • 78651309095
    • Paradoxes in learning and the marginal value of information
    • P. Frazier, W. Powell, Paradoxes in learning and the marginal value of information, Decision Analysis 7 (4) (2011) 378-403.
    • (2011) Decision Analysis , vol.7 , Issue.4 , pp. 378-403
    • Frazier, P.1    Powell, W.2
  • 25
    • 77951568757
    • A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies
    • M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.)
    • I. O. Ryzhov, W. B. Powell, A Monte Carlo Knowledge Gradient Method For Learning Abatement Potential Of Emissions Reduction Technologies, in: M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.), Proceedings of the 2009 Winter Simulation Conference, 2009, pp. 1492-1502.
    • (2009) Proceedings of the 2009 Winter Simulation Conference , pp. 1492-1502
    • Ryzhov, I.O.1    Powell, W.B.2
  • 30
    • 77951529657
    • The conjunction of the knowledge gradient and the economic approach to simulation selection
    • M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.)
    • S. E. Chick, P. I. Frazier, The Conjunction Of The Knowledge Gradient And The Economic Approach To Simulation Selection, in: M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.), Proceedings of the 2009 Winter Simulation Conference, 2009, pp. 528-539.
    • (2009) Proceedings of the 2009 Winter Simulation Conference , pp. 528-539
    • Chick, S.E.1    Frazier, P.I.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.