SCOPUS Information Retrieval Platform

Procedia Computer Science

Volume 4, 2011, Pages 1363-1372

The value of information in multi-armed bandits with exponentially distributed rewards

(2) Ryzhov, Ilya O. (a); Powell, Warren B. (a)

(a) Princeton University (United States)

Author keywords

Exponential rewards; Knowledge gradient; Multi armed bandit; Optimal learning

Indexed keywords

BAYESIAN MODEL; EXPONENTIAL DISTRIBUTIONS; EXPONENTIAL REWARDS; GAMMA PRIOR; GENERAL APPROACH; KNOWLEDGE GRADIENT; LEARNING MODELS; LEARNING PROBLEM; MAKING DECISION; MULTI ARMED BANDIT; MULTI-ARMED BANDIT PROBLEM; OPTIMAL LEARNING; VALUE OF INFORMATION;

BAYESIAN NETWORKS;

OPTIMIZATION;

EID: 79958262308 PISSN: None EISSN: 18770509 Source Type: Conference Proceeding
DOI: 10.1016/j.procs.2011.04.147 Document Type: Conference Paper

Times cited : (9)

References (31)

1
- 0018709825
- A dynamic allocation index for the discounted multiarmed bandit problem
- DOI 10.2307/2335176
- J. C. Gittins, D. M. Jones, A dynamic allocation index for the discounted multiarmed bandit problem, Biometrika 66 (3) (1979) 561-565. (Pubitemid 10218405)
- (1979) Biometrika , vol.66 , Issue.3 , pp. 561-565
- Gittins, J.C.¹ Jones, D.M.²

2
- 0022409707
- Optimal designs for clinical trials with dichotomous responses
- D. Berry, L. Pearson, Optimal designs for clinical trials with dichotomous responses, Statistics in Medicine 4 (4) (1985) 497-508. (Pubitemid 16176681)
- (1985) Statistics in Medicine , vol.4 , Issue.4 , pp. 497-508
- Berry, D.A.¹ Pearson, L.M.²

3
- 0002955623
- A dynamic allocation index for the sequential design of experiments
- J. Gani (Ed.)
- J. C. Gittins, D. M. Jones, A dynamic allocation index for the sequential design of experiments, in: J. Gani (Ed.), Progress in Statistics, 1974, pp. 241-266.
- (1974) Progress in Statistics , pp. 241-266
- Gittins, J.C.¹ Jones, D.M.²

4
- 84891584370
- John Wiley and Sons, New York
- J. Gittins, Multi-Armed Bandit Allocation Indices, John Wiley and Sons, New York, 1989.
- (1989) Multi-Armed Bandit Allocation Indices
- Gittins, J.¹

5
- 0036334330
- Optimal learning and experimentation in bandit problems
- M. Brezzi, T. Lai, Optimal learning and experimentation in bandit problems, Journal of Economic Dynamics and Control 27 (1) (2002) 87-108.
- (2002) Journal of Economic Dynamics and Control , vol.27 , Issue.1 , pp. 87-108
- Brezzi, M.¹ Lai, T.²

6
- 67650362301
- Some results on the Gittins index for a normal reward process
- H. Ho, C. Ing, T. Lai (Eds.), Institute of Mathematical Statistics, Beachwood, OH, USA
- Y. Yao, Some results on the Gittins index for a normal reward process, in: H. Ho, C. Ing, T. Lai (Eds.), Time Series and Related Topics: In Memory of Ching-Zong Wei, Institute of Mathematical Statistics, Beachwood, OH, USA, 2006, pp. 284-294.
- (2006) Time Series and Related Topics: In Memory of Ching-Zong Wei , pp. 284-294
- Yao, Y.¹

7
- 67649990621
- Economic analysis of simulation selection options
- S. Chick, N. Gans, Economic analysis of simulation selection options, Management Science 55 (3) (2009) 421-437.
- (2009) Management Science , vol.55 , Issue.3 , pp. 421-437
- Chick, S.¹ Gans, N.²

8
- 0000854435
- Adaptive treatment allocation and the multi-armed bandit problem
- T. Lai, Adaptive treatment allocation and the multi-armed bandit problem, The Annals of Statistics 15 (3) (1987) 1091-1114.
- (1987) The Annals of Statistics , vol.15 , Issue.3 , pp. 1091-1114
- Lai, T.¹

9
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- DOI 10.1023/A:1013689704352, Computational Learning Theory
- P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning 47 (2-3) (2002) 235-256. (Pubitemid 34126111)
- (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

10
- 21144463800
- The learning component of dynamic allocation indices
- J. Gittins, Y. Wang, The learning component of dynamic allocation indices, The Annals of Statistics 20 (3) (1992) 1625-1636.
- (1992) The Annals of Statistics , vol.20 , Issue.3 , pp. 1625-1636
- Gittins, J.¹ Wang, Y.²

11
- 0002899547
- Asymptotically efficient adaptive allocation rules
- T. L. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1985) 4-22.
- (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
- Lai, T.L.¹ Robbins, H.²

12
- 0000616723
- Sample mean based index policies with O (log n) regret for the multi-armed bandit problem
- R. Agrawal, Sample mean based index policies with O (log n) regret for the multi-armed bandit problem, Advances in Applied Probability 27 (4) (1995) 1054-1078.
- (1995) Advances in Applied Probability , vol.27 , Issue.4 , pp. 1054-1078
- Agrawal, R.¹

13
- 79953827701
- Distributed learning in multi-armed bandit with multiple players
- K. Liu, Q. Zhao, Distributed Learning in Multi-Armed Bandit with Multiple Players, IEEE Transactions on Signal Processing 58 (11) (2010) 5667-5681.
- (2010) IEEE Transactions on Signal Processing , vol.58 , Issue.11 , pp. 5667-5681
- Liu, K.¹ Zhao, Q.²

14
- 0004007508
- The MIT Press, Cambridge, Massachusetts
- R. Sutton, A. Barto, Reinforcement Learning, The MIT Press, Cambridge, Massachusetts, 1998.
- (1998) Reinforcement Learning
- Sutton, R.¹ Barto, A.²

15
- 47349092417
- John Wiley and Sons, New York
- W. B. Powell, Approximate Dynamic Programming: Solving the curses of dimensionality, John Wiley and Sons, New York, 2007.
- (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality
- Powell, W.B.¹

16
- 0032640507
- Stalking information: Bayesian inventory management with unobserved lost sales
- M. A. Lariviere, E. Porteus, Stalking information: Bayesian inventory management with unobserved lost sales, Management Science 45 (3) (1999) 346-363.
- (1999) Management Science , vol.45 , Issue.3 , pp. 346-363
- Lariviere, M.A.¹ Porteus, E.²

17
- 77249163740
- Dynamic pricing with a prior on market response
- V. Farias, B. Van Roy, Dynamic pricing with a prior on market response, Operations Research 58 (1) (2010) 16-29.
- (2010) Operations Research , vol.58 , Issue.1 , pp. 16-29
- Farias, V.¹ Van Roy, B.²

18
- 0000511415
- Bayesian look ahead one stage sampling allocations for selecting the largest normal mean
- S. Gupta, K. Miescke, Bayesian look ahead one stage sampling allocations for selecting the largest normal mean, Statistical Papers 35 (1994) 169-177.
- (1994) Statistical Papers , vol.35 , pp. 169-177
- Gupta, S.¹ Miescke, K.²

19
- 0030590294
- Bayesian look ahead one-stage sampling allocations for selection of the best population
- DOI 10.1016/0378-3758(95)00169-7
- S. Gupta, K. Miescke, Bayesian look ahead one-stage sampling allocations for selection of the best population, Journal of statistical planning and inference 54 (2) (1996) 229-244. (Pubitemid 126161097)
- (1996) Journal of Statistical Planning and Inference , vol.54 , Issue.2 , pp. 229-244
- Gupta, S.S.¹ Miescke, K.J.²

20
- 55549135706
- A knowledge gradient policy for sequential information collection
- P. I. Frazier, W. B. Powell, S. Dayanik, A knowledge gradient policy for sequential information collection, SIAM Journal on Control and Optimization 47 (5) (2008) 2410-2439.
- (2008) SIAM Journal on Control and Optimization , vol.47 , Issue.5 , pp. 2410-2439
- Frazier, P.I.¹ Powell, W.B.² Dayanik, S.³

21
- 70449498873
- The knowledge-gradient policy for correlated normal rewards
- P. I. Frazier, W. B. Powell, S. Dayanik, The knowledge-gradient policy for correlated normal rewards, INFORMS J. on Computing 21 (4) (2009) 599-613.
- (2009) INFORMS J. on Computing , vol.21 , Issue.4 , pp. 599-613
- Frazier, P.I.¹ Powell, W.B.² Dayanik, S.³

22
- 77949359798
- Sequential sampling to myopically maximize the expected value of information
- S. Chick, J. Branke, C. Schmidt, Sequential Sampling to Myopically Maximize the Expected Value of Information, INFORMS J. on Computing 22 (1) (2010) 71-80.
- (2010) INFORMS J. on Computing , vol.22 , Issue.1 , pp. 71-80
- Chick, S.¹ Branke, J.² Schmidt, C.³

23
- 78651309095
- Paradoxes in learning and the marginal value of information
- P. Frazier, W. Powell, Paradoxes in learning and the marginal value of information, Decision Analysis 7 (4) (2011) 378-403.
- (2011) Decision Analysis , vol.7 , Issue.4 , pp. 378-403
- Frazier, P.¹ Powell, W.²

24
- 67650505320
- The knowledge gradient algorithm for online subset selection
- Nashville, TN
- I. O. Ryzhov, W. B. Powell, The knowledge gradient algorithm for online subset selection, in: Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Nashville, TN, 2009, pp. 137-144.
- (2009) Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning , pp. 137-144
- Ryzhov, I.O.¹ Powell, W.B.²

25
- 77951568757
- A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies
- M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.)
- I. O. Ryzhov, W. B. Powell, A Monte Carlo Knowledge Gradient Method For Learning Abatement Potential Of Emissions Reduction Technologies, in: M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.), Proceedings of the 2009 Winter Simulation Conference, 2009, pp. 1492-1502.
- (2009) Proceedings of the 2009 Winter Simulation Conference , pp. 1492-1502
- Ryzhov, I.O.¹ Powell, W.B.²

26
- 67650386778
- Submitted for publication
- I. O. Ryzhov, W. B. Powell, P. I. Frazier, The knowledge gradient algorithm for a general class of online learning problems, Submitted for publication.
- The Knowledge Gradient Algorithm for A General Class of Online Learning Problems
- Ryzhov, I.O.¹ Powell, W.B.² Frazier, P.I.³

27
- 78650269856
- On the robustness of a one-period look-ahead policy in multi-armed bandit problems
- I. O. Ryzhov, P. I. Frazier, W. B. Powell, On the robustness of a one-period look-ahead policy in multi-armed bandit problems, in: Proceedings of the 2010 International Conference on Computational Science, 2010, pp. 1629-1638.
- (2010) Proceedings of the 2010 International Conference on Computational Science , pp. 1629-1638
- Ryzhov, I.O.¹ Frazier, P.I.² Powell, W.B.³

28
- 0003759417
- John Wiley and Sons
- M. H. DeGroot, Optimal Statistical Decisions, John Wiley and Sons, 1970.
- (1970) Optimal Statistical Decisions
- DeGroot, M.H.¹

29
- 33644648900
- On the Bayesian steady forecasting model
- P. Key, E. Godolphin, On the Bayesian steady forecasting model, Journal of the Royal Statistical Society B43 (1) (1981) 92-96.
- (1981) Journal of the Royal Statistical Society , vol.B43 , Issue.1 , pp. 92-96
- Key, P.¹ Godolphin, E.²

30
- 77951529657
- The conjunction of the knowledge gradient and the economic approach to simulation selection
- M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.)
- S. E. Chick, P. I. Frazier, The Conjunction Of The Knowledge Gradient And The Economic Approach To Simulation Selection, in: M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.), Proceedings of the 2009 Winter Simulation Conference, 2009, pp. 528-539.
- (2009) Proceedings of the 2009 Winter Simulation Conference , pp. 528-539
- Chick, S.E.¹ Frazier, P.I.²

31
- 79952942276
- Information collection on a graph
- to appear
- I. O. Ryzhov, W. B. Powell, Information collection on a graph, Operations Research (to appear).
- Operations Research
- Ryzhov, I.O.¹ Powell, W.B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.