SCOPUS Information Retrieval Platform

Applied Stochastic Models in Business and Industry

Volume 26, Issue 6, 2010, Pages 639-658

A modern Bayesian look at the multi-armed bandit

(1) Scott, Steven L a

a GOOGLE INC (United States)

Author keywords

Bayesian adaptive design; exploration vs exploitation; probability matching; sequential design

Indexed keywords

ADAPTIVE DESIGNS; BAYESIAN; BAYESIAN COMPUTATION; BAYESIAN POSTERIOR PROBABILITIES; EXPERIMENTAL DESIGN; EXPLORATION VS EXPLOITATION; MULTI ARMED BANDIT; PROBABILITY MATCHING; SEQUENTIAL DESIGN; UNKNOWN PARAMETERS;

BAYESIAN NETWORKS; DESIGN; OPTIMIZATION;

PROBABILITY DISTRIBUTIONS;

EID: 78650505735 PISSN: 15241904 EISSN: 15264025 Source Type: Journal
DOI: 10.1002/asmb.874 Document Type: Article

Times cited : (370)

References (27)

1
- 78650472752
- Available from
- Google. Available from:, 2010.
- (2010) Google

2
- 78650477203
- Discussion of 'bandit processes and dynamic allocation indices'
- Whittle P. Discussion of 'bandit processes and dynamic allocation indices'. Journal of the Royal Statistical Society, Series B: Methodological 1979; 41: 165.
- (1979) Journal of the Royal Statistical Society, Series B: Methodological , vol.41 , pp. 165
- Whittle, P.¹

3
- 0000169010
- Bandit processes and dynamic allocation indices
- Gittins JC. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B: Methodological 1979; 41: 148-177.
- (1979) Journal of the Royal Statistical Society, Series B: Methodological , vol.41 , pp. 148-177
- Gittins, J.C.¹

4
- 0004870746
- A problem in the sequential design of experiments
- Bellman RE. A problem in the sequential design of experiments. Sankhya Series A 1956; 30: 221-252.
- (1956) Sankhya Series A , vol.30 , pp. 221-252
- Bellman, R.E.¹

5
- 0009943101
- Incomplete learning from endogenous data in dynamic allocation
- Brezzi M, Lai TL. Incomplete learning from endogenous data in dynamic allocation. Econometrica 2000; 68 (6): 1511-1516.
- (2000) Econometrica , vol.68 , Issue.6 , pp. 1511-1516
- Brezzi, M.¹ Lai, T.L.²

6
- 0001395850
- On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
- Thompson WR. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 1933; 25: 285-294.
- (1933) Biometrika , vol.25 , pp. 285-294
- Thompson, W.R.¹

7
- 78650475463
- On the theory of apportionment
- Thompson WR. On the theory of apportionment. American Journal of Mathematics 1935; 57 (2): 450-456.
- (1935) American Journal of Mathematics , vol.57 , Issue.2 , pp. 450-456
- Thompson, W.R.¹

8
- 85050965446
- Chapman & Hall, CRC: London, Boca Raton.
- Cox DR, Reid N. The Theory of the Design of Experiments. Chapman & Hall, CRC: London, Boca Raton, 2000.
- (2000) The Theory of the Design of Experiments
- Cox, D.R.¹ Reid, N.²

9
- 84972528615
- Bayesian experimental design: A review
- Chaloner K, Verdinelli I. Bayesian experimental design: a review. Statistical Science 1995; 10: 273-304.
- (1995) Statistical Science , vol.10 , pp. 273-304
- Chaloner, K.¹ Verdinelli, I.²

10
- 0000576595
- Markov chains for exploring posterior distributions (disc: P1728-1762)
- Tierney L. Markov chains for exploring posterior distributions (disc: P1728-1762). The Annals of Statistics 1994; 22: 1701-1728.
- (1994) The Annals of Statistics , vol.22 , pp. 1701-1728
- Tierney, L.¹

11
- 0003665481
- Springer: Berlin.
- Doucet A, De Freitas N, Gordon N. Sequential Monte Carlo in Practice. Springer: Berlin, 2001.
- (2001) Sequential Monte Carlo in Practice
- Doucet, A.¹ De Freitas, N.² Gordon, N.³

12
- 47349092417
- Wiley: New York.
- Powell WB. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley: New York, 2007.
- (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality
- Powell, W.B.¹

13
- 0004102479
- MIT Press: Cambridge, MA.
- Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press: Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

14
- 0036334330
- Optimal learning and experimentation in bandit problems
- Brezzi M, Lai TL. Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control 2002; 27: 87-108.
- (2002) Journal of Economic Dynamics and Control , vol.27 , pp. 87-108
- Brezzi, M.¹ Lai, T.L.²

15
- 21144463800
- The learning component of dynamic allocation indices
- Gittins J, Wang Y-G. The learning component of dynamic allocation indices. The Annals of Statistics 1992; 20: 1625-1636.
- (1992) The Annals of Statistics , vol.20 , pp. 1625-1636
- Gittins, J.¹ Wang, Y.-G.²

16
- 0004181906
- Chapman & Hall: London.
- Berry DA, Fristedt B. Bandit Problems: Sequential Allocation of Experiments. Chapman & Hall: London, 1985.
- (1985) Bandit Problems: Sequential Allocation of Experiments
- Berry, D.A.¹ Fristedt, B.²

17
- 0002899547
- Asymptotically efficient adaptive allocation rules
- Lai T-L, Robbins H. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 1985; 6: 4-22.
- (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
- Lai, T.-L.¹ Robbins, H.²

18
- 0000854435
- Adaptive treatment allocation and the multi-armed bandit problem
- Lai T-L. Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics 1987; 15 (3): 1091-1114.
- (1987) The Annals of Statistics , vol.15 , Issue.3 , pp. 1091-1114
- Lai, T.-L.¹

19
- 0000616723
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- Agrawal R. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 1995; 27: 1054-1078.
- (1995) Advances in Applied Probability , vol.27 , pp. 1054-1078
- Agrawal, R.¹

20
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multiarmed bandit problem. Machine Learning 2002; 47: 235-256.
- (2002) Machine Learning , vol.47 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

21
- 0012794373
- A sequential decision procedure with a finite memory
- Robbins H. A sequential decision procedure with a finite memory. Proceedings of the National Academy of Sciences 1956; 42: 920-923.
- (1956) Proceedings of the National Academy of Sciences , vol.42 , pp. 920-923
- Robbins, H.¹

22
- 0004005973
- Wiley: New York.
- Luce D. Individual Choice Behavior. Wiley: New York, 1959.
- (1959) Individual Choice Behavior
- Luce, D.¹

23
- 0036108219
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
- Yang Y, Zhu D. Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. The Annals of Statistics 2002; 30: 100-121.
- (2002) The Annals of Statistics , vol.30 , pp. 100-121
- Yang, Y.¹ Zhu, D.²

24
- 84916537550
- Bayesian analysis of binary and polychotomous response data
- Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 1993; 88: 669-679.
- (1993) Journal of the American Statistical Association , vol.88 , pp. 669-679
- Albert, J.H.¹ Chib, S.²

25
- 0001043843
- Restless bandits: Activity allocation in a changing world
- Whittle P. Restless bandits: activity allocation in a changing world. Journal of Applied Probability 1988; 25A: 287-298.
- (1988) Journal of Applied Probability , vol.25 A , pp. 287-298
- Whittle, P.¹

26
- 0003751465
- Springer: Berlin.
- West M, Harrison J. Bayesian Forecasting and Dynamic Models. Springer: Berlin, 1997.
- (1997) Bayesian Forecasting and Dynamic Models
- West, M.¹ Harrison, J.²

27
- 0000595228
- Arm-acquiring bandits
- Whittle P. Arm-acquiring bandits. The Annals of Probability 1981; 9 (2): 284-292.
- (1981) The Annals of Probability , vol.9 , Issue.2 , pp. 284-292
- Whittle, P.¹

* This information was extracted from Elsevier's SCOPUS database through analysis by KISTI.