SCOPUS Information Search Platform

Advances in Neural Information Processing Systems

Volume 1, Issue January, 2014, Pages 199-207

Stochastic multi-armed-bandit problem with non-stationary rewards

(3) Besbes, Omar (a); Gur, Yonatan (b); Zeevi, Assaf (a)

(a) Howard Hughes Medical Institute (United States)

(b) Stanford University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ECONOMIC AND SOCIAL EFFECTS; STOCHASTIC SYSTEMS;

DIRECT LINKS; MATHEMATICAL TRACTABILITY; MULTI ARMED BANDIT; MULTI-ARMED BANDIT PROBLEM; NONSTATIONARY; TEMPORAL UNCERTAINTY; TRADE OFF;

INFORMATION SCIENCE;

EID: 84937906754
PISSN: 10495258
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited : (436)

References (32)

1
- 0001395850
- On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
- W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285-294, 1933.
- (1933) Biometrika , vol.25 , pp. 285-294
- Thompson, W.R.¹

2
- 84966203785
- Some aspects of the sequential design of experiments
- H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 55:527-535, 1952.
- (1952) Bulletin of the American Mathematical Society , vol.55 , pp. 527-535
- Robbins, H.¹

3
- 34247981226
- Play the winner rule and the controlled clinical trials
- M. Zelen. Play the winner rule and the controlled clinical trials. Journal of the American Statistical Association, 64:131-146, 1969.
- (1969) Journal of the American Statistical Association , vol.64 , pp. 131-146
- Zelen, M.¹

4
- 0030352286
- Learning and strategic pricing
- D. Bergemann and J. Valimaki. Learning and strategic pricing. Econometrica, 64:1125-1149, 1996.
- (1996) Econometrica , vol.64 , pp. 1125-1149
- Bergemann, D.¹ Valimaki, J.²

5
- 33744719690
- The financing of innovation: Learning and stopping
- D. Bergemann and U. Hege. The financing of innovation: Learning and stopping. RAND Journal of Economics, 36(4):719-752, 2005.
- (2005) RAND Journal of Economics , vol.36 , Issue.4 , pp. 719-752
- Bergemann, D.¹ Hege, U.²

6
- 4544345025
- Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches
- B. Awerbuch and R. D. Kleinberg. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In Proceedings of the 36th ACM Symposium on Theory of Computing (STOC), pages 45-53, 2004.
- (2004) Proceedings of the 36th ACM Symposium on Theory of Computing (STOC) , pp. 45-53
- Awerbuch, B.¹ Kleinberg, R.D.²

7
- 0345412655
- The value of knowing a demand curve: Bounds on regret for online posted-price auctions
- R. D. Kleinberg and T. Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 594-605, 2003.
- (2003) Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS) , pp. 594-605
- Kleinberg, R.D.¹ Leighton, T.²

8
- 33847255926
- Dynamic assortment with demand learning for seasonal consumer goods
- F. Caro and G. Gallien. Dynamic assortment with demand learning for seasonal consumer goods. Management Science, 53:276-292, 2007.
- (2007) Management Science , vol.53 , pp. 276-292
- Caro, F.¹ Gallien, G.²

9
- 70049106076
- Bandits for taxonomies: A model-based approach
- S. Pandey, D. Agarwal, D. Chakrabarti, and V. Josifovski. Bandits for taxonomies: A model-based approach. In SIAM International Conference on Data Mining, 2007.
- (2007) SIAM International Conference on Data Mining
- Pandey, S.¹ Agarwal, D.² Chakrabarti, D.³ Josifovski, V.⁴

10
- 85050365667
- Bandit problems: Sequential allocation of experiments
- D. A. Berry and B. Fristedt. Bandit problems: sequential allocation of experiments. Chapman and Hall, 1985.
- (1985) Chapman and Hall
- Berry, D.A.¹ Fristedt, B.²

11
- 84891584370
- Multi-Armed Bandit Allocation Indices
- J. C. Gittins. Multi-Armed Bandit Allocation Indices. John Wiley and Sons, 1989.
- (1989) Multi-Armed Bandit Allocation Indices
- Gittins, J.C.¹

12
- 84926078662
- Prediction, Learning, and Games
- N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, Cambridge, UK, 2006.
- (2006) Prediction, Learning, and Games
- Cesa-Bianchi, N.¹ Lugosi, G.²

13
- 0002899547
- Asymptotically efficient adaptive allocation rules
- T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
- (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
- Lai, T.L.¹ Robbins, H.²

14
- 0002955623
- A dynamic allocation index for the sequential design of experiments
- J. C. Gittins and D. M. Jones. A dynamic allocation index for the sequential design of experiments. North-Holland, 1974.
- (1974) A Dynamic Allocation Index for the Sequential Design of Experiments
- Gittins, J.C.¹ Jones, D.M.²

15
- 0000169010
- Bandit processes and dynamic allocation indices (with discussion)
- Series B
- J. C. Gittins. Bandit processes and dynamic allocation indices (with discussion). Journal of the Royal Statistical Society, Series B, 41:148-177, 1979.
- (1979) Journal of the Royal Statistical Society , vol.41 , pp. 148-177
- Gittins, J.C.¹

16
- 0000595228
- Arm acquiring bandits
- P. Whittle. Arm acquiring bandits. The Annals of Probability, 9:284-292, 1981.
- (1981) The Annals of Probability , vol.9 , pp. 284-292
- Whittle, P.¹

17
- 0001043843
- Restless bandits: Activity allocation in a changing world
- P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25A:287-298, 1988.
- (1988) Journal of Applied Probability , vol.25A , pp. 287-298
- Whittle, P.¹

18
- 0028560923
- The complexity of optimal queueing network control
- C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of optimal queueing network control. In Structure in Complexity Theory Conference, pages 318-322, 1994.
- (1994) Structure in Complexity Theory Conference , pp. 318-322
- Papadimitriou, C.H.¹ Tsitsiklis, J.N.²

19
- 0343441515
- Restless bandits, linear programming relaxations, and primal dual index heuristic
- D. Bertsimas and J. Nino-Mora. Restless bandits, linear programming relaxations, and primal dual index heuristic. Operations Research, 48(1):80-90, 2000.
- (2000) Operations Research , vol.48 , Issue.1 , pp. 80-90
- Bertsimas, D.¹ Nino-Mora, J.²

20
- 46749146164
- Approximation algorithms for partial-information based stochastic control with Markovian rewards
- S. Guha and K. Munagala. Approximation algorithms for partial-information based stochastic control with Markovian rewards. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 483-493, 2007.
- (2007) 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS) , pp. 483-493
- Guha, S.¹ Munagala, K.²

21
- 84867856114
- Regret bounds for restless Markov bandits
- Springer Berlin Heidelberg
- R. Ortner, D. Ryabko, P. Auer, and R. Munos. Regret bounds for restless Markov bandits. In Algorithmic Learning Theory, pages 214-228. Springer Berlin Heidelberg, 2012.
- (2012) Algorithmic Learning Theory , pp. 214-228
- Ortner, R.¹ Ryabko, D.² Auer, P.³ Munos, R.⁴

22
- 84937889454
- Stochastic optimization of a locally smooth function under correlated bandit feedback
- M. G. Azar, A. Lazaric, and E. Brunskill. Stochastic optimization of a locally smooth function under correlated bandit feedback. arXiv preprint arXiv:1402.0562, 2014.
- (2014) Stochastic Optimization of a Locally Smooth Function Under Correlated Bandit Feedback
- Azar, M.G.¹ Lazaric, A.² Brunskill, E.³

23
- 84972545864
- An analog of the minimax theorem for vector payoffs
- D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1-8, 1956.
- (1956) Pacific Journal of Mathematics , vol.6 , pp. 1-8
- Blackwell, D.¹

24
- 0001976283
- Approximation to Bayes risk in repeated plays
- J. Hannan. Approximation to Bayes risk in repeated plays, Contributions to the Theory of Games, Volume 3. Princeton University Press, Cambridge, UK, 1957.
- (1957) Approximation to Bayes Risk in Repeated Plays, Contributions to the Theory of Games , vol.3
- Hannan, J.¹

25
- 0002476325
- Regret in the on-line decision problem
- D. P. Foster and R. V. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7-35, 1999.
- (1999) Games and Economic Behavior , vol.29 , pp. 7-35
- Foster, D.P.¹ Vohra, R.V.²

26
- 84927610755
- Non-stationary stochastic optimization
- O. Besbes, Y. Gur, and A. Zeevi. Non-stationary stochastic optimization. Working paper, 2014.
- (2014) Non-stationary Stochastic Optimization
- Besbes, O.¹ Gur, Y.² Zeevi, A.³

27
- 80054097465
- On upper-confidence bound policies for switching bandit problems
- Springer Berlin Heidelberg
- A. Garivier and E. Moulines. On upper-confidence bound policies for switching bandit problems. In Algorithmic Learning Theory, pages 174-188. Springer Berlin Heidelberg, 2011.
- (2011) Algorithmic Learning Theory , pp. 174-188
- Garivier, A.¹ Moulines, E.²

28
- 0037709910
- The non-stochastic multi-armed bandit problem
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32:48-77, 2002.
- (2002) SIAM Journal on Computing , vol.32 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

29
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235-246, 2002.
- (2002) Machine Learning , vol.47 , pp. 235-246
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

30
- 0031211090
- A decision-theoretic generalization of on-line learning and an application to boosting
- Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci., 55:119-139, 1997.
- (1997) J. Comput. System Sci. , vol.55 , pp. 119-139
- Freund, Y.¹ Schapire, R.E.²

31
- 70449882757
- Multi-armed bandit, dynamic environments and meta-bandits
- Whistler, Canada
- C. Hartland, S. Gelly, N. Baskiotis, O. Teytaud, and M. Sebag. Multi-armed bandit, dynamic environments and meta-bandits. NIPS-2006 workshop, Online trading between exploration and exploitation, Whistler, Canada, 2006.
- (2006) NIPS-2006 Workshop, Online Trading Between Exploration and Exploitation
- Hartland, C.¹ Gelly, S.² Baskiotis, N.³ Teytaud, O.⁴ Sebag, M.⁵

32
- 84898070003
- Adapting to a changing environment: The Brownian restless bandits
- A. Slivkins and E. Upfal. Adapting to a changing environment: The Brownian restless bandits. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), pages 343-354, 2008.
- (2008) Proceedings of the 21st Annual Conference on Learning Theory (COLT) , pp. 343-354
- Slivkins, A.¹ Upfal, E.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.