SCOPUS Information Search Platform

Volume 3, Issue 3, 2003, Pages 397-422

Using confidence bounds for exploitation-exploration trade-offs

Author keywords

Bandit Problem; Exploitation Exploration; Linear Value Function; Online Learning; Reinforcement Learning

Indexed keywords

ALGORITHMS; ONLINE SYSTEMS; RANDOM PROCESSES; STATISTICS;

CONFIDENCE BOUNDS;

LEARNING SYSTEMS;

EID: 0041966002; PISSN: 15324435; EISSN: None; Source Type: Journal
DOI: 10.1162/153244303321897663; Document Type: Conference Paper

Times cited: 1885

References (19)

3. R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27:1054-1078, 1995. (EID: 0000616723)

4. N. Alon and J. H. Spencer. The Probabilistic Method. Wiley, New York, 1992. (EID: 0003430191)

8. P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Journal version, 2000. (EID: 0042496192)

10. K. Azuma. Weighted sums of certain dependent random variables. Tohoku Math. J., 3:357-367, 1967. (EID: 84972574511)

11. D. A. Berry and B. Fristedt. Bandit Problems. Chapman and Hall, 1985. (EID: 0004218171)

12. M. Herbster and M. K. Warmuth. Tracking the best expert. Machine Learning, 32:151-178, 1998. (EID: 0032137328)

13. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963. (EID: 84947403595)

14. L. P. Kaelbling. Associative reinforcement learning: A generate and test algorithm. Machine Learning, 15:299-319, 1994a. (EID: 0028442414)

15. L. P. Kaelbling. Associative reinforcement learning: Functions in k-DNF. Machine Learning, 15:279-298, 1994b. (EID: 0028442413)

16. T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985. (EID: 0002899547)

17. N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212-261, 1994. (EID: 35148838877)

18. H. Robbins. Some aspects of the sequential design of experiments. Bulletin American Mathematical Society, 55:527-535, 1952. (EID: 84966203785)

19. R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998. (EID: 0004102479)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.