Machine Learning, Volume 85, Issue 3, 2011, Pages 361-391

An asymptotically optimal policy for finite support models in the multiarmed bandit problem

Author keywords

Bandit problems; Convex optimization; Finite time regret; MED policy

Indexed keywords

ASYMPTOTIC BOUNDS; ASYMPTOTICALLY OPTIMAL; BANDIT PROBLEMS; CONVEX OPTIMIZATION TECHNIQUES; EXPLORATION AND EXPLOITATION; FINITE SUPPORTS; FINITE TIME; FINITE-TIME REGRET; MULTI-ARMED BANDIT PROBLEM; MULTIPLE ARMS; UPPER AND LOWER BOUNDS; UPPER BOUND

EID: 83155180573     PISSN: 0885-6125     EISSN: 1573-0565     Source Type: Journal
DOI: 10.1007/s10994-011-5257-4     Document Type: Article
Times cited: 48

References (23)
  • 2. Agrawal, R. (1995b). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27, 1054-1078.
  • 3. Audibert, J.-Y., & Bubeck, S. (2009). Minimax policies for adversarial and stochastic bandits. In Proceedings of COLT 2009. Montreal: Omnipress.
  • 4. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002a). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3), 235-256. DOI 10.1023/A:1013689704352.
  • 7. Burnetas, A. N., & Katehakis, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2), 122-142. DOI 10.1006/aama.1996.0007.
  • 9. Even-Dar, E., Mannor, S., & Mansour, Y. (2002). PAC bounds for multi-armed bandit and Markov decision processes. In Proceedings of COLT 2002 (pp. 255-270). London: Springer.
  • 12. Honda, J., & Takemura, A. (2010). An asymptotically optimal bandit algorithm for bounded support models. In Proceedings of COLT 2010, Haifa, Israel (pp. 67-79).
  • 15. Kleinberg, R. (2005). Nearly tight bounds for the continuum-armed bandit problem. In Proceedings of NIPS 2005 (pp. 697-704). New York: MIT Press.
  • 16. Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4-22.
  • 17. Meuleau, N., & Bourgine, P. (1999). Exploration of multi-state environments: Local measures and backpropagation of uncertainty. Machine Learning, 35, 117-154.
  • 20. Strens, M. (2000). A Bayesian framework for reinforcement learning. In Proceedings of ICML 2000 (pp. 943-950). San Francisco: Kaufmann.
  • 21. Vermorel, J., & Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In Proceedings of ECML 2005, Porto, Portugal (pp. 437-448). Berlin: Springer.
  • 22. Wyatt, J. (1997). Exploration and inference in learning from reinforcement. Doctoral dissertation, Department of Artificial Intelligence, University of Edinburgh.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.