Volume 35, Issue 2, 2010, Pages 395-411

Linearly parameterized bandits

Author keywords

Adaptive control; Multi armed bandit; Parametric model

Indexed keywords

ADAPTIVE CONTROL; BANDIT PROBLEMS; BAYES RISK; EXPLORATION AND EXPLOITATION; LINEAR FUNCTIONS; LOWER BOUNDS; MULTI ARMED BANDIT; NEAR-OPTIMAL POLICIES; PARAMETERIZED; PARAMETRIC MODELS; RANDOM VECTORS; UPPER BOUND;

EID: 77953111834     PISSN: 0364765X     EISSN: 15265471     Source Type: Journal    
DOI: 10.1287/moor.1100.0446     Document Type: Article
Times cited: 516

References (33)
  • 1
    • 0042996986
    • Associative reinforcement learning using linear probabilistic concepts
    • Morgan Kaufmann, San Francisco
    • Abe, N., P. M. Long. 1999. Associative reinforcement learning using linear probabilistic concepts. Proc. 16th Internat. Conf. Machine Learn., Morgan Kaufmann, San Francisco, 3-11.
    • (1999) Proc. 16th Internat. Conf. Machine Learn. , pp. 3-11
    • Abe, N.1    Long, P.M.2
  • 2
    • 0000616723
    • Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
    • Agrawal, R. 1995. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 27(4) 1054-1078.
    • (1995) Adv. Appl. Probab. , vol.27 , Issue.4 , pp. 1054-1078
    • Agrawal, R.1
  • 3
    • 0024626787
    • Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space
    • Agrawal, R., D. Teneketzis, V. Anantharam. 1989. Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space. IEEE Trans. Automatic Control 34(3) 258-267.
    • (1989) IEEE Trans. Automatic Control , vol.34 , Issue.3 , pp. 258-267
    • Agrawal, R.1    Teneketzis, D.2    Anantharam, V.3
  • 4
    • 0041966002
    • Using confidence bounds for exploitation-exploration trade-offs
    • Auer, P. 2002. Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(3) 397-422.
    • (2002) J. Machine Learn. Res. , vol.3 , Issue.3 , pp. 397-422
    • Auer, P.1
  • 5
    • 0036568025
    • Finite-time analysis of the multi-armed bandit problem
    • Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multi-armed bandit problem. Machine Learn. 47(2) 235-256.
    • (2002) Machine Learn , vol.47 , Issue.2 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 10
    • 0000792515
    • Multidimensional stochastic approximation methods
    • Blum, J. R. 1954. Multidimensional stochastic approximation methods. Ann. Math. Statist. 25(4) 737-744.
    • (1954) Ann. Math. Statist. , vol.25 , Issue.4 , pp. 737-744
    • Blum, J.R.1
  • 14
    • 0001492860
    • Contributions to the "two-armed bandit" problem
    • Feldman, D. 1962. Contributions to the "two-armed bandit" problem. Ann. Math. Statist. 33(3) 847-856.
    • (1962) Ann. Math. Statist. , vol.33 , Issue.3 , pp. 847-856
    • Feldman, D.1
  • 15
    • 0039176122
    • A new positive definite geometric mean of two positive definite matrices
    • Fiedler, M., V. Pták. 1997. A new positive definite geometric mean of two positive definite matrices. Linear Algebra Its Appl. 251(1) 1-20.
    • (1997) Linear Algebra Its Appl. , vol.251 , Issue.1 , pp. 1-20
    • Fiedler, M.1    Pták, V.2
  • 18
    • 70049095891
    • Woodroofe's one-armed bandit problem revisited
    • Goldenshluger, A., A. Zeevi. 2009. Woodroofe's one-armed bandit problem revisited. Ann. Appl. Probab. 19(4) 1603-1633.
    • (2009) Ann. Appl. Probab. , vol.19 , Issue.4 , pp. 1603-1633
    • Goldenshluger, A.1    Zeevi, A.2
  • 19
    • 0010948196
    • Further contributions to the "two-armed bandit" problem
    • Keener, R. 1985. Further contributions to the "two-armed bandit" problem. Ann. Statist. 13(1) 418-422.
    • (1985) Ann. Statist. , vol.13 , Issue.1 , pp. 418-422
    • Keener, R.1
  • 20
    • 0001079593
    • Stochastic estimation of the maximum of a regression function
    • Kiefer, J., J. Wolfowitz. 1952. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3) 462-466.
    • (1952) Ann. Math. Statist. , vol.23 , Issue.3 , pp. 462-466
    • Kiefer, J.1    Wolfowitz, J.2
  • 21
    • 0038026196
    • Stochastic approximation (invited paper)
    • Lai, T. 2003. Stochastic approximation (invited paper). Ann. Statist. 31(2) 391-406.
    • (2003) Ann. Statist. , vol.31 , Issue.2 , pp. 391-406
    • Lai, T.1
  • 22
    • 0000854435
    • Adaptive treatment allocation and the multi-armed bandit problem
    • Lai, T. L. 1987. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3) 1091-1114.
    • (1987) Ann. Statist. , vol.15 , Issue.3 , pp. 1091-1114
    • Lai, T.L.1
  • 23
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • Lai, T. L., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1) 4-22.
    • (1985) Adv. Appl. Math. , vol.6 , Issue.1 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 26
    • 0030306745
    • Strongly convex analysis
    • Polovinkin, E. S. 1996. Strongly convex analysis. Sbornik: Math. 187(2) 259-286.
    • (1996) Sbornik: Math. , vol.187 , Issue.2 , pp. 259-286
    • Polovinkin, E.S.1
  • 28
    • 84966203785
    • Some aspects of the sequential design of experiments
    • Robbins, H. 1952. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5) 527-535.
    • (1952) Bull. Amer. Math. Soc. , vol.58 , Issue.5 , pp. 527-535
    • Robbins, H.1
  • 29
    • 0000016172
    • A stochastic approximation method
    • Robbins, H., S. Monro. 1951. A stochastic approximation method. Ann. Math. Statist. 22(3) 400-407.
    • (1951) Ann. Math. Statist. , vol.22 , Issue.3 , pp. 400-407
    • Robbins, H.1    Monro, S.2
  • 30
    • 77953098183
    • Rusmevichientong, P., J. N. Tsitsiklis. 2010. Linearly parameterized bandits (extended version). http://arxiv.org/abs/0812.3465.
    • (2010)
    • Rusmevichientong, P.1    Tsitsiklis, J.N.2
  • 31
    • 0001395850
    • On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
    • Thompson, W. R. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3) 285-294.
    • (1933) Biometrika , vol.25 , Issue.3 , pp. 285-294
    • Thompson, W.R.1
  • 33
    • 15844362682
    • Arbitrary side observations in bandit problems
    • Wang, C.-C., S. R. Kulkarni, H. V. Poor. 2005b. Arbitrary side observations in bandit problems. Adv. Appl. Math. 34(4) 903-938.
    • (2005) Adv. Appl. Math. , vol.34 , Issue.4 , pp. 903-938
    • Wang, C.-C.1    Kulkarni, S.R.2    Poor, H.V.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.