Volume 41, Issue 2, 2013, Pages 693-721

The multi-armed bandit problem with covariates

Author keywords

Adaptive partition; Contextual bandit; Multi armed bandit; Nonparametric bandit; Regret bounds; Sequential allocation; Successive elimination

EID: 84879129093     PISSN: 00905364     EISSN: None     Source Type: Journal    
DOI: 10.1214/13-AOS1101     Document Type: Article
Times cited: 173

References (25)
  • 1
    • 78649420293
    • Regret bounds and minimax policies under partial monitoring
    • MR2738783
    • AUDIBERT, J.-Y. and BUBECK, S. (2010). Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11 2785-2836. MR2738783.
    • (2010) J. Mach. Learn. Res. , vol.11 , pp. 2785-2836
    • Audibert, J.-Y.1    Bubeck, S.2
  • 2
    • 34547706430
    • Fast learning rates for plug-in classifiers
    • MR2336861
    • AUDIBERT, J.-Y. and TSYBAKOV, A. B. (2007). Fast learning rates for plug-in classifiers. Ann. Statist. 35 608-633. MR2336861.
    • (2007) Ann. Statist. , vol.35 , pp. 608-633
    • Audibert, J.-Y.1    Tsybakov, A.B.2
  • 4
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • AUER, P., CESA-BIANCHI, N. and FISCHER, P. (2002). Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 235-256.
    • (2002) Mach. Learn. , vol.47 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 5
    • 77957337199
    • UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem
    • MR2728432
    • AUER, P. and ORTNER, R. (2010). UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Period. Math. Hungar. 61 55-65. MR2728432.
    • (2010) Period. Math. Hungar. , vol.61 , pp. 55-65
    • Auer, P.1    Ortner, R.2
  • 6
    • 0004454219
    • Randomized allocation of treatments in sequential experiments
    • MR0637940
    • BATHER, J. A. (1981). Randomized allocation of treatments in sequential experiments. J. R. Stat. Soc. Ser. B Stat. Methodol. 43 265-292. MR0637940.
    • (1981) J. R. Stat. Soc. Ser. B Stat. Methodol. , vol.43 , pp. 265-292
    • Bather, J.A.1
  • 8
    • 33745295134
    • Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
    • MR2274398
    • EVEN-DAR, E., MANNOR, S. and MANSOUR, Y. (2006). Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Mach. Learn. Res. 7 1079-1105. MR2274398.
    • (2006) J. Mach. Learn. Res. , vol.7 , pp. 1079-1105
    • Even-Dar, E.1    Mannor, S.2    Mansour, Y.3
  • 9
    • 70049095891
    • Woodroofe's one-armed bandit problem revisited
    • MR2538082
    • GOLDENSHLUGER, A. and ZEEVI, A. (2009). Woodroofe's one-armed bandit problem revisited. Ann. Appl. Probab. 19 1603-1633. MR2538082.
    • (2009) Ann. Appl. Probab. , vol.19 , pp. 1603-1633
    • Goldenshluger, A.1    Zeevi, A.2
  • 10
    • 79951890373
    • A note on performance limitations in bandit problems with side information
    • MR2815844
    • GOLDENSHLUGER, A. and ZEEVI, A. (2011). A note on performance limitations in bandit problems with side information. IEEE Trans. Inform. Theory 57 1707-1713. MR2815844.
    • (2011) IEEE Trans. Inform. Theory , vol.57 , pp. 1707-1713
    • Goldenshluger, A.1    Zeevi, A.2
  • 11
    • 38048999685
    • Online learning with prior knowledge
    • Lecture Notes in Computer Science 4539, Springer, Berlin. MR2397608
    • HAZAN, E. and MEGIDDO, N. (2007). Online learning with prior knowledge. In Learning Theory. Lecture Notes in Computer Science 4539 499-513. Springer, Berlin. MR2397608.
    • (2007) Learning Theory , pp. 499-513
    • Hazan, E.1    Megiddo, N.2
  • 14
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • MR0776826
    • LAI, T. L. and ROBBINS, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4-22. MR0776826.
    • (1985) Adv. in Appl. Math. , vol.6 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 15
    • 77956144722
    • The epoch-greedy algorithm for multi-armed bandits with side information
    • (J. C. Platt, D. Koller, Y. Singer and S. Roweis, eds.), MIT Press, Cambridge, MA
    • LANGFORD, J. and ZHANG, T. (2008). The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in Neural Information Processing Systems 20 (J. C. Platt, D. Koller, Y. Singer and S. Roweis, eds.) 817-824. MIT Press, Cambridge, MA.
    • (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 817-824
    • Langford, J.1    Zhang, T.2
  • 16
    • 84862301554
    • Showing relevant ads via Lipschitz context multi-armed bandits
    • LU, T., PÁL, D. and PÁL, M. (2010). Showing relevant ads via Lipschitz context multi-armed bandits. JMLR: Workshop and Conference Proceedings 9 485-492.
    • (2010) JMLR: Workshop and Conference Proceedings , vol.9 , pp. 485-492
    • Lu, T.1    Pál, D.2    Pál, M.3
  • 17
    • 0033234630
    • Smooth discrimination analysis
    • MR1765618
    • MAMMEN, E. and TSYBAKOV, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808-1829. MR1765618.
    • (1999) Ann. Statist. , vol.27 , pp. 1808-1829
    • Mammen, E.1    Tsybakov, A.B.2
  • 18
    • 80053440857
    • Nonparametric bandits with covariates
    • (A. Tauman Kalai and M. Mohri, eds.), Omnipress, Haifa, Israel
    • RIGOLLET, P. and ZEEVI, A. (2010). Nonparametric bandits with covariates. In COLT (A. Tauman Kalai and M. Mohri, eds.) 54-66. Omnipress, Haifa, Israel.
    • (2010) COLT , pp. 54-66
    • Rigollet, P.1    Zeevi, A.2
  • 19
    • 84966203785
    • Some aspects of the sequential design of experiments
    • MR0050246
    • ROBBINS, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58 527-535. MR0050246.
    • (1952) Bull. Amer. Math. Soc. (N.S.) , vol.58 , pp. 527-535
    • Robbins, H.1
  • 21
    • 3142725508
    • Optimal aggregation of classifiers in statistical learning
    • MR2051002
    • TSYBAKOV, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135-166. MR2051002.
    • (2004) Ann. Statist. , vol.32 , pp. 135-166
    • Tsybakov, A.B.1
  • 22
    • 0038982800
    • An asymptotic minimax theorem for the two armed bandit problem
    • MR0116443
    • VOGEL, W. (1960). An asymptotic minimax theorem for the two armed bandit problem. Ann. Math. Statist. 31 444-451. MR0116443.
    • (1960) Ann. Math. Statist. , vol.31 , pp. 444-451
    • Vogel, W.1
  • 24
    • 0001631327
    • A one-armed bandit problem with a concomitant variable
    • MR0556471
    • WOODROOFE, M. (1979). A one-armed bandit problem with a concomitant variable. J. Amer. Statist. Assoc. 74 799-806. MR0556471.
    • (1979) J. Amer. Statist. Assoc. , vol.74 , pp. 799-806
    • Woodroofe, M.1
  • 25
    • 0036108219
    • Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
    • MR1892657
    • YANG, Y. and ZHU, D. (2002). Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Ann. Statist. 30 100-121. MR1892657.
    • (2002) Ann. Statist. , vol.30 , pp. 100-121
    • Yang, Y.1    Zhu, D.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.