



2012, Pages 247-254

Sample-efficient nonstationary policy evaluation for contextual bandits

Author keywords

[No Author keywords available]

Indexed keywords

BIAS-VARIANCE TRADEOFFS; CONTEXTUAL BANDITS; IMPORTANCE WEIGHTING; LEARNING PROBLEM; LEARNING SETTINGS; NONSTATIONARY; POLICY EVALUATION; REAL-WORLD;

EID: 84885967848     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (26)

References (24)
  • 1
    • 84886066530
    • Arthur Asuncion and David J. Newman. UCI machine learning repository, 2007. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  • 2
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 4
    • 70350664424
    • The offset tree for learning with partial labels
    • Alina Beygelzimer and John Langford. The offset tree for learning with partial labels. In KDD, pages 129-138, 2009.
    • (2009) KDD , pp. 129-138
    • Beygelzimer, A.1    Langford, J.2
  • 6
    • 0017109044
    • Some results on generalized difference estimation and generalized regression estimation for finite populations
    • Claes M. Cassel, Carl E. Sarndal, and Jan H. Wretman. Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63:615-620, 1976.
    • (1976) Biometrika , vol.63 , pp. 615-620
    • Cassel, C.M.1    Sarndal, C.E.2    Wretman, J.H.3
  • 7
    • 77956196339
    • Evaluating online ad campaigns in a pipeline: Causal models at scale
    • David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, and Diane Lambert. Evaluating online ad campaigns in a pipeline: Causal models at scale. In KDD, 2010.
    • (2010) KDD
    • Chan, D.1    Ge, R.2    Gershony, O.3    Hesterberg, T.4    Lambert, D.5
  • 8
    • 80053456223
    • Doubly robust policy evaluation and learning
    • Miroslav Dudík, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. In ICML, 2011.
    • (2011) ICML
    • Dudík, M.1    Langford, J.2    Li, L.3
  • 10
    • 0002384441
    • On tail probabilities for martingales
    • David A. Freedman. On tail probabilities for martingales. Annals of Probability, 3(1):100-118, 1975.
    • (1975) Annals of Probability , vol.3 , Issue.1 , pp. 100-118
    • Freedman, D.A.1
  • 11
    • 84947396376
    • A generalization of sampling without replacement from a finite universe
    • D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc., 47:663-685, 1952.
    • (1952) J. Amer. Statist. Assoc. , vol.47 , pp. 663-685
    • Horvitz, D.G.1    Thompson, D.J.2
  • 12
    • 1942452450
    • Exploration in metric state spaces
    • Sham Kakade, Michael Kearns, and John Langford. Exploration in metric state spaces. In ICML, 2003.
    • (2003) ICML
    • Kakade, S.1    Kearns, M.2    Langford, J.3
  • 13
    • 46249131752
    • Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data
    • With discussions
    • Joseph D. Y. Kang and Joseph L. Schafer. Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statist. Sci., 22(4):523-539, 2007. With discussions.
    • (2007) Statist. Sci. , vol.22 , Issue.4 , pp. 523-539
    • Kang, J.D.Y.1    Schafer, J.L.2
  • 14
    • 0012257655
    • Near-optimal reinforcement learning in polynomial time
    • Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. In ICML, 1998.
    • (1998) ICML
    • Kearns, M.1    Singh, S.2
  • 15
    • 56449124046
    • Exploration scavenging
    • John Langford, Alexander L. Strehl, and Jennifer Wortman. Exploration scavenging. In ICML, pages 528-535, 2008.
    • (2008) ICML , pp. 528-535
    • Langford, J.1    Strehl, A.L.2    Wortman, J.3
  • 16
    • 84876811202
    • RCV1: A new benchmark collection for text categorization research
    • David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361-397, 2004.
    • (2004) Journal of Machine Learning Research , vol.5 , pp. 361-397
    • Lewis, D.D.1    Yang, Y.2    Rose, T.G.3    Li, F.4
  • 17
    • 77954641643
    • A contextual-bandit approach to personalized news article recommendation
    • Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW, 2010.
    • (2010) WWW
    • Li, L.1    Chu, W.2    Langford, J.3    Schapire, R.E.4
  • 18
    • 79952384747
    • Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms
    • Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, 2011.
    • (2011) WSDM
    • Li, L.1    Chu, W.2    Langford, J.3    Wang, X.4
  • 19
    • 4444230264
    • Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study
    • Jared K. Lunceford and Marie Davidian. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine, 23(19):2937-2960, 2004.
    • (2004) Statistics in Medicine , vol.23 , Issue.19 , pp. 2937-2960
    • Lunceford, J.K.1    Davidian, M.2
  • 20
    • 0242393653
    • Eligibility traces for off-policy policy evaluation
    • Doina Precup, Richard S. Sutton, and Satinder P. Singh. Eligibility traces for off-policy policy evaluation. In ICML, pages 759-766, 2000.
    • (2000) ICML , pp. 759-766
    • Precup, D.1    Sutton, R.S.2    Singh, S.P.3
  • 21
    • 21844487694
    • Semiparametric efficiency in multivariate regression models with missing data
    • James M. Robins and Andrea Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. J. Amer. Statist. Assoc., 90:122-129, 1995.
    • (1995) J. Amer. Statist. Assoc. , vol.90 , pp. 122-129
    • Robins, J.M.1    Rotnitzky, A.2
  • 22
    • 84888862680
    • Estimation of regression coefficients when some regressors are not always observed
    • James M. Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc., 89(427):846-866, 1994.
    • (1994) J. Amer. Statist. Assoc. , vol.89 , Issue.427 , pp. 846-866
    • Robins, J.M.1    Rotnitzky, A.2    Zhao, L.P.3
  • 23
    • 85080569673
    • Learning from logged implicit exploration data
    • Alex Strehl, John Langford, Lihong Li, and Sham Kakade. Learning from logged implicit exploration data. In NIPS, pages 2217-2225, 2011.
    • (2011) NIPS , pp. 2217-2225
    • Strehl, A.1    Langford, J.2    Li, L.3    Kakade, S.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.