SCOPUS Information Retrieval Platform

Uncertainty in Artificial Intelligence - Proceedings of the 28th Conference, UAI 2012

2012, Pages 247-254

Sample-efficient nonstationary policy evaluation for contextual bandits

(4) Dudík, Miroslav a; Erhan, Dumitru b; Langford, John a; Li, Lihong a

a MICROSOFT RESEARCH (United States)

b YAHOO RESEARCH (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BIAS-VARIANCE TRADEOFFS; CONTEXTUAL BANDITS; IMPORTANCE WEIGHTING; LEARNING PROBLEM; LEARNING SETTINGS; NONSTATIONARY; POLICY EVALUATION; REAL-WORLD;

ARTIFICIAL INTELLIGENCE;

EID: 84885967848 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (26)

References (24)

1
- 84886066530
- UCI machine learning repository
- Arthur Asuncion and David J. Newman. UCI machine learning repository, 2007. http://www.ics.uci.edu/~mlearn/MLRepository.html.

2
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
- (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

3
- 0037709910
- The nonstochastic multiarmed bandit problem
- Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Computing, 32(1):48-77, 2002.
- (2002) SIAM J. Computing , vol.32 , Issue.1 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

4
- 70350664424
- The offset tree for learning with partial labels
- Alina Beygelzimer and John Langford. The offset tree for learning with partial labels. In KDD, pages 129-138, 2009.
- (2009) KDD , pp. 129-138
- Beygelzimer, A.¹ Langford, J.²

5
- 80053144086
- Contextual bandit algorithms with supervised learning guarantees
- Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandit algorithms with supervised learning guarantees. In AISTATS, 2011.
- (2011) AISTATS
- Beygelzimer, A.¹ Langford, J.² Li, L.³ Reyzin, L.⁴ Schapire, R.E.⁵

6
- 0017109044
- Some results on generalized difference estimation and generalized regression estimation for finite populations
- Claes M. Cassel, Carl E. Sarndal, and Jan H. Wretman. Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63:615-620, 1976.
- (1976) Biometrika , vol.63 , pp. 615-620
- Cassel, C.M.¹ Sarndal, C.E.² Wretman, J.H.³

7
- 77956196339
- Evaluating online ad campaigns in a pipeline: Causal models at scale
- David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, and Diane Lambert. Evaluating online ad campaigns in a pipeline: Causal models at scale. In KDD, 2010.
- (2010) KDD
- Chan, D.¹ Ge, R.² Gershony, O.³ Hesterberg, T.⁴ Lambert, D.⁵

8
- 80053456223
- Doubly robust policy evaluation and learning
- Miroslav Dudík, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. In ICML, 2011.
- (2011) ICML
- Dudík, M.¹ Langford, J.² Li, L.³

9
- 50949133669
- LIBLINEAR: A library for large linear classification
- Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
- (2008) Journal of Machine Learning Research , vol.9 , pp. 1871-1874
- Fan, R.-E.¹ Chang, K.-W.² Hsieh, C.-J.³ Wang, X.-R.⁴ Lin, C.-J.⁵

10
- 0002384441
- On tail probabilities for martingales
- David A. Freedman. On tail probabilities for martingales. Annals of Probability, 3(1):100-118, 1975.
- (1975) Annals of Probability , vol.3 , Issue.1 , pp. 100-118
- Freedman, D.A.¹

11
- 84947396376
- A generalization of sampling without replacement from a finite universe
- D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc., 47:663-685, 1952.
- (1952) J. Amer. Statist. Assoc. , vol.47 , pp. 663-685
- Horvitz, D.G.¹ Thompson, D.J.²

12
- 1942452450
- Exploration in metric state spaces
- Sham Kakade, Michael Kearns, and John Langford. Exploration in metric state spaces. In ICML, 2003.
- (2003) ICML
- Kakade, S.¹ Kearns, M.² Langford, J.³

13
- 46249131752
- Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data
- Joseph D. Y. Kang and Joseph L. Schafer. Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statist. Sci., 22(4):523-539, 2007. With discussions.
- (2007) Statist. Sci. , vol.22 , Issue.4 , pp. 523-539
- Kang, J.D.Y.¹ Schafer, J.L.²

14
- 0012257655
- Near-optimal reinforcement learning in polynomial time
- Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. In ICML, 1998.
- (1998) ICML
- Kearns, M.¹ Singh, S.²

15
- 56449124046
- Exploration scavenging
- John Langford, Alexander L. Strehl, and Jennifer Wortman. Exploration scavenging. In ICML, pages 528-535, 2008.
- (2008) ICML , pp. 528-535
- Langford, J.¹ Strehl, A.L.² Wortman, J.³

16
- 84876811202
- RCV1: A new benchmark collection for text categorization research
- David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361-397, 2004.
- (2004) Journal of Machine Learning Research , vol.5 , pp. 361-397
- Lewis, D.D.¹ Yang, Y.² Rose, T.G.³ Li, F.⁴

17
- 77954641643
- A contextual-bandit approach to personalized news article recommendation
- Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW, 2010.
- (2010) WWW
- Li, L.¹ Chu, W.² Langford, J.³ Schapire, R.E.⁴

18
- 79952384747
- Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms
- Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, 2011.
- (2011) WSDM
- Li, L.¹ Chu, W.² Langford, J.³ Wang, X.⁴

19
- 4444230264
- Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study
- Jared K. Lunceford and Marie Davidian. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine, 23(19):2937-2960, 2004.
- (2004) Statistics in Medicine , vol.23 , Issue.19 , pp. 2937-2960
- Lunceford, J.K.¹ Davidian, M.²

20
- 0242393653
- Eligibility traces for off-policy policy evaluation
- Doina Precup, Richard S. Sutton, and Satinder P. Singh. Eligibility traces for off-policy policy evaluation. In ICML, pages 759-766, 2000.
- (2000) ICML , pp. 759-766
- Precup, D.¹ Sutton, R.S.² Singh, S.P.³

21
- 21844487694
- Semiparametric efficiency in multivariate regression models with missing data
- James M. Robins and Andrea Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. J. Amer. Statist. Assoc., 90:122-129, 1995.
- (1995) J. Amer. Statist. Assoc. , vol.90 , pp. 122-129
- Robins, J.M.¹ Rotnitzky, A.²

22
- 84888862680
- Estimation of regression coefficients when some regressors are not always observed
- James M. Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc., 89(427):846-866, 1994.
- (1994) J. Amer. Statist. Assoc. , vol.89 , Issue.427 , pp. 846-866
- Robins, J.M.¹ Rotnitzky, A.² Zhao, L.P.³

23
- 85080569673
- Learning from logged implicit exploration data
- Alex Strehl, John Langford, Lihong Li, and Sham Kakade. Learning from logged implicit exploration data. In NIPS, pages 2217-2225, 2011.
- (2011) NIPS , pp. 2217-2225
- Strehl, A.¹ Langford, J.² Li, L.³ Kakade, S.⁴

24
- 0004102479
- Reinforcement Learning: An Introduction
- Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, March 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.