SCOPUS Information Retrieval Platform

Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011

Volume , Issue , 2011, Pages 297-306

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

(4) Li, Lihong ᵃ; Chu, Wei ᵃ; Langford, John ᵃ; Wang, Xuanhui ᵃ

ᵃ YAHOO RESEARCH (United States)

Author keywords

Benchmark dataset; Contextual bandit; Multi armed bandit; Offline evaluation; Recommendation

Indexed keywords

BENCHMARK DATASETS; CONTEXTUAL BANDIT; MULTI-ARMED BANDIT; OFFLINE EVALUATION; RECOMMENDATION;

ALGORITHMS; DATA MINING; INFORMATION RETRIEVAL; ONLINE SYSTEMS; SIMULATORS;

WORLD WIDE WEB;

EID: 79952384747 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1935826.1935878 Document Type: Conference Paper

Times cited : (527)

References (22)

1
- 0344118814
- Reinforcement learning with immediate rewards and linear hypotheses
- Naoki Abe, Alan W. Biermann, and Philip M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37(4): 263-293, 2003.
- (2003) Algorithmica , vol.37 , Issue.4 , pp. 263-293
- Abe, N.¹ Biermann, A.W.² Long, P.M.³

2
- 77951164997
- Explore/exploit schemes for web content optimization
- Deepak Agarwal, Bee-Chung Chen, and Pradheep Elango. Explore/exploit schemes for web content optimization. In Proceedings of the Ninth International Conference on Data Mining, 2009.
- (2009) Proceedings of the Ninth International Conference on Data Mining
- Agarwal, D.¹ Chen, B.-C.² Elango, P.³

3
- 77950884255
- Spatio-temporal models for estimating click-through rate
- Deepak Agarwal, Bee-Chung Chen, and Pradheep Elango. Spatio-temporal models for estimating click-through rate. In Proceedings of the Eighteenth International Conference on World Wide Web, 2009.
- (2009) Proceedings of the Eighteenth International Conference on World Wide Web
- Agarwal, D.¹ Chen, B.-C.² Elango, P.³

4
- 0041966002
- Using confidence bounds for exploitation-exploration trade-offs
- Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3: 397-422, 2002.
- (2002) Journal of Machine Learning Research , vol.3 , pp. 397-422
- Auer, P.¹

5
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3): 235-256, 2002.
- (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

6
- 0037709910
- The nonstochastic multiarmed bandit problem
- Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1): 48-77, 2002.
- (2002) SIAM Journal on Computing , vol.32 , Issue.1 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

7
- 0042045035
- Bandit problems: Sequential allocation of experiments
- Chapman and Hall
- Donald A. Berry and Bert Fristedt. Bandit Problems: Sequential Allocation of Experiments. Monographs on Statistics and Applied Probability. Chapman and Hall, 1985.
- (1985) Monographs on Statistics and Applied Probability.
- Berry, D.A.¹ Fristedt, B.²

8
- 0000169010
- Bandit processes and dynamic allocation indices
- J.C. Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society. Series B (Methodological), 41:148-177, 1979.
- (1979) Journal of the Royal Statistical Society. Series B (Methodological) , vol.41 , pp. 148-177
- Gittins, J.C.¹

9
- 0028442413
- Associative reinforcement learning: Functions in k-DNF
- Leslie Pack Kaelbling. Associative reinforcement learning: Functions in k-DNF. Machine Learning, 15(3):279-298, 1994.
- (1994) Machine Learning , vol.15 , Issue.3 , pp. 279-298
- Kaelbling, L.P.¹

10
- 84898967749
- Approximate planning in large POMDPs via reusable trajectories
- Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. Approximate planning in large POMDPs via reusable trajectories. In Advances in Neural Information Processing Systems 12, 2000.
- (2000) Advances in Neural Information Processing Systems , vol.12
- Kearns, M.J.¹ Mansour, Y.² Ng, A.Y.³

11
- 0002899547
- Asymptotically efficient adaptive allocation rules
- Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1): 4-22, 1985.
- (1985) Advances in Applied Mathematics , vol.6 , Issue.1 , pp. 4-22
- Lai, T.L.¹ Robbins, H.²

12
- 56449124046
- Exploration scavenging
- John Langford, Alexander L. Strehl, and Jennifer Wortman. Exploration scavenging. In Proceedings of the Twenty-Fifth International Conference on Machine Learning, pages 528-535, 2008.
- (2008) Proceedings of the Twenty-Fifth International Conference on Machine Learning , pp. 528-535
- Langford, J.¹ Strehl, A.L.² Wortman, J.³

13
- 85162018594
- The epoch-greedy algorithm for contextual multi-armed bandits
- John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In Advances in Neural Information Processing Systems 20, 2008.
- (2008) Advances in Neural Information Processing Systems , vol.20
- Langford, J.¹ Zhang, T.²

14
- 77954641643
- A contextual-bandit approach to personalized news article recommendation
- Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the Nineteenth International Conference on World Wide Web, 2010.
- (2010) Proceedings of the Nineteenth International Conference on World Wide Web
- Li, L.¹ Chu, W.² Langford, J.³ Schapire, R.E.⁴

15
- 0001035413
- On the method of bounded differences
- Cambridge University Press
- Colin McDiarmid. On the method of bounded differences. In J. Siemons, editor, Surveys in Combinatorics, volume 141 of London Mathematical Society Lecture Notes, pages 148-188. Cambridge University Press, 1989.
- (1989) Surveys in Combinatorics, Volume 141 of London Mathematical Society Lecture Notes , pp. 148-188
- McDiarmid, C.¹

16
- 78651337589
- Online learning for recency search ranking using real-time user feedback
- Taesup Moon, Lihong Li, Wei Chu, Ciya Liao, Zhaohui Zheng, and Yi Chang. Online learning for recency search ranking using real-time user feedback. In Proceedings of the Nineteenth International Conference on Knowledge Management, 2010.
- (2010) Proceedings of the Nineteenth International Conference on Knowledge Management
- Moon, T.¹ Li, L.² Chu, W.³ Liao, C.⁴ Zheng, Z.⁵ Chang, Y.⁶

17
- 0242393653
- Eligibility traces for off-policy policy evaluation
- Doina Precup, Richard S. Sutton, and Satinder P. Singh. Eligibility traces for off-policy policy evaluation. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 759-766, 2000.
- (2000) Proceedings of the Seventeenth International Conference on Machine Learning , pp. 759-766
- Precup, D.¹ Sutton, R.S.² Singh, S.P.³

18
- 85161982296
- Learning from logged implicit exploration data
- Alexander L. Strehl, John Langford, Lihong Li, and Sham M. Kakade. Learning from logged implicit exploration data. In Advances in Neural Information Processing Systems 28, 2011.
- (2011) Advances in Neural Information Processing Systems , vol.28
- Strehl, A.L.¹ Langford, J.² Li, L.³ Kakade, S.M.⁴

19
- 34250750797
- Experience-efficient learning in associative bandit problems
- Alexander L. Strehl, Chris Mesterharm, Michael L. Littman, and Haym Hirsh. Experience-efficient learning in associative bandit problems. In Proceedings of the Twenty-Third International Conference on Machine Learning, pages 889-896, 2006.
- (2006) Proceedings of the Twenty-Third International Conference on Machine Learning , pp. 889-896
- Strehl, A.L.¹ Mesterharm, C.² Littman, M.L.³ Hirsh, H.⁴

20
- 0004102479
- MIT Press, Cambridge, MA
- Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, March 1998.
- (1998) Reinforcement Learning: An Introduction.
- Sutton, R.S.¹ Barto, A.G.²

21
- 15844389867
- Bandit problems with side observations
- Chih-Chun Wang, Sanjeev R. Kulkarni, and H. Vincent Poor. Bandit problems with side observations. IEEE Transactions on Automatic Control, 50(3):338-355, 2005.
- (2005) IEEE Transactions on Automatic Control , vol.50 , Issue.3 , pp. 338-355
- Wang, C.-C.¹ Kulkarni, S.R.² Poor, H.V.³

22
- 0001631327
- A one-armed bandit problem with a concomitant variable
- Michael Woodroofe. A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74(368):799-806, 1979.
- (1979) Journal of the American Statistical Association , vol.74 , Issue.368 , pp. 799-806
- Woodroofe, M.¹

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.