SCOPUS Information Search Platform

Information Processing and Management

Volume 48, Issue 6, 2012, Pages 1053-1066

Using crowdsourcing for TREC relevance assessment

(2) Alonso, Omar a Mizzaro, Stefano b

a MICROSOFT (United States)

b UNIVERSITY OF UDINE (Italy)

Author keywords

Amazon Mechanical Turk; Crowdsourcing; Experimental design; IR evaluation; Relevance assessment; Test collections; TREC

Indexed keywords

CROWDSOURCING; MECHANICAL TURKS; RELEVANCE ASSESSMENTS; TEST COLLECTION; TREC;

DESIGN OF EXPERIMENTS;

EXPERIMENTS;

EID: 84865695467 PISSN: 03064573 EISSN: None Source Type: Journal
DOI: 10.1016/j.ipm.2012.01.004 Document Type: Article

Times cited : (119)

References (40)

1
- 84930177119
- Design and implementation of relevance assessments using crowdsourcing
- Alonso, O.; & Baeza-Yates, R. (2011). Design and implementation of relevance assessments using crowdsourcing. In Proceedings of the European conference on Information Retrieval (ECIR) (pp. 153-164).
- (2011) Proceedings of the European Conference on Information Retrieval (ECIR) , pp. 153-164
- Alonso, O.¹ Baeza-Yates, R.²

2
- 80052128947
- Crowdsourcing for information retrieval: Principles, methods and applications, SIGIR tutorial
- Alonso, O.; & Lease, M. (2011). Crowdsourcing for information retrieval: Principles, methods and applications, SIGIR tutorial. In: Proceedings of the 34th ACM SIGIR conference (pp. 1299-1300).
- (2011) Proceedings of the 34th ACM SIGIR Conference , pp. 1299-1300
- Alonso, O.¹ Lease, M.²

3
- 72449180422
- Relevance criteria for E-commerce. A crowdsourcing-based experimental analysis
- Alonso, O.; & Mizzaro, S. (2009a). Relevance criteria for E-commerce. A crowdsourcing-based experimental analysis. In Proceedings of the 32nd ACM SIGIR conference (pp. 760-761).
- (2009) Proceedings of the 32nd ACM SIGIR Conference , pp. 760-761
- Alonso, O.¹ Mizzaro, S.²

4
- 77956016969
- Can we get rid of TREC assessors? Using mechanical Turk for relevance assessment
- Alonso, O.; & Mizzaro, S. (2009b). Can we get rid of TREC assessors? Using mechanical Turk for relevance assessment. In Proceedings of the 32nd ACM SIGIR workshop on the future of IR evaluation (pp. 15-16).
- (2009) Proceedings of the 32nd ACM SIGIR Workshop on the Future of IR Evaluation , pp. 15-16
- Alonso, O.¹ Mizzaro, S.²

5
- 65249129950
- Crowdsourcing for relevance evaluation
- O. Alonso, D. Rose, and B. Stewart Crowdsourcing for relevance evaluation SIGIR Forum 42 2 2008 9 15
- (2008) SIGIR Forum , vol.42 , Issue.2 , pp. 9-15
- Alonso, O.¹ Rose, D.² Stewart, B.³

6
- 84889573673
- Crowdsourcing assessments for XML ranked retrieval
- O. Alonso, R. Schenkel, and M. Theobald Crowdsourcing assessments for XML ranked retrieval Proceedings of the European Conference on Information Retrieval (ECIR) 2010 2010 623 626
- (2010) Proceedings of the European Conference on Information Retrieval (ECIR) , vol.2010 , pp. 623-626
- Alonso, O.¹ Schenkel, R.² Theobald, M.³

7
- 33750288965
- A statistical method for system evaluation using incomplete judgments
- Aslam, J.A.; Pavlu, V.; & Yilmaz, E. (2006). A statistical method for system evaluation using incomplete judgments. In Proceedings of the 29th ACM SIGIR conference (pp. 541-548).
- (2006) Proceedings of the 29th ACM SIGIR Conference , pp. 541-548
- Aslam, J.A.¹ Pavlu, V.² Yilmaz, E.³

8
- 57349188929
- Relevance assessment: Are judges exchangeable and does it matter
- Bailey, P.; Craswell, N.; Soboroff, I.; Thomas, P.; de Vries, A.P.; & Yilmaz, E. (2008). Relevance assessment: Are judges exchangeable and does it matter. In Proceedings of the 31st ACM SIGIR conference (pp. 667-674).
- (2008) Proceedings of the 31st ACM SIGIR Conference , pp. 667-674
- Bailey, P.¹ Craswell, N.² Soboroff, I.³ Thomas, P.⁴ de Vries, A.P.⁵ Yilmaz, E.⁶

9
- 34548503242
- Jossey-Bass
- N. Bradburn, S. Sudman, and B. Wansink Asking questions: The definitive guide to questionnaire design 2004 Jossey-Bass
- (2004) Asking Questions: The Definitive Guide to Questionnaire Design
- Bradburn, N.¹ Sudman, S.² Wansink, B.³

10
- 62949244220
- Meeting of the MINDS: An information retrieval research agenda
- J. Callan, J. Allan, C.L.A. Clarke, S. Dumais, D.A. Evans, and M. Sanderson Meeting of the MINDS: An information retrieval research agenda SIGIR Forum 41 2 2007 25 34
- (2007) SIGIR Forum , vol.41 , Issue.2 , pp. 25-34
- Callan, J.¹ Allan, J.² Clarke, C.L.A.³ Dumais, S.⁴ Evans, D.A.⁵ Sanderson, M.⁶

11
- 80053402398
- Fast, cheap, and creative: Evaluating translation quality using Amazon's mechanical turk
- Callison-Burch, C. (2009). Fast, cheap, and creative: Evaluating translation quality using Amazon's mechanical turk. In Proceedings of the 2009 conference on empirical methods in natural language processing (pp. 286-295).
- (2009) Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing , pp. 286-295
- Callison-Burch, C.¹

12
- 77956024152
- The effect of assessor error on IR system evaluation
- Carterette, B.; & Soboroff, I. (2010). The effect of assessor error on IR system evaluation. In Proceedings of the 33rd ACM SIGIR conference (pp. 539-546).
- (2010) Proceedings of the 33rd ACM SIGIR Conference , pp. 539-546
- Carterette, B.¹ Soboroff, I.²

13
- 41849104667
- Here or there: Preference judgments for relevance
- B. Carterette, P. Bennett, D.M. Chickering, and S. Dumais Here or there: Preference judgments for relevance Proceedings of the European Conference on Information Retrieval (ECIR) 2008 2008 16 27
- (2008) Proceedings of the European Conference on Information Retrieval (ECIR) , vol.2008 , pp. 16-27
- Carterette, B.¹ Bennett, P.² Chickering, D.M.³ Dumais, S.⁴

14
- 57349133736
- Evaluation over thousands of queries
- Carterette, B.; Pavlu, V.; Kanoulas, E.; Aslam, J.A.; & Allan, J. (2008). Evaluation over thousands of queries. In Proceedings of the 31st ACM SIGIR conference (pp. 651-658).
- (2008) Proceedings of the 31st ACM SIGIR Conference , pp. 651-658
- Carterette, B.¹ Pavlu, V.² Kanoulas, E.³ Aslam, J.A.⁴ Allan, J.⁵

15
- 84865685535
- Carvalho, V.; Lease, M.; & Yilmaz, E. (Eds.) (2010). Proceedings of the 32nd ACM SIGIR workshop on crowdsourcing for relevance, evaluation, 2010.
- (2010) Proceedings of the 32nd ACM SIGIR Workshop on Crowdsourcing for Relevance, Evaluation, 2010
- Carvalho, V.¹ Lease, M.² Yilmaz, E.³

16
- 84973587732
- A coefficient of agreement for nominal scales
- J. Cohen A coefficient of agreement for nominal scales Educational and Psychological Measurement 20 1960 37 46
- (1960) Educational and Psychological Measurement , vol.20 , pp. 37-46
- Cohen, J.¹

17
- 0032259402
- Efficient construction of large test collections
- Cormack, G.V.; Palmer, C.R.; & Clarke, C.L.A. (1998). Efficient construction of large test collections. In Proceedings of the 21st ACM SIGIR conference (pp. 282-289).
- (1998) Proceedings of the 21st ACM SIGIR Conference , pp. 282-289
- Cormack, G.V.¹ Palmer, C.R.² Clarke, C.L.A.³

18
- 3343019470
- Measuring nominal scale agreement among many raters
- J.L. Fleiss Measuring nominal scale agreement among many raters Psychological Bulletin 76 5 1971 378 382
- (1971) Psychological Bulletin , vol.76 , Issue.5 , pp. 378-382
- Fleiss, J.L.¹

19
- 85018109911
- Crowdsourcing document relevance assessment with mechanical turk
- Grady, C.; & Lease, M. (2010). Crowdsourcing document relevance assessment with mechanical turk. In Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon's mechanical turk (pp. 172-179).
- (2010) Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk , pp. 172-179
- Grady, C.¹ Lease, M.²

20
- 75149143907
- A few good topics: Experiments in topic set reduction for retrieval evaluation
- J. Guiver, S. Mizzaro, and S. Robertson A few good topics: Experiments in topic set reduction for retrieval evaluation ACM Transactions on Information Systems 27 4 2009 1 26
- (2009) ACM Transactions on Information Systems , vol.27 , Issue.4 , pp. 1-26
- Guiver, J.¹ Mizzaro, S.² Robertson, S.³

21
- 56849109269
- Crown Business New York
- J. Howe Crowdsourcing: Why the power of the crowd is driving the future of business 2008 Crown Business New York
- (2008) Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business
- Howe, J.¹

22
- 72449196273
- Towards methods for the collective gathering and quality control of relevance assessments
- Kazai, G.; Milic-Frayling, N.; & Costello, J. (2009). Towards methods for the collective gathering and quality control of relevance assessments. In Proceedings of the 32nd ACM SIGIR conference (pp. 452-459).
- (2009) Proceedings of the 32nd ACM SIGIR Conference , pp. 452-459
- Kazai, G.¹ Milic-Frayling, N.² Costello, J.³

23
- 80052132873
- Crowdsourcing for book search evaluation: Impact of HIT design on comparative system ranking
- Beijing, China, ACM
- Kazai, G.; Kamps, J.; Koolen, M.; & Milic-Frayling, N. (2011). Crowdsourcing for book search evaluation: Impact of HIT design on comparative system ranking. In Proceedings of the 34th ACM SIGIR conference (pp. 205-214). Beijing, China, ACM.
- (2011) Proceedings of the 34th ACM SIGIR Conference , pp. 205-214
- Kazai, G.¹ Kamps, J.² Koolen, M.³ Milic-Frayling, N.⁴

24
- 57649217556
- Crowdsourcing user studies with mechanical turk
- Kittur, A.; Chi, E.H.; & Suh, B. (2008). Crowdsourcing user studies with mechanical turk. In CHI '08: Proceeding of the 26th ACM SIGCHI conference (pp. 453-456).
- (2008) CHI '08: Proceeding of the 26th ACM SIGCHI Conference , pp. 453-456
- Kittur, A.¹ Chi, E.H.² Suh, B.³

25
- 0008500240
- Estimating the reliability, systematic error, and random error of interval data
- K. Krippendorff Estimating the reliability, systematic error, and random error of interval data Educational and Psychological Measurement 30 1 1970 61 70
- (1970) Educational and Psychological Measurement , vol.30 , Issue.1 , pp. 61-70
- Krippendorff, K.¹

26
- 84865680372
- Lease, M.; Sorokin, A.; & Yilmaz, E. (Eds.) (2011). Proceedings of the 33rd ACM SIGIR workshop on crowdsourcing for information retrieval.
- (2011) Proceedings of the 33rd ACM SIGIR Workshop on Crowdsourcing for Information Retrieval
- Lease, M.¹ Sorokin, A.² Yilmaz, E.³

27
- 84873543071
- TREC
- M. Lease, and G. Kazai Overview of the TREC 2011 crowdsourcing track 2011 TREC
- (2011) Overview of the TREC 2011 Crowdsourcing Track
- Lease, M.¹ Kazai, G.²

28
- 80054835941
- Crowdsourcing a News query classification dataset
- McCreadie, R.; Macdonald, C.; & Ounis, I. (2010). Crowdsourcing a News query classification dataset. In Proceedings of CSE 2010 workshop at SIGIR.
- (2010) Proceedings of CSE 2010 Workshop at SIGIR
- McCreadie, R.¹ Macdonald, C.² Ounis, I.³

29
- 84905047705
- Crowdsourcing blog track top news judgments at TREC
- McCreadie, R.; Macdonald, C.; & Ounis, I. (2011). Crowdsourcing blog track top news judgments at TREC. In Proceedings of CSDM workshop at WSDM 2011.
- (2011) Proceedings of CSDM Workshop at WSDM 2011
- McCreadie, R.¹ Macdonald, C.² Ounis, I.³

30
- 0004257599
- Morgan-Kaufmann
- J. Nielsen Usability engineering 1993 Morgan-Kaufmann
- (1993) Usability Engineering
- Nielsen, J.¹

31
- 77952357661
- How reliable are annotations via crowdsourcing? A study about inter-annotator agreement for multi-label image annotation
- Nowak, S.; & Rüger, S. (2010). How reliable are annotations via crowdsourcing? A study about inter-annotator agreement for multi-label image annotation. In Proceedings of the international ACM conference on multimedia information retrieval (pp. 557-566).
- (2010) Proceedings of the International ACM Conference on Multimedia Information Retrieval , pp. 557-566
- Nowak, S.¹ Rüger, S.²

32
- 84885608872
- Information retrieval system evaluation: Effort, sensitivity, and reliability
- Sanderson, M. & Zobel, J. (2005). Information retrieval system evaluation: Effort, sensitivity, and reliability. In Proceedings of the 28th ACM SIGIR conference (pp. 162-169).
- (2005) Proceedings of the 28th ACM SIGIR Conference , pp. 162-169
- Sanderson, M.¹ Zobel, J.²

33
- 77954220071
- Test collection based evaluation of information retrieval systems
- M. Sanderson Test collection based evaluation of information retrieval systems Foundations and Trends in Information Retrieval 4 4 2010 247 375
- (2010) Foundations and Trends in Information Retrieval , vol.4 , Issue.4 , pp. 247-375
- Sanderson, M.¹

34
- 80052119348
- Measuring assessor accuracy: A comparison of NIST assessors and user study participants
- Smucker, M.; & Prakash Jethani, C. (2011). Measuring assessor accuracy: A comparison of NIST assessors and user study participants. In: Proceedings of the 34th ACM SIGIR conference (pp. 1231-1232).
- (2011) Proceedings of the 34th ACM SIGIR Conference , pp. 1231-1232
- Smucker, M.¹ Prakash Jethani, C.²

35
- 80053360508
- Cheap and fast but is it good? Evaluating non-expert annotations for natural language tasks
- Snow, R.; O'Connor, B.; Jurafsky, D.; & Ng, A.Y. (2008). Cheap and fast but is it good? Evaluating non-expert annotations for natural language tasks. In Conference on empirical methods on natural language processing (pp. 254-263).
- (2008) Conference on Empirical Methods on Natural Language Processing , pp. 254-263
- Snow, R.¹ O'Connor, B.² Jurafsky, D.³ Ng, A.Y.⁴

36
- 0034790621
- Ranking retrieval systems without relevance judgments
- Soboroff, I.; Nicholas, C.; & Cahan, P. (2001). Ranking retrieval systems without relevance judgments. In Proceedings of the 24th ACM SIGIR conference (pp. 66-73).
- (2001) Proceedings of the 24th ACM SIGIR Conference , pp. 66-73
- Soboroff, I.¹ Nicholas, C.² Cahan, P.³

37
- 0036989640
- Liberal relevance criteria of TREC: Counting on negligible documents?
- Sormunen, E. (2002). Liberal relevance criteria of TREC: Counting on negligible documents? In Proceedings of the 25th ACM SIGIR conference (pp. 324-330).
- (2002) Proceedings of the 25th ACM SIGIR Conference , pp. 324-330
- Sormunen, E.¹

38
- 84874593076
- A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability
- Retrieved 01.10.10
- Stemler, S.E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4). <http://PAREonline.net/getvn.asp?v=9&n=4> Retrieved 01.10.10.
- (2004) Practical Assessment, Research & Evaluation , vol.9 , Issue.4
- Stemler, S.E.¹

39
- 0033733783
- Variations in relevance judgments and the measurement of retrieval effectiveness
- E. Voorhees Variations in relevance judgments and the measurement of retrieval effectiveness Information Processing and Management 36 5 2000 697 716
- (2000) Information Processing and Management , vol.36 , Issue.5 , pp. 697-716
- Voorhees, E.¹

40
- 8644262918
- The philosophy of information retrieval evaluation
- Voorhees, E. (2001). The philosophy of information retrieval evaluation. In CLEF '01 proceedings (pp. 355-370).
- (2001) CLEF '01 Proceedings , pp. 355-370
- Voorhees, E.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.