SCOPUS Information Retrieval Platform

ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings

2008, Pages 667-674

Relevance assessment: Are judges exchangeable and does it matter?

Authors (6): Bailey, Peter (a); Thomas, Paul (b); Craswell, Nick (a); De Vries, Arjen P. (c); Soboroff, Ian (d); Yilmaz, Emine (e)

a MICROSOFT (United States)

b CSIRO (Australia)

c CWI (Netherlands)

d NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY (United States)

e MICROSOFT RESEARCH (United Kingdom)

Author keywords

Experimentation; Measurement; Performance

Indexed keywords

BRONZE; COPPER ALLOYS; INFORMATION RETRIEVAL; INFORMATION RETRIEVAL SYSTEMS; INFORMATION SERVICES; RESEARCH AND DEVELOPMENT MANAGEMENT; SILVER;

EXPERIMENTATION; GOLD STANDARDS; INFORMATION SEEKING; IR TESTS; MEASURING SYSTEMS; PERFORMANCE; RELATIVE PERFORMANCES; RELEVANCE ASSESSMENTS; RELEVANCE JUDGEMENTS; SYSTEM RANKINGS; TEST COLLECTIONS;

STANDARDS;

EID: 57349188929 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1390334.1390447 Document Type: Conference Paper

Times cited : (166)

References (25)

1
- 84892523862
- Inter-coder agreement for computational linguistics
- to appear
- R. Artstein and M. Poesio. Inter-coder agreement for computational linguistics. Computational Linguistics, to appear.
- Computational Linguistics
- Artstein, R.¹ Poesio, M.²

2
- 33750288965
- A statistical method for system evaluation using incomplete judgments
- J. A. Aslam, V. Pavlu, and E. Yilmaz. A statistical method for system evaluation using incomplete judgments. In Proc. SIGIR, 2006.
- (2006) Proc. SIGIR
- Aslam, J.A.¹ Pavlu, V.² Yilmaz, E.³

3
- 57349152836
- The CSIRO enterprise search test collection
- December
- P. Bailey, N. Craswell, I. Soboroff, and A. P. de Vries. The CSIRO enterprise search test collection. SIGIR Forum, 41(2), December 2007.
- (2007) SIGIR Forum , vol.41 , Issue.2
- Bailey, P.¹ Craswell, N.² Soboroff, I.³ de Vries, A.P.⁴

4
- 8644251996
- Retrieval evaluation with incomplete information
- C. Buckley and E. M. Voorhees. Retrieval evaluation with incomplete information. In Proc. SIGIR, 2004.
- (2004) Proc. SIGIR
- Buckley, C.¹ Voorhees, E.M.²

5
- 0013287054
- Variations in relevance judgments and the evaluation of retrieval performance
- Sep-Oct
- R. Burgin. Variations in relevance judgments and the evaluation of retrieval performance. Information Processing & Management, 28(5):619-627, Sep-Oct 1992.
- (1992) Information Processing & Management , vol.28 , Issue.5 , pp. 619-627
- Burgin, R.¹

6
- 84937275232
- Assessing agreement on classification tasks: The kappa statistic
- J. Carletta. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249-254, 1996.
- (1996) Computational Linguistics , vol.22 , Issue.2 , pp. 249-254
- Carletta, J.¹

7
- 0005655729
- The effect of variations in relevance assessments in comparative experimental tests of index languages
- Cranfield Institute of Technology
- C. W. Cleverdon. The effect of variations in relevance assessments in comparative experimental tests of index languages. Technical Report ASLIB part 2, Cranfield Institute of Technology, 1970.
- (1970) Technical Report ASLIB part 2
- Cleverdon, C.W.¹

8
- 84973587732
- A coefficient of agreement for nominal scales
- J. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37-46, 1960.
- (1960) Educational and Psychological Measurement , vol.20 , pp. 37-46
- Cohen, J.¹

9
- 0032259402
- Efficient construction of large test collections
- G. V. Cormack, C. R. Palmer, and C. L. A. Clarke. Efficient construction of large test collections. In Proc. SIGIR, 1998.
- (1998) Proc. SIGIR
- Cormack, G.V.¹ Palmer, C.R.² Clarke, C.L.A.³

10
- 2142668188
- The kappa statistic: A second look
- B. Di Eugenio and M. Glass. The kappa statistic: a second look. Computational Linguistics, 30(1):95-101, 2004.
- (2004) Computational Linguistics , vol.30 , Issue.1 , pp. 95-101
- Di Eugenio, B.¹ Glass, M.²

11
- 0001769424
- Variations in relevance assessments and the measurement of retrieval effectiveness
- S. P. Harter. Variations in relevance assessments and the measurement of retrieval effectiveness. JASIS, 47(1):37-49, 1996.
- (1996) JASIS , vol.47 , Issue.1 , pp. 37-49
- Harter, S.P.¹

12
- 0033645041
- IR evaluation methods for retrieving highly relevant documents
- K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proc. SIGIR, 2000.
- (2000) Proc. SIGIR
- Järvelin, K.¹ Kekäläinen, J.²

13
- 0016930011
- Information retrieval test collections
- K. S. Jones and K. van Rijsbergen. Information retrieval test collections. Journal of Documentation, 32:59-75, 1976.
- (1976) Journal of Documentation , vol.32 , pp. 59-75
- Jones, K.S.¹ van Rijsbergen, K.²

14
- 0009233105
- Relevance assessments and retrieval system evaluation
- M. E. Lesk and G. Salton. Relevance assessments and retrieval system evaluation. Information Storage and Retrieval, 4:343-359, 1969.
- (1969) Information Storage and Retrieval , vol.4 , pp. 343-359
- Lesk, M.E.¹ Salton, G.²

15
- 0003102776
- Measuring the agreement among relevance judges
- April
- S. Mizzaro. Measuring the agreement among relevance judges. In Proc. MIRA 99: Evaluating Interactive Information Retrieval, April 1999.
- (1999) Proc. MIRA 99: Evaluating Interactive Information Retrieval
- Mizzaro, S.¹

16
- 85050172503
- Statistical Techniques for the Study of Language and Language Behaviour
- R. Rietveld and R. van Hout. Statistical Techniques for the Study of Language and Language Behaviour. Mouton de Gruyter, 1993.
- (1993) Mouton de Gruyter
- Rietveld, R.¹ van Hout, R.²

17
- 0003761866
- McGraw-Hill
- S. Siegel and N. J. Castellan. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, 1988.
- (1988) Nonparametric Statistics for the Behavioral Sciences
- Siegel, S.¹ Castellan, N.J.²

18
- 36448954593
- A comparison of pooled and sampled relevance judgments
- I. Soboroff. A comparison of pooled and sampled relevance judgments. In Proc. SIGIR, 2007.
- (2007) Proc. SIGIR
- Soboroff, I.¹

19
- 0036989640
- Liberal relevance criteria of TREC: Counting on negligible documents?
- E. Sormunen. Liberal relevance criteria of TREC: counting on negligible documents? In Proc. SIGIR, 2002.
- (2002) Proc. SIGIR
- Sormunen, E.¹

20
- 84876705138
- IR Evaluation Using Multiple Assessors per Topic
- A. Trotman and D. Jenkinson. IR Evaluation Using Multiple Assessors per Topic. In Proc. ADCS, 2007.
- (2007) Proc. ADCS
- Trotman, A.¹ Jenkinson, D.²

21
- 51849161867
- Can we at least agree on something?
- A. Trotman, N. Pharo, and D. Jenkinson. Can we at least agree on something? In Proc. SIGIR Workshop on Focused Retrieval, 2007.
- (2007) Proc. SIGIR Workshop on Focused Retrieval
- Trotman, A.¹ Pharo, N.² Jenkinson, D.³

22
- 0032264624
- Variations in relevance judgments and the measurement of retrieval effectiveness
- E. M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness. In Proc. SIGIR, 1998.
- (1998) Proc. SIGIR
- Voorhees, E.M.¹

23
- 57349119640
- NIST
- E. M. Voorhees and D. Harman. Overview of the Fifth Text REtrieval Conference (TREC-5). NIST, 1996.
- (1996) Overview of the Fifth Text REtrieval Conference (TREC-5)
- Voorhees, E.M.¹ Harman, D.²

24
- 34547632535
- Estimating average precision with incomplete and imperfect judgments
- E. Yilmaz and J. A. Aslam. Estimating average precision with incomplete and imperfect judgments. In Proc. CIKM, 2006.
- (2006) Proc. CIKM
- Yilmaz, E.¹ Aslam, J.A.²

25
- 57349107098
- A simple and efficient sampling method for estimating AP and NDCG
- E. Yilmaz, E. Kanoulas, and J. Aslam. A simple and efficient sampling method for estimating AP and NDCG. In Proc. SIGIR, 2008.
- (2008) Proc. SIGIR
- Yilmaz, E.¹ Kanoulas, E.² Aslam, J.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.