-
1
-
-
77957931572
-
Detecting the origin of text segments efficiently
-
Madrid, Spain
-
Abdel-Hamid, O., Behzadi, B., Christoph, S., and Henzinger, M. (2009). Detecting the Origin of Text Segments Efficiently. In Proceedings of the 18th International Conference on World Wide Web, pages 61-70, Madrid, Spain.
-
(2009)
Proceedings of the 18th International Conference on World Wide Web
, pp. 61-70
-
-
Abdel-Hamid, O.1
Behzadi, B.2
Christoph, S.3
Henzinger, M.4
-
2
-
-
0023041177
-
A bit-string longest-common-subsequence algorithm
-
Allison, L. and Dix, T. I. (1986). A bit-string longest-common-subsequence algorithm. Information Processing Letters, 23:305-310.
-
(1986)
Information Processing Letters
, vol.23
, pp. 305-310
-
-
Allison, L.1
Dix, T.I.2
-
3
-
-
57349126313
-
Inter-coder agreement for computational linguistics
-
Artstein, R. and Poesio, M. (2008). Inter-Coder Agreement for Computational Linguistics. Computational Linguistics, 34(4):555-596.
-
(2008)
Computational Linguistics
, vol.34
, Issue.4
, pp. 555-596
-
-
Artstein, R.1
Poesio, M.2
-
4
-
-
84866840259
-
A reflective view on text similarity
-
Hissar, Bulgaria
-
Bär, D., Zesch, T., and Gurevych, I. (2011). A Reflective View on Text Similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, pages 515-520, Hissar, Bulgaria.
-
(2011)
Proceedings of the International Conference on Recent Advances in Natural Language Processing
, pp. 515-520
-
-
Bär, D.1
Zesch, T.2
Gurevych, I.3
-
5
-
-
80053424869
-
Plagiarism detection across distant language pairs
-
Beijing, China
-
Barrón-Cedeño, A., Rosso, P., Agirre, E., and Labaka, G. (2010). Plagiarism Detection across Distant Language Pairs. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 37-45, Beijing, China.
-
(2010)
Proceedings of the 23rd International Conference on Computational Linguistics
, pp. 37-45
-
-
Barrón-Cedeño, A.1
Rosso, P.2
Agirre, E.3
Labaka, G.4
-
8
-
-
0010362121
-
Syntactic clustering of the web
-
Santa Clara, CA, USA
-
Broder, A. Z., Glassman, S. C., Manasse, M. S., and Zweig, G. (1997). Syntactic clustering of the Web. In Proceedings of the 6th International World Wide Web Conference, pages 1157-1166, Santa Clara, CA, USA.
-
(1997)
Proceedings of the 6th International World Wide Web Conference
, pp. 1157-1166
-
-
Broder, A.Z.1
Glassman, S.C.2
Manasse, M.S.3
Zweig, G.4
-
9
-
-
84888111020
-
Paraphrase acquisition via crowdsourcing and machine learning
-
January
-
Burrows, S., Potthast, M., and Stein, B. (2012). Paraphrase Acquisition via Crowdsourcing and Machine Learning. Transactions on Intelligent Systems and Technology, V(January):1-22.
-
(2012)
Transactions on Intelligent Systems and Technology
, vol.5
, pp. 1-22
-
-
Burrows, S.1
Potthast, M.2
Stein, B.3
-
10
-
-
84957855792
-
Creating speech and language data with Amazon's Mechanical Turk
-
Los Angeles, CA, USA
-
Callison-Burch, C. and Dredze, M. (2010). Creating Speech and Language Data With Amazon's Mechanical Turk. In Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 1-12, Los Angeles, CA, USA.
-
(2010)
Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
, pp. 1-12
-
-
Callison-Burch, C.1
Dredze, M.2
-
11
-
-
84937275232
-
Assessing agreement on classification tasks: The kappa statistic
-
Carletta, J. (1996). Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics, 22(2):249-254.
-
(1996)
Computational Linguistics
, vol.22
, Issue.2
, pp. 249-254
-
-
Carletta, J.1
-
13
-
-
84866878919
-
Using natural language processing for automatic detection of plagiarism
-
Newcastle upon Tyne, UK
-
Chong, M., Specia, L., and Mitkov, R. (2010). Using Natural Language Processing for Automatic Detection of Plagiarism. In Proceedings of the 4th International Plagiarism Conference, Newcastle upon Tyne, UK.
-
(2010)
Proceedings of the 4th International Plagiarism Conference
-
-
Chong, M.1
Specia, L.2
Mitkov, R.3
-
14
-
-
85026872180
-
METER: Measuring text reuse
-
Philadelphia, PA, USA
-
Clough, P., Gaizauskas, R., Piao, S. S., and Wilks, Y. (2002). METER: MEasuring TExt Reuse. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 152-159, Philadelphia, PA, USA.
-
(2002)
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
, pp. 152-159
-
-
Clough, P.1
Gaizauskas, R.2
Piao, S.S.3
Wilks, Y.4
-
16
-
-
84887428123
-
Ordinal measures in authorship identification
-
San Sebastian, Spain
-
Dinu, L. P. and Popescu, M. (2009). Ordinal measures in authorship identification. In Proceedings of the 3rd PAN Workshop. Uncovering Plagiarism, Authorship and Social Software Misuse, pages 62-66, San Sebastian, Spain.
-
(2009)
Proceedings of the 3rd PAN Workshop, Uncovering Plagiarism, Authorship and Social Software Misuse
, pp. 62-66
-
-
Dinu, L.P.1
Popescu, M.2
-
18
-
-
3343019470
-
Measuring nominal scale agreement among many raters
-
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378-382.
-
(1971)
Psychological Bulletin
, vol.76
, Issue.5
, pp. 378-382
-
-
Fleiss, J.L.1
-
20
-
-
33645437107
-
The METER corpus: A corpus for analyzing journalistic text reuse
-
Gaizauskas, R., Foster, J., Wilks, Y., Arundel, J., Clough, P., and Piao, S. (2001). The METER Corpus: A corpus for analysing journalistic text reuse. In Proceedings of the Corpus Linguistics 2001 Conference, pages 214-223.
-
(2001)
Proceedings of the Corpus Linguistics 2001 Conference
, pp. 214-223
-
-
Gaizauskas, R.1
Foster, J.2
Wilks, Y.3
Arundel, J.4
Clough, P.5
Piao, S.6
-
22
-
-
0002917116
-
Seven strictures on similarity
-
Goodman, N. editor, Bobbs-Merrill
-
Goodman, N. (1972). Seven strictures on similarity. In Goodman, N., editor, Problems and projects, pages 437-446. Bobbs-Merrill.
-
(1972)
Problems and Projects
, pp. 437-446
-
-
Goodman, N.1
-
24
-
-
76749092270
-
The WEKA data mining software: An update
-
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1):10-18.
-
(2009)
SIGKDD Explorations
, vol.11
, Issue.1
, pp. 10-18
-
-
Hall, M.1
Frank, E.2
Holmes, G.3
Pfahringer, B.4
Reutemann, P.5
Witten, I.H.6
-
25
-
-
84863347445
-
Detecting text similarity over short passages: Exploring linguistic feature combinations via machine learning
-
College Park, MD, USA
-
Hatzivassiloglou, V., Klavans, J. L., and Eskin, E. (1999). Detecting text similarity over short passages: Exploring linguistic feature combinations via machine learning. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 203-212, College Park, MD, USA.
-
(1999)
Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
, pp. 203-212
-
-
Hatzivassiloglou, V.1
Klavans, J.L.2
Eskin, E.3
-
28
-
-
84950419860
-
Advances in record linkage methodology as applied to the 1985 census of Tampa Florida
-
Jaro, M. A. (1989). Advances in record linkage methodology as applied to the 1985 census of Tampa Florida. Journal of the American Statistical Association, 84(406):414-420.
-
(1989)
Journal of the American Statistical Association
, vol.84
, Issue.406
, pp. 414-420
-
-
Jaro, M.A.1
-
30
-
-
33745868242
-
N-gram-based author profiles for authorship attribution
-
Keselj, V., Peng, F., Cercone, N., and Thomas, C. (2003). N-gram-based author profiles for authorship attribution. In Proceedings of the Conference of the Pacific Association for Computational Linguistics, pages 255-264.
-
(2003)
Proceedings of the Conference of the Pacific Association for Computational Linguistics
, pp. 255-264
-
-
Keselj, V.1
Peng, F.2
Cercone, N.3
Thomas, C.4
-
31
-
-
44949230930
-
Europarl: A parallel corpus for statistical machine translation
-
Phuket Island, Thailand
-
Koehn, P. (2005). Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the 10th Machine Translation Summit, pages 79-86, Phuket Island, Thailand.
-
(2005)
Proceedings of the 10th Machine Translation Summit
, pp. 79-86
-
-
Koehn, P.1
-
32
-
-
85110867932
-
Moses: Open source toolkit for statistical machine translation
-
Prague, Czech Republic
-
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177-180, Prague, Czech Republic.
-
(2007)
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions
, pp. 177-180
-
-
Koehn, P.1
Hoang, H.2
Birch, A.3
Callison-Burch, C.4
Federico, M.5
Bertoldi, N.6
Cowan, B.7
Shen, W.8
Moran, C.9
Zens, R.10
Dyer, C.11
Bojar, O.12
Constantin, A.13
Herbst, E.14
-
33
-
-
80053431219
-
An introduction to latent semantic analysis
-
Landauer, T. K., Foltz, P. W., and Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2):259-284.
-
(1998)
Discourse Processes
, vol.25
, Issue.2
, pp. 259-284
-
-
Landauer, T.K.1
Foltz, P.W.2
Laham, D.3
-
34
-
-
0017360990
-
The measurement of observer agreement for categorical data
-
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1):159-174.
-
(1977)
Biometrics
, vol.33
, Issue.1
, pp. 159-174
-
-
Landis, J.R.1
Koch, G.G.2
-
37
-
-
0001116877
-
Binary codes capable of correcting deletions, insertions, and reversals
-
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707-710.
-
(1966)
Soviet Physics Doklady
, vol.10
, Issue.8
, pp. 707-710
-
-
Levenshtein, V.I.1
-
39
-
-
67650529109
-
A theoretical basis to the automated detection of copying between texts, and its practical implementation in the Ferret plagiarism and collusion detector
-
Lyon, C., Barrett, R., and Malcolm, J. (2004). A theoretical basis to the automated detection of copying between texts, and its practical implementation in the Ferret plagiarism and collusion detector. In Plagiarism: Prevention, Practice and Policies Conference.
-
(2004)
Plagiarism: Prevention, Practice and Policies Conference
-
-
Lyon, C.1
Barrett, R.2
Malcolm, J.3
-
40
-
-
85126922087
-
Detecting short passages of similar text in large document collections
-
Lyon, C., Malcolm, J., and Dickerson, B. (2001). Detecting short passages of similar text in large document collections. In Proceedings of Conference on Empirical Methods in Natural Language Processing, pages 118-125.
-
(2001)
Proceedings of Conference on Empirical Methods in Natural Language Processing
, pp. 118-125
-
-
Lyon, C.1
Malcolm, J.2
Dickerson, B.3
-
41
-
-
35348911985
-
Detecting near-duplicates for web crawling
-
Banff, AB, Canada
-
Manku, G. S., Jain, A., and Sarma, A. D. (2007). Detecting Near-Duplicates for Web Crawling. In Proceedings of the 16th International World Wide Web Conference, pages 141-149, Banff, AB, Canada.
-
(2007)
Proceedings of the 16th International World Wide Web Conference
, pp. 141-149
-
-
Manku, G.S.1
Jain, A.2
Sarma, A.D.3
-
42
-
-
77955897943
-
MTLD, VOCD-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment
-
McCarthy, P. M. and Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2):381-392.
-
(2010)
Behavior Research Methods
, vol.42
, Issue.2
, pp. 381-392
-
-
McCarthy, P.M.1
Jarvis, S.2
-
43
-
-
33750693384
-
Corpus-based and knowledge-based measures of text semantic similarity
-
Boston, MA, USA
-
Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and Knowledge-based Measures of Text Semantic Similarity. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 775-780, Boston, MA, USA.
-
(2006)
Proceedings of the 21st National Conference on Artificial Intelligence
, pp. 775-780
-
-
Mihalcea, R.1
Corley, C.2
Strapparava, C.3
-
44
-
-
0004043396
-
An efficient domain-independent algorithm for detecting approximately duplicate database records
-
Tucson, AZ, USA
-
Monge, A. and Elkan, C. (1997). An efficient domain-independent algorithm for detecting approximately duplicate database records. In Proceedings of the SIGMOD Workshop on Data Mining and Knowledge Discovery, pages 23-29, Tucson, AZ, USA.
-
(1997)
Proceedings of the SIGMOD Workshop on Data Mining and Knowledge Discovery
, pp. 23-29
-
-
Monge, A.1
Elkan, C.2
-
46
-
-
84922022293
-
Overview of the 2nd international competition on plagiarism detection
-
Padua, Italy
-
Potthast, M., Barrón-Cedeño, A., Eiselt, A., Stein, B., and Rosso, P. (2010). Overview of the 2nd International Competition on Plagiarism Detection. In Notebook Papers of CLEF 10 Labs and Workshops, Padua, Italy.
-
(2010)
Notebook Papers of CLEF 10 Labs and Workshops
-
-
Potthast, M.1
Barrón-Cedeño, A.2
Eiselt, A.3
Stein, B.4
Rosso, P.5
-
49
-
-
78650006402
-
Towards document plagiarism detection based on the relevance and fragmentation of the reused text
-
Pachuca, Mexico
-
Sánchez-Vega, F., Villaseñor-Pineda, L., Montes-y-Gómez, M., and Rosso, P. (2010). Towards Document Plagiarism Detection Based on the Relevance and Fragmentation of the Reused Text. In Proceedings of the 9th Mexican International Conference on Artificial Intelligence, pages 24-31, Pachuca, Mexico.
-
(2010)
Proceedings of the 9th Mexican International Conference on Artificial Intelligence
, pp. 24-31
-
-
Sánchez-Vega, F.1
Villaseñor-Pineda, L.2
Montes-Y-Gómez, M.3
Rosso, P.4
-
50
-
-
33748063869
-
Reliability of content analysis: The case of nominal scale coding
-
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19(3):321-325.
-
(1955)
Public Opinion Quarterly
, vol.19
, Issue.3
, pp. 321-325
-
-
Scott, W.A.1
-
51
-
-
84944486544
-
Prediction and entropy of printed English
-
Shannon, C. E. (1951). Prediction and Entropy of Printed English. Bell System Technical Journal, 30:50-64.
-
(1951)
Bell System Technical Journal
, vol.30
, pp. 50-64
-
-
Shannon, C.E.1
-
52
-
-
84953744816
-
A statistical interpretation of term specificity and its application in retrieval
-
Spärck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11-21.
-
(1972)
Journal of Documentation
, vol.28
, Issue.1
, pp. 11-21
-
-
Spärck Jones, K.1
-
55
-
-
58149411184
-
Features of similarity
-
Tversky, A. (1977). Features of Similarity. Psychological Review, 84:327-352.
-
(1977)
Psychological Review
, vol.84
, pp. 327-352
-
-
Tversky, A.1
-
56
-
-
0008976521
-
String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage
-
Winkler, W. E. (1990). String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. In Proceedings of the Section on Survey Research Methods, pages 354-359.
-
(1990)
Proceedings of the Section on Survey Research Methods
, pp. 354-359
-
-
Winkler, W.E.1
-
58
-
-
0038468602
-
On sentence-length as a statistical characteristic of style in prose: With application to two cases of disputed authorship
-
Yule, G. U. (1939). On sentence-length as a statistical characteristic of style in prose: With application to two cases of disputed authorship. Biometrika, 30(3/4):363-390.
-
(1939)
Biometrika
, vol.30
, Issue.3-4
, pp. 363-390
-
-
Yule, G.U.1