SCOPUS Information Retrieval Platform

International Conference on Information and Knowledge Management, Proceedings

Volume , Issue , 2011, Pages 469-474

Partial duplicate detection for large book collections

Authors (3): Yalniz, Ismet Zeki (a); Can, Ethem F. (a); Manmatha, R. (a)

(a) Biologically Inspired Neural and Dynamical Systems Laboratory (United States)

Author keywords

partial duplicate detection; sequence matching; unique words

Indexed keywords

COMPACT REPRESENTATION; DATA SETS; DUPLICATE DETECTION; LONGEST COMMON SUBSEQUENCES; SEQUENCE MATCHING; UNIQUE WORD; OPTICAL CHARACTER RECOGNITION; KNOWLEDGE MANAGEMENT

EID: 83055181761
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/2063576.2063647
Document Type: Conference Paper

Times cited: 15

References (26)

1
- 83055196622
- Internet Archive. http://www.archive.org, 2010.
- (2010)

2
- 83055196629
- Project Gutenberg. http://www.gutenberg.org, 2010.
- (2010)

3
- 84871101442
- A scalable system for identifying co-derivative documents
- Y. Bernstein and J. Zobel. A scalable system for identifying co-derivative documents. In SPIRE, pages 55-67, 2004.
- (2004) SPIRE , pp. 55-67
- Bernstein, Y.¹ Zobel, J.²

4
- 84976810280
- Copy detection mechanisms for digital documents
- S. Brin, J. Davis, and H. Garcia-Molina. Copy detection mechanisms for digital documents. In ACM SIGMOD, pages 398-409, 1995.
- (1995) ACM SIGMOD , pp. 398-409
- Brin, S.¹ Davis, J.² Garcia-Molina, H.³

5
- 0010362121
- Syntactic clustering of the web
- A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157-1166, 1997.
- (1997) Computer Networks , vol.29 , Issue.8-13 , pp. 1157-1166
- Broder, A.Z.¹ Glassman, S.C.² Manasse, M.S.³ Zweig, G.⁴

6
- 0036040277
- Similarity estimation techniques from rounding algorithms
- M. S. Charikar. Similarity estimation techniques from rounding algorithms. In 34th Ann. ACM Symp. on Theory of computing, pages 380-388, 2002.
- (2002) 34th Ann. ACM Symp. on Theory of Computing , pp. 380-388
- Charikar, M.S.¹

7
- 0013206133
- Collection statistics for fast duplicate document detection
- DOI 10.1145/506309.506311
- A. Chowdhury, O. Frieder, D. A. Grossman, and M. C. McCabe. Collection statistics for fast duplicate document detection. ACM Trans. Inf. Syst., 20(2):171-191, 2002. (Pubitemid 44642301)
- (2002) ACM Transactions on Information Systems , vol.20 , Issue.2 , pp. 171-191
- Chowdhury, A.¹ Frieder, O.² Grossman, D.³ McCabe, M.C.⁴

8
- 62949095390
- National UK Plagiarism Advisory Service
- P. Clough. Old and new challenges in automatic plagiarism detection. National UK Plagiarism Advisory Service, http://www.ir.shef.ac.uk/cloughie/papers/pas_plagiarism.pdf, 2003.
- (2003) Old and New Challenges in Automatic Plagiarism Detection
- Clough, P.¹

9
- 0037481029
- Detecting similar documents using salient terms
- J. Cooper, A. Coden, and E. Brown. Detecting similar documents using salient terms. In CIKM, pages 245-251, 2002.
- (2002) CIKM , pp. 245-251
- Cooper, J.¹ Coden, A.² Brown, E.³

10
- 77956386944
- Solving longest common subsequence and related problems on graphical processing units
- July
- S. Deorowicz. Solving longest common subsequence and related problems on graphical processing units. Softw. Pract. Exper., 40:673-700, July 2010.
- (2010) Softw. Pract. Exper. , vol.40 , pp. 673-700
- Deorowicz, S.¹

11
- 77953896957
- Identifying duplicate content using statistically improbable phrases
- M. Errami, Z. Sun, A. C. George, T. C. Long, M. A. Skinner, J. D. Wren, and H. R. Garner. Identifying duplicate content using statistically improbable phrases. Bioinformatics, 26(11):1453-1457, 2010.
- (2010) Bioinformatics , vol.26 , Issue.11 , pp. 1453-1457
- Errami, M.¹ Sun, Z.² George, A.C.³ Long, T.C.⁴ Skinner, M.A.⁵ Wren, J.D.⁶ Garner, H.R.⁷

12
- 34247235660
- A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books
- S. Feng and R. Manmatha. A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books. In JCDL, pages 109-118, 2006.
- (2006) JCDL , pp. 109-118
- Feng, S.¹ Manmatha, R.²

13
- 77956039068
- Adaptive near-duplicate detection via similarity learning
- H. Hajishirzi, W.-t. Yih, and A. Kolcz. Adaptive near-duplicate detection via similarity learning. In SIGIR'10, pages 419-426, 2010.
- (2010) SIGIR'10 , pp. 419-426
- Hajishirzi, H.¹ Yih, W.-T.² Kolcz, A.³

14
- 83055169673
- Scalable document fingerprinting
- N. Heintze. Scalable document fingerprinting. In USENIX Workshop on Electronic Commerce, 1996.
- USENIX Workshop on Electronic Commerce, 1996
- Heintze, N.¹

15
- 33750296887
- Finding near-duplicate web pages: A large-scale evaluation of algorithms
- M. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In ACM SIGIR, pages 284-291, 2006.
- (2006) ACM SIGIR , pp. 284-291
- Henzinger, M.¹

16
- 0037319544
- Methods for identifying versioned and plagiarized documents
- T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. JASIST, 54(3):203-215, 2003.
- (2003) JASIST , vol.54 , Issue.3 , pp. 203-215
- Hoad, T.C.¹ Zobel, J.²

17
- 0003838454
- Technical Report CSTR 41, Bell Laboratories, Murray Hill, NJ
- J. W. Hunt and M. D. McIlroy. An algorithm for differential file comparison. Technical Report CSTR 41, Bell Laboratories, Murray Hill, NJ, 1976.
- (1976) An Algorithm for Differential File Comparison
- Hunt, J.W.¹ McIlroy, M.D.²

18
- 0017492836
- A fast algorithm for computing longest common subsequences
- May
- J. W. Hunt and T. G. Szymanski. A fast algorithm for computing longest common subsequences. Commun. ACM, 20:350-353, May 1977.
- (1977) Commun. ACM , vol.20 , pp. 350-353
- Hunt, J.W.¹ Szymanski, T.G.²

19
- 0005180705
- An information-theoretic definition of similarity
- D. Lin. An information-theoretic definition of similarity. In ICML '98, pages 296-304, 1998.
- (1998) ICML '98 , pp. 296-304
- Lin, D.¹

20
- 85043988965
- Finding similar files in a large file system
- U. Manber. Finding similar files in a large file system. In USENIX Winter 1994 Tech. Conf, pages 1-10, 1994.
- (1994) USENIX Winter 1994 Tech. Conf , pp. 1-10
- Manber, U.¹

21
- 26944455145
- Hierarchical catalog records: Implementing a FRBR catalog
- Oct
- D. Mimno, G. Crane, and A. Jones. Hierarchical catalog records: Implementing a FRBR catalog. In D-Lib Magazine, http://www.dlib.org/dlib/october05/crane/10crane.html, volume 11, Oct 2005.
- (2005) D-Lib Magazine , vol.11
- Mimno, D.¹ Crane, G.² Jones, A.³

22
- 1142267351
- Winnowing: Local algorithms for document fingerprinting
- S. Schleimer, D. Wilkerson, and A. Aiken. Winnowing: local algorithms for document fingerprinting. In ACM SIGMOD conference, pages 76-85, 2003.
- (2003) ACM SIGMOD Conference , pp. 76-85
- Schleimer, S.¹ Wilkerson, D.² Aiken, A.³

23
- 57349177560
- Local text reuse detection
- J. Seo and W. B. Croft. Local text reuse detection. In ACM SIGIR, pages 571-578, 2008.
- (2008) ACM SIGIR , pp. 571-578
- Seo, J.¹ Croft, W.B.²

24
- 0013273370
- Scam: A copy detection mechanism for digital documents
- N. Shivakumar and H. Garcia-Molina. Scam: A copy detection mechanism for digital documents. In Ann. Conf. on the Theory and Practice of Digital Libraries, 1995.
- Ann. Conf. on the Theory and Practice of Digital Libraries, 1995
- Shivakumar, N.¹ Garcia-Molina, H.²

25
- 12244307637
- Finding near-replicas of documents on the web
- N. Shivakumar and H. Garcia-Molina. Finding near-replicas of documents on the web. In Intl. Workshop on the World Wide Web and Databases, 1999.
- Intl. Workshop on the World Wide Web and Databases, 1999
- Shivakumar, N.¹ Garcia-Molina, H.²

26
- 36349036645
- A new generation of textual corpora: Mining corpora from very large collections
- G. Stewart, G. Crane, and A. Babeu. A new generation of textual corpora: mining corpora from very large collections. In JCDL, pages 356-365, 2007.
- (2007) JCDL , pp. 356-365
- Stewart, G.¹ Crane, G.² Babeu, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.