



Volume , Issue , 2011, Pages 469-474

Partial duplicate detection for large book collections

Author keywords

partial duplicate detection; sequence matching; unique words

Indexed keywords

COMPACT REPRESENTATION; DATA SETS; DUPLICATE DETECTION; LONGEST COMMON SUBSEQUENCES; SEQUENCE MATCHING; UNIQUE WORD;

EID: 83055181761     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2063576.2063647     Document Type: Conference Paper
Times cited : (15)

References (26)
  • 1
    • 83055196622
    • Internet Archive. http://www.archive.org, 2010.
    • (2010)
  • 2
    • 83055196629
    • Project Gutenberg. http://www.gutenberg.org, 2010.
    • (2010)
  • 3
    • 84871101442
    • A scalable system for identifying co-derivative documents
    • Y. Bernstein and J. Zobel. A scalable system for identifying co-derivative documents. In SPIRE, pages 55-67, 2004.
    • (2004) SPIRE , pp. 55-67
    • Bernstein, Y.1    Zobel, J.2
  • 4
    • 84976810280
    • Copy detection mechanisms for digital documents
    • S. Brin, J. Davis, and H. Garcia-Molina. Copy detection mechanisms for digital documents. In ACM SIGMOD, pages 398-409, 1995.
    • (1995) ACM SIGMOD , pp. 398-409
    • Brin, S.1    Davis, J.2    Garcia-Molina, H.3
  • 6
    • 0036040277
    • Similarity estimation techniques from rounding algorithms
    • M. S. Charikar. Similarity estimation techniques from rounding algorithms. In 34th Ann. ACM Symp. on Theory of computing, pages 380-388, 2002.
    • (2002) 34th Ann. ACM Symp. on Theory of Computing , pp. 380-388
    • Charikar, M.S.1
  • 9
    • 0037481029
    • Detecting similar documents using salient terms
    • J. Cooper, A. Coden, and E. Brown. Detecting similar documents using salient terms. In CIKM, pages 245-251, 2002.
    • (2002) CIKM , pp. 245-251
    • Cooper, J.1    Coden, A.2    Brown, E.3
  • 10
    • 77956386944
    • Solving longest common subsequence and related problems on graphical processing units
    • July
    • S. Deorowicz. Solving longest common subsequence and related problems on graphical processing units. Softw. Pract. Exper., 40:673-700, July 2010.
    • (2010) Softw. Pract. Exper. , vol.40 , pp. 673-700
    • Deorowicz, S.1
  • 12
    • 34247235660
    • A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books
    • S. Feng and R. Manmatha. A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books. In JCDL, pages 109-118, 2006.
    • (2006) JCDL , pp. 109-118
    • Feng, S.1    Manmatha, R.2
  • 13
    • 77956039068
    • Adaptive near-duplicate detection via similarity learning
    • H. Hajishirzi, W.-t. Yih, and A. Kolcz. Adaptive near-duplicate detection via similarity learning. In SIGIR'10, pages 419-426, 2010.
    • (2010) SIGIR'10 , pp. 419-426
    • Hajishirzi, H.1    Yih, W.-T.2    Kolcz, A.3
  • 15
    • 33750296887
    • Finding near-duplicate web pages: A large-scale evaluation of algorithms
    • M. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In ACM SIGIR, pages 284-291, 2006.
    • (2006) ACM SIGIR , pp. 284-291
    • Henzinger, M.1
  • 16
    • 0037319544
    • Methods for identifying versioned and plagiarized documents
    • T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. JASIST, 54(3):203-215, 2003.
    • (2003) JASIST , vol.54 , Issue.3 , pp. 203-215
    • Hoad, T.C.1    Zobel, J.2
  • 18
    • 0017492836
    • A fast algorithm for computing longest common subsequences
    • May
    • J. W. Hunt and T. G. Szymanski. A fast algorithm for computing longest common subsequences. Commun. ACM, 20:350-353, May 1977.
    • (1977) Commun. ACM , vol.20 , pp. 350-353
    • Hunt, J.W.1    Szymanski, T.G.2
  • 19
    • 0005180705
    • An information-theoretic definition of similarity
    • D. Lin. An information-theoretic definition of similarity. In ICML '98, pages 296-304, 1998.
    • (1998) ICML '98 , pp. 296-304
    • Lin, D.1
  • 20
    • 85043988965
    • Finding similar files in a large file system
    • U. Manber. Finding similar files in a large file system. In USENIX Winter 1994 Tech. Conf, pages 1-10, 1994.
    • (1994) USENIX Winter 1994 Tech. Conf , pp. 1-10
    • Manber, U.1
  • 21
    • 26944455145
    • Hierarchical catalog records: Implementing a FRBR catalog
    • Oct
    • D. Mimno, G. Crane, and A. Jones. Hierarchical catalog records: Implementing a FRBR catalog. In D-Lib Magazine, http://www.dlib.org/dlib/october05/crane/10crane.html, volume 11, Oct 2005.
    • (2005) D-Lib Magazine , vol.11
    • Mimno, D.1    Crane, G.2    Jones, A.3
  • 22
    • 1142267351
    • Winnowing: Local algorithms for document fingerprinting
    • S. Schleimer, D. Wilkerson, and A. Aiken. Winnowing: local algorithms for document fingerprinting. In ACM SIGMOD conference, pages 76-85, 2003.
    • (2003) ACM SIGMOD Conference , pp. 76-85
    • Schleimer, S.1    Wilkerson, D.2    Aiken, A.3
  • 23
    • 57349177560
    • Local text reuse detection
    • J. Seo and W. B. Croft. Local text reuse detection. In ACM SIGIR, pages 571-578, 2008.
    • (2008) ACM SIGIR , pp. 571-578
    • Seo, J.1    Croft, W.B.2
  • 26
    • 36349036645
    • A new generation of textual corpora: Mining corpora from very large collections
    • G. Stewart, G. Crane, and A. Babeu. A new generation of textual corpora: mining corpora from very large collections. In JCDL, pages 356-365, 2007.
    • (2007) JCDL , pp. 356-365
    • Stewart, G.1    Crane, G.2    Babeu, A.3


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.