Volume 36, Issue 3, 2011, Pages

Efficient similarity joins for near-duplicate detection

Author keywords

Near duplicate detection; Similarity join

Indexed keywords

EFFICIENT ALGORITHM; FILTERING TECHNIQUE; MULTIPLE DATA SOURCES; NEAR-DUPLICATE DETECTION; REAL DATA SETS; SIMILARITY JOIN

EID: 80052344031     PISSN: 03625915     EISSN: 15574644     Source Type: Journal    
DOI: 10.1145/2000824.2000825     Document Type: Article
Times cited: 309

References (53)
  • 20
    • 0033075316
    • Combining fuzzy information from multiple systems
    • FAGIN, R. 1999. Combining fuzzy information from multiple systems. J. Comput. Syst. Sci. 58, 1, 83-99.
    • (1999) J. Comput. Syst. Sci. , vol.58 , Issue.1 , pp. 83-99
    • Fagin, R.1
  • 22
    • 0038504811
    • Optimal aggregation algorithms for middleware
    • FAGIN, R., LOTEM, A., AND NAOR, M. 2003b. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66, 4, 614-656.
    • (2003) J. Comput. Syst. Sci. , vol.66 , Issue.4 , pp. 614-656
    • Fagin, R.1    Lotem, A.2    Naor, M.3
  • 31
    • 0013331361
    • Real-world data is dirty: Data cleansing and the merge/purge problem
    • HERNANDEZ, M. A. AND STOLFO, S. J. 1998. Real-World data is dirty: Data cleansing and the merge/purge problem. Data Min. Knowl. Discov. 2, 1, 9-37. (Pubitemid 128696797)
    • (1998) Data Mining and Knowledge Discovery , vol.2 , Issue.1 , pp. 9-37
    • Hernandez, M.A.1    Stolfo, S.J.2
  • 32
    • 0037319544
    • Methods for identifying versioned and plagiarized documents
    • HOAD, T. C. AND ZOBEL, J. 2003. Methods for identifying versioned and plagiarized documents. J. Amer. Soc. Inf. Sci. Technol. 54, 3, 203-215.
    • (2003) J. Amer. Soc. Inf. Sci. Technol. , vol.54 , Issue.3 , pp. 203-215
    • Hoad, T.C.1    Zobel, J.2
  • 39
    • 35348911985
    • Detecting near-duplicates for web crawling
    • DOI 10.1145/1242572.1242592, 16th International World Wide Web Conference, WWW2007
    • MANKU, G. S., JAIN, A., AND SARMA, A. D. 2007. Detecting near-duplicates for web crawling. In Proceedings of the International World Wide Web Conference (WWW'07). 141-150. (Pubitemid 47582246)
    • (2007) 16th International World Wide Web Conference, WWW2007 , pp. 141-150
    • Manku, G.S.1    Jain, A.2    Das Sarma, A.3
  • 40
    • 0345566149
    • A guided tour to approximate string matching
    • NAVARRO, G. 2001. A guided tour to approximate string matching. ACM Comput. Surv. 33, 1, 31-88. (Pubitemid 33768480)
    • (2001) ACM Computing Surveys , vol.33 , Issue.1 , pp. 31-88
    • Navarro, G.1
  • 41
    • 80052344988
    • Index
    • RUSSELL, R. C. 1918. Index. U.S. patent 1,261,167.
    • (1918) U.S. patent 1,261,167
    • Russell, R.C.1
  • 50
    • 70849105253
    • Ed-Join: An efficient algorithm for similarity joins with edit distance constraints
    • XIAO, C., WANG, W., AND LIN, X. 2008a. Ed-Join: An efficient algorithm for similarity joins with edit distance constraints. Proc. VLDB 1, 1, 933-944.
    • (2008) Proc. VLDB , vol.1 , Issue.1 , pp. 933-944
    • Xiao, C.1    Wang, W.2    Lin, X.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.