Volume 20, Issue 1, 2010, Pages 152-187

An incremental clustering scheme for data de-duplication

Author keywords

Approximated similarity measures; Clustering mining methods and algorithms; De duplication; Indexing methods and structures; Locality sensitive hashing; Min wise independent permutations; Record classification

Indexed keywords

INDEXING METHODS; LOCALITY SENSITIVE HASHING; MIN-WISE INDEPENDENT PERMUTATIONS; MINING METHODS AND ALGORITHMS; SIMILARITY MEASURE;

EID: 76749114248     PISSN: 13845810     EISSN: None     Source Type: Journal    
DOI: 10.1007/s10618-009-0155-0     Document Type: Article
Times cited: 36

References (47)
  • 12
    • 44649181012
    • Boosting text segmentation via progressive classification
    • 10.1007/s10115-007-0085-3
    • E Cesario F Folino A Locane G Manco R Ortale 2008 Boosting text segmentation via progressive classification J Knowl Inf Syst 15 3 285 320 10.1007/s10115-007-0085-3
    • (2008) J Knowl Inf Syst , vol.15 , Issue.3 , pp. 285-320
    • Cesario, E.1    Folino, F.2    Locane, A.3    Manco, G.4    Ortale, R.5
  • 22
    • 84947399464
    • A theory for record linkage
    • 10.2307/2286061
    • IP Fellegi AB Sunter 1969 A theory for record linkage J Am Stat Assoc 64 1183 1210 10.2307/2286061
    • (1969) J Am Stat Assoc , vol.64 , pp. 1183-1210
    • Fellegi, I.P.1    Sunter, A.B.2
  • 28
    • 0034228041
    • ROCK: A robust clustering algorithm for categorical attributes
    • 10.1016/S0306-4379(00)00022-3
    • S Guha R Rastogi K Shim 2000 ROCK: a robust clustering algorithm for categorical attributes Inf Syst 25 5 345 366 10.1016/S0306-4379(00)00022-3
    • (2000) Inf Syst , vol.25 , Issue.5 , pp. 345-366
    • Guha, S.1    Rastogi, R.2    Shim, K.3
  • 31
    • 0041664272
    • Index-driven similarity search in metric spaces
    • 10.1145/958942.958948
    • GR Hjaltason H Samet 2003 Index-driven similarity search in metric spaces ACM Trans Database Syst 28 4 517 580 10.1145/958942.958948
    • (2003) ACM Trans Database Syst , vol.28 , Issue.4 , pp. 517-580
    • Hjaltason, G.R.1    Samet, H.2
  • 32
    • 0031644241
    • Approximate nearest neighbor-towards removing the curse of dimensionality
    • Indyk P, Motwani R (1998) Approximate nearest neighbor-towards removing the curse of dimensionality. In: Proceedings of symposium on theory of computing, pp 604-613
    • (1998) Proceedings of Symposium on Theory of Computing , pp. 604-613
    • Indyk, P.1    Motwani, R.2
  • 34
    • 84893405732
    • Data clustering: A review
    • 10.1145/331499.331504
    • AK Jain MN Murty PJ Flynn 1999 Data clustering: a review ACM Comput Surv 31 3 264 323 10.1145/331499.331504
    • (1999) ACM Comput Surv , vol.31 , Issue.3 , pp. 264-323
    • Jain, A.K.1    Murty, M.N.2    Flynn, P.J.3
  • 44
    • 0027113212
    • Approximate string matching using q-grams and maximal matches
    • 10.1016/0304-3975(92)90143-4 1143139
    • E Ukkonen 1992 Approximate string matching using q-grams and maximal matches Theor Comput Sci 92 1 191 211 10.1016/0304-3975(92)90143-4 1143139
    • (1992) Theor Comput Sci , vol.92 , Issue.1 , pp. 191-211
    • Ukkonen, E.1
  • 46
    • 0008976521
    • String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage
    • American Statistical Association
    • Winkler WE (1990) String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. In: Proceedings of the section on survey research methods, American Statistical Association, pp 354-359
    • (1990) Proceedings of the Section on Survey Research Methods , pp. 354-359
    • Winkler, W.E.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.