Volume 6263 LNCS, 2010, Pages 309-323

An efficient duplicate record detection using q-grams array inverted index

Author keywords

bitmaps; clustering; Duplicate record detection; inverted index; q grams

Indexed keywords

BIT MAPS; CLUSTERING; DUPLICATE RECORD DETECTION; INVERTED INDICES; Q-GRAMS

EID: 78049385808     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-15105-7_25     Document Type: Conference Paper
Times cited: 8

References (23)
  • 3
    • 84976721642
    • Data manipulation in heterogeneous databases
    • Chatterjee, A., Segev, A.: Data manipulation in heterogeneous databases. ACM SIGMOD Record 20, 64-68 (1991)
    • (1991) ACM SIGMOD Record , vol.20 , pp. 64-68
    • Chatterjee, A.1    Segev, A.2
  • 4
    • 1142279457
    • Robust and efficient fuzzy match for online data cleaning
    • Chaudhuri, S., Ganjam, K., Ganti, V., Motwani, R.: Robust and efficient fuzzy match for online data cleaning. In: SIGMOD 2003, pp. 313-324 (2003)
    • (2003) SIGMOD 2003 , pp. 313-324
    • Chaudhuri, S.1    Ganjam, K.2    Ganti, V.3    Motwani, R.4
  • 5
    • 78049362549
    • Towards scalable real-time entity resolution using a similarity-aware inverted index approach
    • Christen, P., Gayler, R.: Towards scalable real-time entity resolution using a similarity-aware inverted index approach. Proceedings of AusDM 2008, Glenelg, Adelaide 87, 30-39 (2008)
    • (2008) Proceedings of AusDM 2008, Glenelg, Adelaide , vol.87 , pp. 30-39
    • Christen, P.1    Gayler, R.2
  • 7
    • 0242540438
    • Learning to match and cluster large high-dimensional data sets for data integration
    • Cohen, W., Richman, J.: Learning to match and cluster large high-dimensional data sets for data integration. In: SIGKDD 2002 (2002)
    • (2002) SIGKDD 2002
    • Cohen, W.1    Richman, J.2
  • 12
    • 84950419860
    • Advances in record linkage methodology as applied to matching the 1985 census of Tampa, Florida
    • Jaro, M.A.: Advances in record linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association 84, 414-420 (1989)
    • (1989) Journal of the American Statistical Association , vol.84 , pp. 414-420
    • Jaro, M.A.1
  • 13
    • 0000390142
    • Binary codes capable of correcting deletions, insertions and reversals
    • Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Doklady Akademii Nauk SSSR 163, 845-848 (1965)
    • (1965) Doklady Akademii Nauk SSSR , vol.163 , pp. 845-848
    • Levenshtein, V.I.1
  • 15
    • 0027681165
    • Suffix arrays: A new method for on-line string searches
    • Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing 22, 935-948 (1993)
    • (1993) SIAM Journal on Computing , vol.22 , pp. 935-948
    • Manber, U.1    Myers, G.2
  • 16
    • 0034592784
    • Efficient clustering of high-dimensional data sets with application to reference matching
    • McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of high-dimensional data sets with application to reference matching. In: ACM SIGKDD, pp. 169-178 (2000)
    • (2000) ACM SIGKDD , pp. 169-178
    • McCallum, A.1    Nigam, K.2    Ungar, L.H.3
  • 17
    • 0004043396
    • An efficient domain-independent algorithm for detecting approximately duplicate database records
    • Monge, A.E., Elkan, C.P.: An efficient domain-independent algorithm for detecting approximately duplicate database records. In: Proceedings of DMKD 1997, pp. 23-29 (1997)
    • (1997) Proceedings of DMKD 1997 , pp. 23-29
    • Monge, A.E.1    Elkan, C.P.2
  • 19
    • 0019887799
    • Identification of common molecular subsequences
    • Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147, 195-197 (1981)
    • (1981) Journal of Molecular Biology , vol.147 , pp. 195-197
    • Smith, T.F.1    Waterman, M.S.2
  • 20
    • 84947737449
    • On using q-gram locations in approximate string matching
    • Sutinen, E., Tarhio, J.: On using q-gram locations in approximate string matching. In: Spirakis, P.G. (ed.) ESA 1995. LNCS, vol. 979, pp. 327-340. Springer, Heidelberg (1995)
    • (1995) LNCS , vol.979 , pp. 327-340
    • Sutinen, E.1    Tarhio, J.2
  • 21
    • 0027113212
    • Approximate string matching with q-grams and maximal matches
    • Ukkonen, E.: Approximate string matching with q-grams and maximal matches. Theoretical Computer Science 92, 191-211 (1992)
    • (1992) Theoretical Computer Science , vol.92 , pp. 191-211
    • Ukkonen, E.1
  • 22
    • 0010111194
    • A binary n-gram technique for automatic correction of substitution, deletion, insertion, and reversal errors in words
    • Ullman, J.: A binary n-gram technique for automatic correction of substitution, deletion, insertion, and reversal errors in words. The Computer Journal 20, 141-147 (1977)
    • (1977) The Computer Journal , vol.20 , pp. 141-147
    • Ullman, J.1
  • 23
    • 2342503573
    • The state of record linkage and current research problems
    • Winkler, W.E.: The state of record linkage and current research problems. In: Statistics of Income Division (1999)
    • (1999) Statistics of Income Division
    • Winkler, W.E.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.