



2009, Pages 9-16

Record linkage performance for large data sets

Author keywords

Deduplication; Memoization; Record linkage

Indexed keywords

DEDUPLICATION; LARGE DATASETS; MEMOIZATION; ORDER OF MAGNITUDE; PROCESSING SPEED; RECORD LINKAGE; SLIDING WINDOW; SPEED-UPS; STRING COMPARISON; VALUE DISTRIBUTION

EID: 74049107745     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1651449.1651453     Document Type: Conference Paper
Times cited: 2

References (18)
  • 1
    • 85051855725
    • R. Baeza-Yates. Searching the web: Challenges and partial solutions. In Proc. of IEEE SPIRE, 1998.
  • 2
    • 1142279457
    • S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani. Robust and efficient fuzzy match for online data cleaning. In Proc. of ACM SIGMOD, 2003.
  • 3
    • 74049122877
    • C. Clifton, et al. Privacy-preserving data integration and sharing. In Proc. of ACM DMKD, 2004.
  • 4
    • 67650700151
    • P. Christen and A. Pudjijono. Accurate Synthetic Generation of Realistic Personal Information. In Proc. of PAKDD, LNAI 5476, 2009.
  • 5
    • 74049152619
    • P. Christen, R. Gayler, and D. Hawking. Similarity-aware indexing for real-time entity resolution. In Proc. of ACM CIKM, Hong Kong, 2009.
  • 6
    • 74049138802
    • P. Christen. Development and User Experiences of an Open Source Data Cleaning, Deduplication and Record Linkage System. In SIGKDD Explorations, 11(1), August 2009.
  • 7
    • 0242540438
    • W. Cohen and J. Richman. Learning to match and cluster large high-dimensional data sets for data integration. In Proc. of ACM SIGKDD, 2002.
  • 8
    • 0002629270
    • A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. of RSS, 39(B), 1977.
  • 9
  • 10
    • 84880467474
    • L. Gravano, P. Ipeirotis, N. Koudas, and D. Srivastava. Text joins in an RDBMS for web data integration. In Proc. of WWW Conf., 2003.
  • 11
    • 85198670080
    • M. Hernandez and S. Stolfo. The merge/purge problem for large databases. In Proc. of ACM SIGMOD, 1995.
  • 12
    • 84950419860
    • M. A. Jaro. Advances in record linkage methodology as applied to matching the 1985 census of Tampa, Florida. J. of the ASA, 84(406), 1989.
  • 13
    • 84943425383
    • L. Jin, C. Li, and S. Mehrotra. Efficient record linkage in large data sets. In Proc. of DASFAA, 2003.
  • 14
    • 0001116877
    • V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 8, 1966.
  • 15
    • 0002089617
    • A. E. Monge. Matching algorithms within a duplicate detection system. IEEE Data Engineering Bulletin, 23(4), 2000.
  • 17
    • 74049122197
    • S. Y. Sung, Z. Li, and S. Peng. A fast filtering scheme for large database cleansing. In Proc. of ACM CIKM, 2002.
  • 18
    • 74049110997
    • W. E. Winkler. Near automatic weight computation in the Fellegi-Sunter model of record linkage. Proc. of the Census Bureau Research Conf., 1988.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.