Volume 24, Issue 9, 2012, Pages 1537-1555

A survey of indexing techniques for scalable record linkage and deduplication

Author keywords

Blocking; Data linkage; Data matching; Entity resolution; Experimental evaluation; Index techniques; Scalability

Indexed keywords

CLEANING; DATA MINING; DATABASE SYSTEMS; INDEXING (OF INFORMATION); SCALABILITY; SURVEYS

EID: 84920595044     PISSN: 10414347     EISSN: None     Source Type: Journal
DOI: 10.1109/TKDE.2011.127     Document Type: Article
Times cited: 565

References (61)
  • [1] W.E. Winkler, "Methods for Evaluating and Creating Data Quality," Elsevier Information Systems, vol. 29, no. 7, pp. 531-550, 2004.
  • [2] D.E. Clark, "Practical Introduction to Record Linkage for Injury Research," Injury Prevention, vol. 10, pp. 186-191, 2004.
  • [3] C.W. Kelman, J. Bass, and D. Holman, "Research Use of Linked Health Data - A Best Practice Protocol," Australian NZ J. Public Health, vol. 26, pp. 251-255, 2002.
  • [5] J. Jonas and J. Harper, "Effective Counterterrorism and the Limited Role of Predictive Data Mining," Policy Analysis, no. 584, pp. 1-11, 2006.
  • [7] W. Su, J. Wang, and F.H. Lochovsky, "Record Matching over Query Results from Multiple Web Databases," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 4, pp. 578-589, Apr. 2010.
  • [8] M. Bilenko, S. Basu, and M. Sahami, "Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping," Proc. IEEE Int'l Conf. Data Mining (ICDM '05), pp. 58-65, 2005.
  • [9] P. Christen and K. Goiser, "Quality and Complexity Measures for Data Linkage and Deduplication," Quality Measures in Data Mining, ser. Studies in Computational Intelligence, F. Guillet and H. Hamilton, eds., vol. 43, Springer, pp. 127-151, 2007.
  • [11] I.P. Fellegi and A.B. Sunter, "A Theory for Record Linkage," J. Am. Statistical Assoc., vol. 64, no. 328, pp. 1183-1210, 1969.
  • [13] W.W. Cohen, "Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '98), pp. 201-212, 1998.
  • [15] E. Rahm and H.H. Do, "Data Cleaning: Problems and Current Approaches," IEEE Technical Committee Data Eng. Bull., vol. 23, no. 4, pp. 3-13, Dec. 2000.
  • [26] P. Christen, "Febrl: An Open Source Data Cleaning, Deduplication and Record Linkage System With a Graphical User Interface," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '08), pp. 1065-1068, 2008.
  • [27] L. Gu and R. Baxter, "Decision Models for Record Linkage," Selected Papers from AusDM, LNCS 3755, Springer, 2006.
  • [34] M.A. Hernandez and S.J. Stolfo, "Real-World Data is Dirty: Data Cleansing and the Merge/Purge Problem," Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9-37, 1998.
  • [39] P. Christen, "Towards Parameter-Free Blocking for Scalable Record Linkage," Technical Report TR-CS-07-03, Dept. of Computer Science, The Australian Nat'l Univ., 2007.
  • [41] U. Draisbach and F. Naumann, "A Comparison and Generalization of Blocking and Windowing Algorithms for Duplicate Detection," Proc. Workshop Quality in Databases (VLDB '09), 2009.
  • [46] C. Faloutsos and K.-I. Lin, "Fastmap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '95), pp. 163-174, 1995.
  • [48] N. Adly, "Efficient Record Linkage Using a Double Embedding Scheme," Proc. Int'l Conf. Data Mining (DMIN '09), pp. 274-281, 2009.
  • [54] D. Dey, V. Mookerjee, and D. Liu, "Efficient Techniques for Online Record Linkage," IEEE Trans. Knowledge and Data Eng., vol. 23, no. 3, pp. 373-387, Mar. 2011.
  • [56] A. Behm, S. Ji, C. Li, and J. Lu, "Space-Constrained Gram-Based Indexing for Efficient Approximate String Search," Proc. IEEE Int'l Conf. Data Eng. (ICDE '09), pp. 604-615, 2009.
  • [59] C. Xiao, W. Wang, and X. Lin, "Ed-Join: An Efficient Algorithm for Similarity Joins with Edit Distance Constraints," Proc. VLDB Endowment, vol. 1, no. 1, pp. 933-944, 2008.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.