



Volume 375, 2011, Pages 385-412

Data de-duplication: A review

Author keywords

[No Author keywords available]



EID: 80455127017; PISSN: 1860949X; EISSN: None; Source Type: Book Series
DOI: 10.1007/978-3-642-22913-8_18; Document Type: Review
Times cited: 7

References (99)
  • 1
    • Agichtein, E., Ganti, V.: Mining Reference Tables for Automatic Text Segmentation. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Seattle, Washington, USA, pp. 20-29 (2004)
  • 3
    • Andoni, A., Indyk, P.: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Proc. of IEEE Symposium on Foundations of Computer Science, Las Vegas, Nevada, USA, pp. 459-468 (2006)
  • 4
    • Andoni, A., Indyk, P.: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Communications of the ACM 51(1), 117-122 (2008)
  • 7
    • Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
  • 15
    • Bilenko, M., Mooney, R.J.: Adaptive Duplicate Detection Using Learnable String Similarity Measures. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Washington, DC, USA, pp. 39-48 (2003)
  • 28
    • Cohen, W.W.: Data Integration using Similarity Joins and a Word-based Information Representation Language. ACM Trans. on Inf. Syst. 18(3), 288-321 (2000)
  • 30
    • Cohen, W.W., Richman, J.: Learning to Match and Cluster Large High-Dimensional Data Sets for Data Integration. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 475-480 (2002)
  • 31
    • Cohn, D.A., Atlas, L., Ladner, R.E.: Improving Generalization with Active Learning. Machine Learning 15(2), 201-221 (1994)
  • 32
    • Costa, G., Manco, G., Ortale, R.: An Incremental Clustering Scheme for Data Deduplication. Data Min. and Knowl. Discovery 20(1), 152-187 (2010)
  • 33
    • Database Group Leipzig: Benchmark Datasets for Entity Resolution, http://dbs.uni-leipzig.de/en/research/projects/objectmatching/fever/benchmark datasets for entity resolution
  • 41
    • Garcia-Molina, H.: Entity Resolution: Overview and Challenges. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, T.-W. (eds.) ER 2004. LNCS, vol. 3288, pp. 1-2. Springer, Heidelberg (2004)
  • 45
    • Grabmeier, J., Rudolph, A.: Techniques of Cluster Algorithms in Data Mining. Data Min. and Knowl. Discovery 6(4), 303-360 (2002)
  • 52
    • Hassanzadeh, O., Miller, R.J.: Creating Probabilistic Databases from Duplicated Data. The VLDB Journal 18(5), 1141-1166 (2009)
  • 53
  • 54
    • Hernández, M.A., Stolfo, S.J.: Real-world Data is Dirty: Data Cleansing and the Merge/Purge Problem. Data Min. and Knowl. Discovery 2(1), 9-37 (1998)
  • 56
    • Hjaltason, G.R., Samet, H.: Index-Driven Similarity Search in Metric Spaces. ACM Trans. on Database Syst. 28(4), 517-580 (2003)
  • 61
    • Jaro, M.A.: Advances in Record Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida. Journal of the American Statistical Association 84, 420-424 (1989)
  • 63
    • Kopcke, H., Rahm, E.: Frameworks for Entity Matching: A Comparison. Data and Knowl. Engineering 69(2), 197-210 (2010)
    • Kopcke, H., Thor, A., Rahm, E.: Evaluation of Entity Resolution Approaches on Real-World Match Problems. Proc. of the VLDB Endowment 3(1), 484-493 (2010)
  • 64
    • Kopcke, H., Thor, A., Rahm, E.: Evaluation of Learning-Based Approaches for Matching Web Data Entities. IEEE Internet Computing 14(4), 23-31 (2010)
  • 68
    • Low, W.L., Lee, M.L., Ling, T.W.: A Knowledge-Based Approach for Duplicate Elimination in Data Cleaning. Information Systems 26(8), 585-606 (2001)
  • 77
    • Muse, A.G., Mikl, J., Smith, P.F.: Evaluating the Quality of Anonymous Record Linkage Using Deterministic Procedures with the New York State AIDS Registry and a Hospital Discharge File. Statistics in Medicine 14, 499-509 (1995)
  • 78
    • Neiling, M., Jurk, S.: The Object Identification Framework. In: Proc. KDD Workshop on Data Cleaning, Record Linkage, and Object Consolidation, Washington, DC, USA, pp. 37-39 (2003)
  • 79
    • Neutel, C.I.: Privacy Issues in Research Using Record Linkage. Pharmacoepidemiology and Drug Safety 6, 367-369 (1997)
  • 80
    • Newcombe, H.B.: Record Linking: The Design of Efficient Systems for Linking Records into Individual and Family Histories. American Journal of Human Genetics 19, 335-359 (1967)
  • 84
    • Sarawagi, S., Bhamidipaty, A.: Interactive Deduplication using Active Learning. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 269-278 (2002)
  • 86
    • Shen, H., Zhang, Y.: Improved Approximate Detection of Duplicates for Data Streams over Sliding Windows. Journal of Computer Science and Technology 23(6), 973-987 (2008)
  • 87
    • Singla, P., Domingos, P.: Multi-Relational Record Linkage. Proc. of ACM Int. Workshop on Multi-Relational Data Mining, pp. 31-38 (2004)
  • 88
    • Smith, T.F., Waterman, M.S.: Identification of Common Molecular Subsequences. Journal of Molecular Biology 147(1), 195-197 (1981)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.