Volume 5, Issue 4, 2015, Pages 281-393

Trends in cleaning relational data: Consistency and deduplication

Author keywords

[No Author keywords available]

Indexed keywords

DATABASE SYSTEMS; ERROR DETECTION; ERRORS; INFORMATION MANAGEMENT; REPAIR; TAXONOMIES

EID: 84958053976     PISSN: 19317883     EISSN: 19317891     Source Type: Journal    
DOI: 10.1561/1900000045     Document Type: Article
Times cited: 136

References (127)
  • 3
    • 56349095491
    • Aggregating inconsistent information: Ranking and clustering
    • N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008.
    • (2008) Journal of the ACM (JACM) , vol.55 , Issue.5 , pp. 23
    • Ailon, N.1    Charikar, M.2    Newman, A.3
  • 9
    • 80052917068
    • Sampling the repairs of functional dependency violations under hard constraints
    • G. Beskales, I. F. Ilyas, and L. Golab. Sampling the repairs of functional dependency violations under hard constraints. Proceedings of the VLDB Endowment, 3(1-2):197-207, 2010.
    • (2010) Proceedings of the VLDB Endowment , vol.3 , Issue.1-2 , pp. 197-207
    • Beskales, G.1    Ilyas, I.F.2    Golab, L.3
  • 11
    • 84892822296
    • Sampling from repairs of conditional functional dependency violations
    • G. Beskales, I. F. Ilyas, L. Golab, and A. Galiullin. Sampling from repairs of conditional functional dependency violations. The VLDB Journal, 23(1):103-128, 2014.
    • (2014) The VLDB Journal , vol.23 , Issue.1 , pp. 103-128
    • Beskales, G.1    Ilyas, I.F.2    Golab, L.3    Galiullin, A.4
  • 24
    • 14744293228
    • Minimal-change integrity maintenance using tuple deletions
    • J. Chomicki and J. Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Information and Computation, 197(1):90-121, 2005.
    • (2005) Information and Computation , vol.197 , Issue.1 , pp. 90-121
    • Chomicki, J.1    Marcinkowski, J.2
  • 29
    • 84968376348
    • 7 facts about data quality
    • S. Clemens. 7 facts about data quality. InsightSquared, 2012.
    • (2012) InsightSquared
    • Clemens, S.1
  • 30
    • 0032091575
    • Integration of heterogeneous databases without common domains using queries based on textual similarity
    • W. W. Cohen. Integration of heterogeneous databases without common domains using queries based on textual similarity. In ACM SIGMOD Record, volume 27, pages 201-212, 1998.
    • (1998) ACM SIGMOD Record , vol.27 , pp. 201-212
    • Cohen, W.W.1
  • 35
    • 37549003336
    • MapReduce: Simplified data processing on large clusters
    • J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008.
    • (2008) Communications of the ACM , vol.51 , Issue.1 , pp. 107-113
    • Dean, J.1    Ghemawat, S.2
  • 38
    • 2342615638
    • Profile-based object matching for information integration
    • A. Doan, Y. Lu, Y. Lee, and J. Han. Profile-based object matching for information integration. IEEE Intelligent Systems, 18(5):54-59, 2003.
    • (2003) IEEE Intelligent Systems , vol.18 , Issue.5 , pp. 54-59
    • Doan, A.1    Lu, Y.2    Lee, Y.3    Han, J.4
  • 39
    • 84863067746
    • Data fusion: Resolving data conflicts for integration
    • X. L. Dong and F. Naumann. Data fusion: resolving data conflicts for integration. Proceedings of the VLDB Endowment, 2(2):1654-1655, 2009.
    • (2009) Proceedings of the VLDB Endowment , vol.2 , Issue.2 , pp. 1654-1655
    • Dong, X.L.1    Naumann, F.2
  • 41
    • 74549151555
    • You talking to me? A corpus and algorithm for conversation disentanglement
    • M. Elsner and E. Charniak. You talking to me? a corpus and algorithm for conversation disentanglement. In Association for Computational Linguistics (ACL), pages 834-842, 2008.
    • (2008) Association for Computational Linguistics (ACL) , pp. 834-842
    • Elsner, M.1    Charniak, E.2
  • 43
    • 84958554013
    • Detecting errors in numeric attributes
    • Springer
    • G. Fan, W. Fan, and F. Geerts. Detecting errors in numeric attributes. In Web-Age Information Management, pages 125-137. Springer, 2014.
    • (2014) Web-Age Information Management , pp. 125-137
    • Fan, G.1    Fan, W.2    Geerts, F.3
  • 50
    • 84858615261
    • Towards certain fixes with editing rules and master data
    • W. Fan, J. Li, S. Ma, N. Tang, and W. Yu. Towards certain fixes with editing rules and master data. Proceedings of the VLDB Endowment, 3(1-2):173-184, 2010.
    • (2010) Proceedings of the VLDB Endowment , vol.3 , Issue.1-2 , pp. 173-184
    • Fan, W.1    Li, J.2    Ma, S.3    Tang, N.4    Yu, W.5
  • 58
    • 84873162472
    • Entity resolution: Theory, practice & open challenges
    • L. Getoor and A. Machanavajjhala. Entity resolution: theory, practice & open challenges. Proceedings of the VLDB Endowment, 5(12):2018-2019, 2012.
    • (2012) Proceedings of the VLDB Endowment , vol.5 , Issue.12 , pp. 2018-2019
    • Getoor, L.1    Machanavajjhala, A.2
  • 68
    • 84976856849
    • The merge/purge problem for large databases
    • M. A. Hernández and S. J. Stolfo. The merge/purge problem for large databases. ACM SIGMOD Record, 24(2):127-138, 1995.
    • (1995) ACM SIGMOD Record , vol.24 , Issue.2 , pp. 127-138
    • Hernández, M.A.1    Stolfo, S.J.2
  • 69
    • 0013331361
    • Real-world data is dirty: Data cleansing and the merge/purge problem
    • M. A. Hernández and S. J. Stolfo. Real-world data is dirty: Data cleansing and the merge/purge problem. Data Mining and Knowledge Discovery, 2(1):9-37, 1998.
    • (1998) Data Mining and Knowledge Discovery , vol.2 , Issue.1 , pp. 9-37
    • Hernández, M.A.1    Stolfo, S.J.2
  • 71
    • 0345201769
    • TANE: An efficient algorithm for discovering functional and approximate dependencies
    • Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. TANE: An efficient algorithm for discovering functional and approximate dependencies. Computer Journal, 42(2):100-111, 1999.
    • (1999) Computer Journal , vol.42 , Issue.2 , pp. 100-111
    • Huhtala, Y.1    Kärkkäinen, J.2    Porkka, P.3    Toivonen, H.4
  • 73
    • 0004037050
    • Unimatch: A record linkage system: User's manual
    • M. A. Jaro. Unimatch: A record linkage system: User's manual. U.S. Bureau of the Census, 1976.
    • (1976) U.S. Bureau of the Census
    • Jaro, M.A.1
  • 78
    • 84872977079
    • Dedoop: Efficient deduplication with Hadoop
    • L. Kolb, A. Thor, and E. Rahm. Dedoop: efficient deduplication with Hadoop. Proceedings of the VLDB Endowment, 5(12):1878-1881, 2012.
    • (2012) Proceedings of the VLDB Endowment , vol.5 , Issue.12 , pp. 1878-1881
    • Kolb, L.1    Thor, A.2    Rahm, E.3
  • 82
    • 0001116877
    • Binary codes capable of correcting deletions, insertions and reversals
    • V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. In Soviet Physics Doklady, volume 10, page 707, 1966.
    • (1966) Soviet Physics Doklady , vol.10 , pp. 707
    • Levenshtein, V.I.1
  • 83
    • 84878700965
    • Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics
    • A. Lopatenko and L. E. Bertossi. Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In 11th International Conference on Database Theory, pages 179-193, 2007.
    • (2007) 11th International Conference on Database Theory , pp. 179-193
    • Lopatenko, A.1    Bertossi, L.E.2
  • 84
    • 84890119754
    • Extending inclusion dependencies with conditions
    • S. Ma, W. Fan, and L. Bravo. Extending inclusion dependencies with conditions. Theoretical Computer Science, 515:64-95, 2014.
    • (2014) Theoretical Computer Science , vol.515 , pp. 64-95
    • Ma, S.1    Fan, W.2    Bravo, L.3
  • 89
    • 85018108837
    • The field matching problem: Algorithms and applications
    • A. E. Monge and C. Elkan. The field matching problem: Algorithms and applications. In Knowledge Discovery and Data Mining, pages 267-270, 1996.
    • (1996) Knowledge Discovery and Data Mining , pp. 267-270
    • Monge, A.E.1    Elkan, C.2
  • 93
    • 0002490026
    • Data cleaning: Problems and current approaches
    • E. Rahm and H. H. Do. Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin, 23:2000, 2000.
    • (2000) IEEE Data Engineering Bulletin , vol.23 , pp. 2000
    • Rahm, E.1    Do, H.H.2
  • 95
    • 84958034478
    • Index (US Patent 1,261,167)
    • R. Russell. Index. US Patent 1,261,167, Apr. 2, 1918.
    • (1918) Index
    • Russell, R.1
  • 97
    • 84905106551
    • ClusterJoin: A similarity joins framework using Map-Reduce
    • A. D. Sarma, Y. He, and S. Chaudhuri. ClusterJoin: A similarity joins framework using Map-Reduce. Proceedings of the VLDB Endowment, 7(12):1059-1070, 2014.
    • (2014) Proceedings of the VLDB Endowment , vol.7 , Issue.12 , pp. 1059-1070
    • Sarma, A.D.1    He, Y.2    Chaudhuri, S.3
  • 103
    • 0039891959
    • A machine learning approach to coreference resolution of noun phrases
    • W. M. Soon, H. T. Ng, and D. C. Y. Lim. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521-544, 2001.
    • (2001) Computational Linguistics , vol.27 , Issue.4 , pp. 521-544
    • Soon, W.M.1    Ng, H.T.2    Lim, D.C.Y.3
  • 106
    • 48249116542
    • Gartner warns firms of "dirty data"
    • N. Swartz. Gartner warns firms of "dirty data". Information Management Journal, 41(3), 2007.
    • (2007) Information Management Journal , vol.41 , Issue.3
    • Swartz, N.1
  • 107
    • 32444450026
    • Name search techniques
    • R. Taft. Name Search Techniques. Special report (New York State Identification and Intelligence System). Bureau of Systems Development, New York State Identification and Intelligence System, 1970.
    • (1970) Name Search Techniques
    • Taft, R.1
  • 108
    • 0035545848
    • Learning object identification rules for information integration
    • S. Tejada, C. A. Knoblock, and S. Minton. Learning object identification rules for information integration. Information Systems, 26(8):607-633, 2001.
    • (2001) Information Systems , vol.26 , Issue.8 , pp. 607-633
    • Tejada, S.1    Knoblock, C.A.2    Minton, S.3
  • 109
    • 0042868698
    • Support vector machine active learning with applications to text classification
    • S. Tong and D. Koller. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2:45-66, 2002.
    • (2002) The Journal of Machine Learning Research , vol.2 , pp. 45-66
    • Tong, S.1    Koller, D.2
  • 121
    • 0008976521
    • String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage
    • W. E. Winkler. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. Proceedings of the Section on Survey Research, 1990.
    • (1990) Proceedings of the Section on Survey Research
    • Winkler, W.E.1
  • 123
    • 84881115711
    • Scorpion: Explaining away outliers in aggregate queries
    • E. Wu and S. Madden. Scorpion: Explaining away outliers in aggregate queries. Proceedings of the VLDB Endowment, 6(8):553-564, 2013.
    • (2013) Proceedings of the VLDB Endowment , vol.6 , Issue.8 , pp. 553-564
    • Wu, E.1    Madden, S.2


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.