Volume , Issue , 2009, Pages 952-963

Large-Scale Deduplication with Constraints Using Dedupalog

Author keywords

[No Author keywords available]

Indexed keywords

DATA CLEANING; DEDUPLICATION; DOMAIN SPECIFIC; EFFICIENT ALGORITHM; HIGH QUALITY; LARGE DATASETS; PAPER REFERENCES; PROTOTYPE IMPLEMENTATIONS; REAL WORLD DATA

EID: 67649649597     PISSN: 10844627     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICDE.2009.43     Document Type: Conference Paper
Times cited : (156)

References (55)
  • 1
    • 0034228041
    • Rock: A robust clustering algorithm for categorical attributes
    • S. Guha, R. Rastogi, and K. Shim, "Rock: A robust clustering algorithm for categorical attributes," Inf. Syst., vol. 25, no. 5, pp. 345-366, 2000.
    • (2000) Inf. Syst , vol.25 , Issue.5 , pp. 345-366
    • Guha, S.1    Rastogi, R.2    Shim, K.3
  • 3
    • 0032091575
    • Integration of heterogeneous databases without common domains using queries based on textual similarity
    • W. W. Cohen, "Integration of heterogeneous databases without common domains using queries based on textual similarity," in SIGMOD Conference, 1998, pp. 201-212.
    • (1998) SIGMOD Conference , pp. 201-212
    • Cohen, W.W.1
  • 4
    • 84949423737
    • Constraint-based clustering in large databases
    • A. K. H. Tung, R. T. Ng, L. V. S. Lakshmanan, and J. Han, "Constraint-based clustering in large databases," in ICDT, 2001, pp. 405-419.
    • (2001) ICDT , pp. 405-419
    • Tung, A.K.H.1    Ng, R.T.2    Lakshmanan, L.V.S.3    Han, J.4
  • 5
    • 33745448357
    • A latent Dirichlet model for unsupervised entity resolution
    • I. Bhattacharya and L. Getoor, "A latent Dirichlet model for unsupervised entity resolution," in SDM, 2006.
  • 6
    • 29344435802
    • Constraint-based entity matching
    • W. Shen, X. Li, and A. Doan, "Constraint-based entity matching," in AAAI, 2005, pp. 862-867.
    • (2005) AAAI , pp. 862-867
    • Shen, W.1    Li, X.2    Doan, A.3
  • 8
    • 35048857464
    • Limbo: Scalable clustering of categorical data
    • P. Andritsos, P. Tsaparas, R. J. Miller, and K. C. Sevcik, "Limbo: Scalable clustering of categorical data," in EDBT, 2004, pp. 123-146.
    • (2004) EDBT , pp. 123-146
    • Andritsos, P.1    Tsaparas, P.2    Miller, R.J.3    Sevcik, K.C.4
  • 9
    • 33750288047
    • Measuring constraint-set utility for partitional clustering algorithms
    • I. Davidson, K. Wagstaff, and S. Basu, "Measuring constraint-set utility for partitional clustering algorithms," in PKDD, 2006, pp. 115-126.
    • (2006) PKDD , pp. 115-126
    • Davidson, I.1    Wagstaff, K.2    Basu, S.3
  • 10
    • 57549092754
    • When is constrained clustering beneficial, and why?
    • K. Wagstaff, S. Basu, and I. Davidson, "When is constrained clustering beneficial, and why?" in AAAI, 2006.
    • (2006) AAAI
    • Wagstaff, K.1    Basu, S.2    Davidson, I.3
  • 11
    • 0344756851
    • Eliminating fuzzy duplicates in data warehouses
    • R. Ananthakrishna, S. Chaudhuri, and V. Ganti, "Eliminating fuzzy duplicates in data warehouses," in VLDB, 2002, pp. 586-597.
    • (2002) VLDB , pp. 586-597
    • Ananthakrishna, R.1    Chaudhuri, S.2    Ganti, V.3
  • 13
    • 84880903471
    • Semantics and inference for recursive probability models
    • A. Pfeffer and D. Koller, "Semantics and inference for recursive probability models," in AAAI/IAAI, 2000, pp. 538-544.
    • (2000) AAAI/IAAI , pp. 538-544
    • Pfeffer, A.1    Koller, D.2
  • 14
    • 84878044770
    • Entity resolution with Markov logic
    • P. Singla and P. Domingos, "Entity resolution with Markov logic," in ICDM, 2006, pp. 572-582.
    • (2006) ICDM , pp. 572-582
    • Singla, P.1    Domingos, P.2
  • 15
    • 33746868385
    • Correlation clustering in general weighted graphs
    • E. D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica, "Correlation clustering in general weighted graphs," Theor. Comput. Sci., vol. 361, no. 2-3, pp. 172-187, 2006.
    • (2006) Theor. Comput. Sci , vol.361 , Issue.2-3 , pp. 172-187
    • Demaine, E.D.1    Emanuel, D.2    Fiat, A.3    Immorlica, N.4
  • 16
    • 67649651583
    • Available
    • [Online]. Available: http://www.cs.umass.edu/~mccallum/code-data.html
  • 17
    • 0019004898
    • Equality and domain closure in first-order databases
    • R. Reiter, "Equality and domain closure in first-order databases," J. ACM, vol. 27, no. 2, pp. 235-249, 1980.
    • (1980) J. ACM , vol.27 , Issue.2 , pp. 235-249
    • Reiter, R.1
  • 18
    • 67649639188
    • Alias: An active learning led interactive deduplication system
    • S. Sarawagi, A. Bhamidipaty, A. Kirpal, and C. Mouli, "Alias: An active learning led interactive deduplication system," in VLDB, 2002, pp. 1103-1106.
    • (2002) VLDB , pp. 1103-1106
    • Sarawagi, S.1    Bhamidipaty, A.2    Kirpal, A.3    Mouli, C.4
  • 19
    • 33745661243
    • D-dupe: An interactive tool for entity resolution in social networks
    • M. Bilgic, L. Licamele, L. Getoor, and B. Shneiderman, "D-dupe: An interactive tool for entity resolution in social networks," in Graph Drawing, 2005, pp. 505-507.
    • (2005) Graph Drawing , pp. 505-507
    • Bilgic, M.1    Licamele, L.2    Getoor, L.3    Shneiderman, B.4
  • 21
    • 34248229658
    • Collective entity resolution in relational data
    • I. Bhattacharya and L. Getoor, "Collective entity resolution in relational data," TKDD, vol. 1, no. 1, 2007.
    • (2007) TKDD , vol.1 , Issue.1
    • Bhattacharya, I.1    Getoor, L.2
  • 22
    • 34548731840
    • Conditional functional dependencies for data cleaning
    • P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis, "Conditional functional dependencies for data cleaning," in ICDE, 2007, pp. 746-755.
    • (2007) ICDE , pp. 746-755
    • Bohannon, P.1    Fan, W.2    Geerts, F.3    Jia, X.4    Kementsietsidis, A.5
  • 23
    • 57549084481
    • Dependencies revisited for improving data quality
    • W. Fan, "Dependencies revisited for improving data quality," in PODS, 2008, pp. 159-170.
    • (2008) PODS , pp. 159-170
    • Fan, W.1
  • 24
    • 3142665421
    • Correlation clustering
    • N. Bansal, A. Blum, and S. Chawla, "Correlation clustering," Machine Learning, vol. 56, no. 1-3, pp. 89-113, 2004.
    • (2004) Machine Learning , vol.56 , Issue.1-3 , pp. 89-113
    • Bansal, N.1    Blum, A.2    Chawla, S.3
  • 25
    • 24644456480
    • Clustering with qualitative information
    • M. Charikar, V. Guruswami, and A. Wirth, "Clustering with qualitative information," J. Comput. Syst. Sci., vol. 71, no. 3, pp. 360-383, 2005.
    • (2005) J. Comput. Syst. Sci , vol.71 , Issue.3 , pp. 360-383
    • Charikar, M.1    Guruswami, V.2    Wirth, A.3
  • 26
    • 34848818026
    • Aggregating inconsistent information: Ranking and clustering
    • N. Ailon, M. Charikar, and A. Newman, "Aggregating inconsistent information: Ranking and clustering," in STOC, 2005, pp. 684-693.
    • (2005) STOC , pp. 684-693
    • Ailon, N.1    Charikar, M.2    Newman, A.3
  • 29
    • 85104914015
    • Efficient exact set-similarity joins
    • A. Arasu, V. Ganti, and R. Kaushik, "Efficient exact set-similarity joins," in VLDB, 2006, pp. 918-929.
    • (2006) VLDB , pp. 918-929
    • Arasu, A.1    Ganti, V.2    Kaushik, R.3
  • 30
    • 34248168069
    • Clustering aggregation
    • A. Gionis, H. Mannila, and P. Tsaparas, "Clustering aggregation," TKDD, vol. 1, no. 1, 2007.
    • (2007) TKDD , vol.1 , Issue.1
    • Gionis, A.1    Mannila, H.2    Tsaparas, P.3
  • 31
    • 0028514351
    • On the hardness of approximating minimization problems
    • C. Lund and M. Yannakakis, "On the hardness of approximating minimization problems," J. ACM, vol. 41, no. 5, pp. 960-981, 1994.
    • (1994) J. ACM , vol.41 , Issue.5 , pp. 960-981
    • Lund, C.1    Yannakakis, M.2
  • 33
    • 77952372966
    • Adaptive duplicate detection using learnable string similarity measures
    • M. Bilenko and R. J. Mooney, "Adaptive duplicate detection using learnable string similarity measures," in KDD, 2003, pp. 39-48.
    • (2003) KDD , pp. 39-48
    • Bilenko, M.1    Mooney, R.J.2
  • 34
    • 1142279457
    • Robust and efficient fuzzy match for online data cleaning
    • S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, "Robust and efficient fuzzy match for online data cleaning," in SIGMOD Conference, 2003, pp. 313-324.
    • (2003) SIGMOD Conference , pp. 313-324
    • Chaudhuri, S.1    Ganjam, K.2    Ganti, V.3    Motwani, R.4
  • 35
    • 84880467474
    • Text joins in an RDBMS for web data integration
    • L. Gravano, P. G. Ipeirotis, N. Koudas, and D. Srivastava, "Text joins in an RDBMS for web data integration," in WWW, 2003, pp. 90-101.
    • (2003) WWW , pp. 90-101
    • Gravano, L.1    Ipeirotis, P.G.2    Koudas, N.3    Srivastava, D.4
  • 36
    • 67649639189
    • Unimatch: A record linkage system: Users manual
    • M. A. Jaro, "Unimatch: A record linkage system: Users manual," US Bureau of the Census, Washington, D.C., Tech. Rep., 1976.
  • 37
    • 84947399464
    • A theory for record linkage
    • I. P. Fellegi and A. B. Sunter, "A theory for record linkage," J. Am. Statistical Assoc., vol. 64, no. 328, pp. 1183-1210, 1969.
    • (1969) J. Am. Statistical Assoc , vol.64 , Issue.328 , pp. 1183-1210
    • Fellegi, I.P.1    Sunter, A.B.2
  • 38
    • 0242456811
    • Interactive deduplication using active learning
    • S. Sarawagi and A. Bhamidipaty, "Interactive deduplication using active learning," in KDD, 2002, pp. 269-278.
    • (2002) KDD , pp. 269-278
    • Sarawagi, S.1    Bhamidipaty, A.2
  • 39
    • 52649145249
    • Fast indexes and algorithms for set similarity selection queries
    • M. Hadjieleftheriou, A. Chandel, N. Koudas, and D. Srivastava, "Fast indexes and algorithms for set similarity selection queries," in ICDE, 2008, pp. 267-276.
    • (2008) ICDE , pp. 267-276
    • Hadjieleftheriou, M.1    Chandel, A.2    Koudas, N.3    Srivastava, D.4
  • 41
    • 3142777876
    • Efficient set joins on similarity predicates
    • S. Sarawagi and A. Kirpal, "Efficient set joins on similarity predicates," in SIGMOD Conference, 2004, pp. 743-754.
    • (2004) SIGMOD Conference , pp. 743-754
    • Sarawagi, S.1    Kirpal, A.2
  • 45
    • 26444550791
    • Robust identification of fuzzy duplicates
    • S. Chaudhuri, V. Ganti, and R. Motwani, "Robust identification of fuzzy duplicates," in ICDE, 2005, pp. 865-876.
    • (2005) ICDE , pp. 865-876
    • Chaudhuri, S.1    Ganti, V.2    Motwani, R.3
  • 46
    • 0344756845
    • Declarative data cleaning: Language, model, and algorithms
    • H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita, "Declarative data cleaning: Language, model, and algorithms," in VLDB, 2001, pp. 371-380.
    • (2001) VLDB , pp. 371-380
    • Galhardas, H.1    Florescu, D.2    Shasha, D.3    Simon, E.4    Saita, C.-A.5
  • 47
    • 84944315993
    • Potter's wheel: An interactive data cleaning system
    • V. Raman and J. M. Hellerstein, "Potter's wheel: An interactive data cleaning system," in VLDB, 2001, pp. 381-390.
    • (2001) VLDB , pp. 381-390
    • Raman, V.1    Hellerstein, J.M.2
  • 48
    • 29844452555
    • Reference reconciliation in complex information spaces
    • X. Dong, A. Y. Halevy, and J. Madhavan, "Reference reconciliation in complex information spaces," in SIGMOD Conference, 2005, pp. 85-96.
    • (2005) SIGMOD Conference , pp. 85-96
    • Dong, X.1    Halevy, A.Y.2    Madhavan, J.3
  • 49
    • 84878044770
    • Entity resolution with Markov logic
    • P. Singla and P. Domingos, "Entity resolution with Markov logic," in ICDM, 2006, pp. 572-582.
    • (2006) ICDM , pp. 572-582
    • Singla, P.1    Domingos, P.2
  • 50
    • 33745776306
    • Joint deduplication of multiple record types in relational data
    • A. Culotta and A. McCallum, "Joint deduplication of multiple record types in relational data," in CIKM, 2005, pp. 257-258.
    • (2005) CIKM , pp. 257-258
    • Culotta, A.1    McCallum, A.2
  • 51
    • 34548759040
    • Fast identification of relational constraint violations
    • A. Chandel, N. Koudas, K. Q. Pu, and D. Srivastava, "Fast identification of relational constraint violations," in ICDE, 2007, pp. 776-785.
    • (2007) ICDE , pp. 776-785
    • Chandel, A.1    Koudas, N.2    Pu, K.Q.3    Srivastava, D.4
  • 52
    • 33745329531
    • A cost-based model and effective heuristic for repairing constraints by value modification
    • P. Bohannon, M. Flaster, W. Fan, and R. Rastogi, "A cost-based model and effective heuristic for repairing constraints by value modification," in SIGMOD Conference, 2005, pp. 143-154.
    • (2005) SIGMOD Conference , pp. 143-154
    • Bohannon, P.1    Flaster, M.2    Fan, W.3    Rastogi, R.4
  • 53
    • 84959912087
    • Improving data quality: Consistency and accuracy
    • G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma, "Improving data quality: Consistency and accuracy," in VLDB, 2007, pp. 315-326.
    • (2007) VLDB , pp. 315-326
    • Cong, G.1    Fan, W.2    Geerts, F.3    Jia, X.4    Ma, S.5
  • 54
    • 29844448776
    • Conquer: Efficient management of inconsistent databases
    • A. Fuxman, E. Fazli, and R. J. Miller, "Conquer: Efficient management of inconsistent databases," in SIGMOD Conference, 2005, pp. 155-166.
    • (2005) SIGMOD Conference , pp. 155-166
    • Fuxman, A.1    Fazli, E.2    Miller, R.J.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.