-
1
-
-
2342576574
-
Eliminating fuzzy duplicates in data warehouses
-
In Proceedings of the 28th International Conference on Very Large Databases (VLDB-2002), Hong Kong, China
-
R. Ananthakrishna, S. Chaudhuri, and V. Ganti. Eliminating fuzzy duplicates in data warehouses. In Proceedings of the 28th International Conference on Very Large Databases (VLDB-2002), Hong Kong, China, 2002.
-
(2002)
-
-
Ananthakrishna, R.1
Chaudhuri, S.2
Ganti, V.3
-
2
-
-
0442289065
-
Survey of clustering data mining techniques
-
Technical Report, Accrue Software
-
P. Berkhin. Survey of clustering data mining techniques. Technical Report, Accrue Software, 2002.
-
(2002)
-
-
Berkhin, P.1
-
3
-
-
33644538718
-
Deduplication and group detection using links
-
In Proceedings of the 10th ACM SIGKDD Workshop on Link Analysis and Group Detection (LinkKDD-04), August
-
I. Bhattacharya and L. Getoor. Deduplication and group detection using links. In Proceedings of the 10th ACM SIGKDD Workshop on Link Analysis and Group Detection (LinkKDD-04), August 2004.
-
(2004)
-
-
Bhattacharya, I.1
Getoor, L.2
-
4
-
-
77954003729
-
Iterative record linkage for cleaning and integration
-
In Proceedings of the SIGMOD 2004 Workshop on Research Issues on Data Mining and Knowledge Discovery, June
-
I. Bhattacharya and L. Getoor. Iterative record linkage for cleaning and integration. In Proceedings of the SIGMOD 2004 Workshop on Research Issues on Data Mining and Knowledge Discovery, June 2004.
-
(2004)
-
-
Bhattacharya, I.1
Getoor, L.2
-
5
-
-
84889392492
-
A latent Dirichlet model for entity resolution
-
Technical Report, University of Maryland, College Park, MD
-
I. Bhattacharya and L. Getoor. A latent Dirichlet model for entity resolution. Technical Report, University of Maryland, College Park, MD, 2005.
-
(2005)
-
-
Bhattacharya, I.1
Getoor, L.2
-
6
-
-
77952372966
-
Adaptive duplicate detection using learnable string similarity measures
-
In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), Washington, DC
-
M. Bilenko and R. J. Mooney. Adaptive duplicate detection using learnable string similarity measures. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), Washington, DC, 2003.
-
(2003)
-
-
Bilenko, M.1
Mooney, R.J.2
-
7
-
-
84889342710
-
D-dupe: An interactive tool for entity resolution in social networks
-
In The 13th International Symposium on Graph Drawing (Poster), Limerick, Ireland, September
-
M. Bilgic, L. Licamele, L. Getoor, and B. Shneiderman. D-dupe: An interactive tool for entity resolution in social networks. In The 13th International Symposium on Graph Drawing (Poster), Limerick, Ireland, September 2005.
-
(2005)
-
-
Bilgic, M.1
Licamele, L.2
Getoor, L.3
Shneiderman, B.4
-
8
-
-
1142279457
-
Robust and efficient fuzzy match for online data cleaning
-
In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, CA
-
S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani. Robust and efficient fuzzy match for online data cleaning. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 313-324, San Diego, CA, 2003.
-
(2003)
, pp. 313-324
-
-
Chaudhuri, S.1
Ganjam, K.2
Ganti, V.3
Motwani, R.4
-
9
-
-
0000666461
-
Data integration using similarity joins and a word-based information representation language
-
W. Cohen. Data integration using similarity joins and a word-based information representation language. ACM Transactions on Information Systems, 18:288-321, 2000.
-
(2000)
ACM Transactions on Information Systems
, vol.18
, pp. 288-321
-
-
Cohen, W.1
-
10
-
-
0034592802
-
Hardening soft information sources
-
In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000), Boston, August
-
W. W. Cohen, H. Kautz, and D. McAllester. Hardening soft information sources. In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) pp. 255-259, Boston, August 2000.
-
(2000)
, pp. 255-259
-
-
Cohen, W.W.1
Kautz, H.2
McAllester, D.3
-
11
-
-
11144240583
-
A comparison of string distance metrics for name-matching tasks
-
In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web, Acapulco, Mexico, August
-
W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web, pp. 73-78, Acapulco, Mexico, August 2003.
-
(2003)
, pp. 73-78
-
-
Cohen, W.W.1
Ravikumar, P.2
Fienberg, S.E.3
-
12
-
-
0242540438
-
Learning to match and cluster large high-dimensional data sets for data integration
-
In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), Edmonton, Alberta
-
W. W. Cohen and J. Richman. Learning to match and cluster large high-dimensional data sets for data integration. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), Edmonton, Alberta, 2002.
-
(2002)
-
-
Cohen, W.W.1
Richman, J.2
-
13
-
-
18744368587
-
Object matching for data integration: A profile-based approach
-
In Proceedings of the IJCAI Workshop on Information Integration on the Web, Acapulco, Mexico, August
-
A. Doan, Y. Lu, Y. Lee, and J. Han. Object matching for data integration: A profile-based approach. In Proceedings of the IJCAI Workshop on Information Integration on the Web, Acapulco, Mexico, August 2003.
-
(2003)
-
-
Doan, A.1
Lu, Y.2
Lee, Y.3
Han, J.4
-
15
-
-
0033891155
-
An extensible framework for data cleaning
-
In ICDE '00: Proceedings of the 16th International Conference on Data Engineering, IEEE Computer Society
-
H. Galhardas, D. Florescu, E. Simon, and D. Shasha. An extensible framework for data cleaning. In ICDE '00: Proceedings of the 16th International Conference on Data Engineering, p. 312. IEEE Computer Society, 2000.
-
(2000)
, p. 312
-
-
Galhardas, H.1
Florescu, D.2
Simon, E.3
Shasha, D.4
-
16
-
-
0031622479
-
CiteSeer: An automatic citation indexing system
-
In Proceedings of the Third ACM Conference on Digital Libraries, Pittsburgh, PA, June 23-26
-
C. Lee Giles, K. Bollacker, and S. Lawrence. CiteSeer: An automatic citation indexing system. In Proceedings of the Third ACM Conference on Digital Libraries, pp. 89-98, Pittsburgh, PA, June 23-26 1998.
-
(1998)
, pp. 89-98
-
-
Giles, C.L.1
Bollacker, K.2
Lawrence, S.3
-
17
-
-
0344927353
-
Text joins for data cleansing and integration in an RDBMS
-
In 19th IEEE International Conference on Data Engineering
-
L. Gravano, P. Ipeirotis, N. Koudas, and D. Srivastava. Text joins for data cleansing and integration in an RDBMS. In 19th IEEE International Conference on Data Engineering, 2003.
-
(2003)
-
-
Gravano, L.1
Ipeirotis, P.2
Koudas, N.3
Srivastava, D.4
-
18
-
-
84976856849
-
The merge/purge problem for large databases
-
In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD-95), San Jose, CA, May
-
M. A. Hernández and S. J. Stolfo. The merge/purge problem for large databases. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD-95) pp. 127-138, San Jose, CA, May 1995.
-
(1995)
, pp. 127-138
-
-
Hernández, M.A.1
Stolfo, S.J.2
-
19
-
-
0003897956
-
Identifying and merging related bibliographic records
-
Master's thesis, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA
-
J. A. Hylton. Identifying and merging related bibliographic records. Master's thesis, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 1996.
-
(1996)
-
-
Hylton, J.A.1
-
20
-
-
0141767425
-
Graph-based hierarchical conceptual clustering
-
I. Jonyer, L. B. Holder, and D. J. Cook. Graph-based hierarchical conceptual clustering. Journal of Machine Learning Research, 2(1-2):19-43, 2001.
-
(2001)
Journal of Machine Learning Research
, vol.2
, Issue.1-2
, pp. 19-43
-
-
Jonyer, I.1
Holder, L.B.2
Cook, D.J.3
-
21
-
-
84880127702
-
Exploiting relationships for domain-independent data cleaning
-
In SIAM International Conference on Data Mining (SIAM SDM), Newport Beach, CA, April 21-23
-
D. V. Kalashnikov, S. Mehrotra, and Z. Chen. Exploiting relationships for domain-independent data cleaning. In SIAM International Conference on Data Mining (SIAM SDM), Newport Beach, CA, April 21-23 2005.
-
(2005)
-
-
Kalashnikov, D.V.1
Mehrotra, S.2
Chen, Z.3
-
22
-
-
0032652968
-
Autonomous citation matching
-
In Proceedings of the Third International Conference on Autonomous Agents, New York, May, ACM Press
-
S. Lawrence, K. Bollacker, and C. L. Giles. Autonomous citation matching. In Proceedings of the Third International Conference on Autonomous Agents, New York, May 1999. ACM Press.
-
(1999)
-
-
Lawrence, S.1
Bollacker, K.2
Giles, C.L.3
-
23
-
-
17244368453
-
Semantic integration in text: From ambiguous names to identifiable entities
-
AI Magazine. Special Issue on Semantic Integration
-
X. Li, P. Morie, and D. Roth. Semantic integration in text: From ambiguous names to identifiable entities. AI Magazine. Special Issue on Semantic Integration, 2005.
-
(2005)
-
-
Li, X.1
Morie, P.2
Roth, D.3
-
24
-
-
33646398530
-
Conditional models of identity uncertainty with application to noun coreference
-
In Neural Information Processing Systems (NIPS)
-
A. McCallum and B. Wellner. Conditional models of identity uncertainty with application to noun coreference. In Neural Information Processing Systems (NIPS), 2004.
-
(2004)
-
-
McCallum, A.1
Wellner, B.2
-
25
-
-
0034592784
-
Efficient clustering of high-dimensional data sets with application to reference matching
-
In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000), Boston, MA, August
-
A. K. McCallum, K. Nigam, and L. Ungar. Efficient clustering of high-dimensional data sets with application to reference matching. In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000), pp. 169-178, Boston, MA, August 2000.
-
(2000)
, pp. 169-178
-
-
McCallum, A.K.1
Nigam, K.2
Ungar, L.3
-
26
-
-
84880739933
-
BLOG: Probabilistic models with unknown objects
-
In Proceedings IJCAI
-
B. Milch, B. Marthi, D. Sontag, S. Russell, D. L. Ong, and A. Kolobov. BLOG: Probabilistic models with unknown objects. In Proceedings IJCAI, 2005.
-
(2005)
-
-
Milch, B.1
Marthi, B.2
Sontag, D.3
Russell, S.4
Ong, D.L.5
Kolobov, A.6
-
27
-
-
85018108837
-
The field matching problem: Algorithms and applications
-
In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, August
-
A. E. Monge and C. P. Elkan. The field matching problem: Algorithms and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 267-270, Portland, OR, August 1996.
-
(1996)
, pp. 267-270
-
-
Monge, A.E.1
Elkan, C.P.2
-
28
-
-
0004043396
-
An efficient domain-independent algorithm for detecting approximately duplicate database records
-
In Proceedings of the SIGMOD 1997 Workshop on Research Issues on Data Mining and Knowledge Discovery, Tucson, AZ, May
-
A. E. Monge and C. P. Elkan. An efficient domain-independent algorithm for detecting approximately duplicate database records. In Proceedings of the SIGMOD 1997 Workshop on Research Issues on Data Mining and Knowledge Discovery, pp. 23-29, Tucson, AZ, May 1997.
-
(1997)
, pp. 23-29
-
-
Monge, A.E.1
Elkan, C.P.2
-
29
-
-
0345566149
-
A guided tour to approximate string matching
-
G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31-88, 2001.
-
(2001)
ACM Computing Surveys
, vol.33
, Issue.1
, pp. 31-88
-
-
Navarro, G.1
-
30
-
-
40849113633
-
Clustering relational data using attribute and link information
-
In Proceedings of the Text Mining and Link Analysis Workshop, Eighteenth International Joint Conference on Artificial Intelligence
-
J. Neville, M. Adler, and D. Jensen. Clustering relational data using attribute and link information. In Proceedings of the Text Mining and Link Analysis Workshop, Eighteenth International Joint Conference on Artificial Intelligence, 2003.
-
(2003)
-
-
Neville, J.1
Adler, M.2
Jensen, D.3
-
31
-
-
0001592068
-
Automatic linkage of vital records
-
H. B. Newcombe, J. M. Kennedy, S. J. Axford, and A. P. James. Automatic linkage of vital records. Science, 130:954-959, 1959.
-
(1959)
Science
, vol.130
, pp. 954-959
-
-
Newcombe, H.B.1
Kennedy, J.M.2
Axford, S.J.3
James, A.P.4
-
32
-
-
17244378014
-
Multi-relational record linkage
-
In Proceedings of 3rd Workshop on Multi-Relational Data Mining at ACM SIGKDD, Seattle, WA, August
-
Parag and P. Domingos. Multi-relational record linkage. In Proceedings of 3rd Workshop on Multi-Relational Data Mining at ACM SIGKDD, Seattle, WA, August 2004.
-
(2004)
-
-
Parag, P.1
Domingos, P.2
-
33
-
-
84898987614
-
Identity uncertainty and citation matching
-
In Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA
-
H. Pasula, B. Marthi, B. Milch, S. Russell, and I. Shpitser. Identity uncertainty and citation matching. In Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA, 2003.
-
(2003)
-
-
Pasula, H.1
Marthi, B.2
Milch, B.3
Russell, S.4
Shpitser, I.5
-
34
-
-
84944315993
-
Potter's wheel: An interactive data cleaning system
-
In Proceedings VLDB
-
V. Raman and J. M. Hellerstein. Potter's wheel: An interactive data cleaning system. In Proceedings VLDB, 2001.
-
(2001)
-
-
Raman, V.1
Hellerstein, J.M.2
-
35
-
-
26844557708
-
A hierarchical graphical model for record linkage
-
In UAI 2004, Banff, CA, July
-
P. Ravikumar and W. W. Cohen. A hierarchical graphical model for record linkage. In UAI 2004, Banff, CA, July 2004.
-
(2004)
-
-
Ravikumar, P.1
Cohen, W.W.2
-
37
-
-
0242456811
-
Interactive deduplication using active learning
-
In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). Edmonton, Alberta
-
S. Sarawagi and A. Bhamidipaty. Interactive deduplication using active learning. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). Edmonton, Alberta, 2002.
-
(2002)
-
-
Sarawagi, S.1
Bhamidipaty, A.2
-
39
-
-
0035545848
-
Learning object identification rules for information integration
-
S. Tejada, C. A. Knoblock, and S. Minton. Learning object identification rules for information integration. Information Systems Journal, 26(8):635-656, 2001.
-
(2001)
Information Systems Journal
, vol.26
, Issue.8
, pp. 635-656
-
-
Tejada, S.1
Knoblock, C.A.2
Minton, S.3
-
40
-
-
0012866045
-
The state of record linkage and current research problems
-
Technical Report, Statistical Research Division, U.S. Census Bureau, Washington, DC
-
W. E. Winkler. The state of record linkage and current research problems. Technical Report, Statistical Research Division, U.S. Census Bureau, Washington, DC, 1999.
-
(1999)
-
-
Winkler, W.E.1
-
41
-
-
2942741943
-
Methods for record linkage and Bayesian networks
-
Technical Report, Statistical Research Division, U.S. Census Bureau, Washington, DC
-
W. E. Winkler. Methods for record linkage and Bayesian networks. Technical Report, Statistical Research Division, U.S. Census Bureau, Washington, DC, 2002.
-
(2002)
-
-
Winkler, W.E.1