-
1
-
-
2342576574
-
Eliminating Fuzzy Duplicates in Data Warehouses
-
R. Ananthakrishna, S. Chaudhuri, and V. Ganti, "Eliminating Fuzzy Duplicates in Data Warehouses," Proc. 28th Int'l Conf. Very Large Data Bases, pp. 586-597, 2002.
-
(2002)
Proc. 28th Int'l Conf. Very Large Data Bases
, pp. 586-597
-
-
Ananthakrishna, R.1
Chaudhuri, S.2
Ganti, V.3
-
2
-
-
85059500505
-
-
R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. ACM Press, 1999.
-
-
-
-
3
-
-
5444258997
-
A Comparison of Fast Blocking Methods for Record Linkage
-
R. Baxter, P. Christen, and T. Churches, "A Comparison of Fast Blocking Methods for Record Linkage," Proc. KDD Workshop Data Cleaning, Record Linkage, and Object Consolidation, pp. 25-27, 2003.
-
(2003)
Proc. KDD Workshop Data Cleaning, Record Linkage, and Object Consolidation
, pp. 25-27
-
-
Baxter, R.1
Christen, P.2
Churches, T.3
-
4
-
-
58149472338
-
Swoosh: A Generic Approach to Entity Resolution
-
O. Benjelloun, H. Garcia-Molina, D. Menestrina, Q. Su, S.E. Whang, and J. Widom, "Swoosh: A Generic Approach to Entity Resolution," The VLDB J., vol. 18, no. 1, pp. 255-276, 2009.
-
(2009)
The VLDB J
, vol.18
, Issue.1
, pp. 255-276
-
-
Benjelloun, O.1
Garcia-Molina, H.2
Menestrina, D.3
Su, Q.4
Whang, S.E.5
Widom, J.6
-
5
-
-
77952372966
-
Adaptive Duplicate Detection Using Learnable String Similarity Measures
-
M. Bilenko and R.J. Mooney, "Adaptive Duplicate Detection Using Learnable String Similarity Measures," Proc. ACM SIGKDD, pp. 39-48, 2003.
-
(2003)
Proc. ACM SIGKDD
, pp. 39-48
-
-
Bilenko, M.1
Mooney, R.J.2
-
6
-
-
1142279457
-
Robust and Efficient Fuzzy Match for Online Data Cleaning
-
S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, "Robust and Efficient Fuzzy Match for Online Data Cleaning," Proc. ACM SIGMOD, pp. 313-324, 2003.
-
(2003)
Proc. ACM SIGMOD
, pp. 313-324
-
-
Chaudhuri, S.1
Ganjam, K.2
Ganti, V.3
Motwani, R.4
-
7
-
-
26444550791
-
Robust Identification of Fuzzy Duplicates
-
S. Chaudhuri, V. Ganti, and R. Motwani, "Robust Identification of Fuzzy Duplicates," Proc. 21st IEEE Int'l Conf. Data Eng., pp. 865-876, 2005.
-
(2005)
Proc. 21st IEEE Int'l Conf. Data Eng
, pp. 865-876
-
-
Chaudhuri, S.1
Ganti, V.2
Motwani, R.3
-
8
-
-
65449139594
-
Automatic Record Linkage Using Seeded Nearest Neighbour and Support Vector Machine Classification
-
P. Christen, "Automatic Record Linkage Using Seeded Nearest Neighbour and Support Vector Machine Classification," Proc. ACM SIGKDD, pp. 151-159, 2008.
-
(2008)
Proc. ACM SIGKDD
, pp. 151-159
-
-
Christen, P.1
-
9
-
-
7444251738
-
Febrl - A Parallel Open Source Data Linkage System
-
Springer
-
P. Christen, T. Churches, and M. Hegland, "Febrl - A Parallel Open Source Data Linkage System," Advances in Knowledge Discovery and Data Mining, pp. 638-647, Springer, 2004.
-
(2004)
Advances in Knowledge Discovery and Data Mining
, pp. 638-647
-
-
Christen, P.1
Churches, T.2
Hegland, M.3
-
10
-
-
33846428121
-
Quality and Complexity Measures for Data Linkage and Deduplication
-
F. Guillet and H. Hamilton, eds, Springer
-
P. Christen and K. Goiser, "Quality and Complexity Measures for Data Linkage and Deduplication," Quality Measures in Data Mining, F. Guillet and H. Hamilton, eds., vol. 43, pp. 127-151, Springer, 2007.
-
(2007)
Quality Measures in Data Mining
, vol.43
, pp. 127-151
-
-
Christen, P.1
Goiser, K.2
-
11
-
-
0034592802
-
Hardening Soft Information Sources
-
W.W. Cohen, H. Kautz, and D. McAllester, "Hardening Soft Information Sources," Proc. ACM SIGKDD, pp. 255-259, 2000.
-
(2000)
Proc. ACM SIGKDD
, pp. 255-259
-
-
Cohen, W.W.1
Kautz, H.2
McAllester, D.3
-
12
-
-
0242540438
-
Learning to Match and Cluster Large High-Dimensional Datasets for Data Integration
-
W.W. Cohen and J. Richman, "Learning to Match and Cluster Large High-Dimensional Datasets for Data Integration," Proc. ACM SIGKDD, pp. 475-480, 2002.
-
(2002)
Proc. ACM SIGKDD
, pp. 475-480
-
-
Cohen, W.W.1
Richman, J.2
-
13
-
-
85059499008
-
-
A. Culotta and A. McCallum, "A Conditional Model of Deduplication for Multi-Type Relational Data," Technical Report IR-443, Dept. of Computer Science, Univ. of Massachusetts Amherst, 2005.
-
-
-
-
14
-
-
84957055189
-
Positive and Unlabeled Examples Help Learning
-
F. DeComite, F. Denis, and R. Gilleron, "Positive and Unlabeled Examples Help Learning," Proc. 10th Int'l Conf. Algorithmic Learning Theory, pp. 219-230, 1999.
-
(1999)
Proc. 10th Int'l Conf. Algorithmic Learning Theory
, pp. 219-230
-
-
DeComite, F.1
Denis, F.2
Gilleron, R.3
-
16
-
-
29844452555
-
Reference Reconciliation in Complex Information Spaces
-
X. Dong, A. Halevy, and J. Madhavan, "Reference Reconciliation in Complex Information Spaces," Proc. ACM SIGMOD, pp. 85-96, 2005.
-
(2005)
Proc. ACM SIGMOD
, pp. 85-96
-
-
Dong, X.1
Halevy, A.2
Madhavan, J.3
-
17
-
-
33845667955
-
Duplicate Record Detection: A Survey
-
Jan
-
A.K. Elmagarmid, P.G. Ipeirotis, and V.S. Verykios, "Duplicate Record Detection: A Survey," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 1, pp. 1-16, Jan. 2007.
-
(2007)
IEEE Trans. Knowledge and Data Eng
, vol.19
, Issue.1
, pp. 1-16
-
-
Elmagarmid, A.K.1
Ipeirotis, P.G.2
Verykios, V.S.3
-
18
-
-
84944318804
-
Approximate String Joins in a Database (Almost) for Free
-
L. Gravano, P.G. Ipeirotis, H.V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava, "Approximate String Joins in a Database (Almost) for Free," Proc. 27th Int'l Conf. Very Large Data Bases, pp. 491-500, 2001.
-
(2001)
Proc. 27th Int'l Conf. Very Large Data Bases
, pp. 491-500
-
-
Gravano, L.1
Ipeirotis, P.G.2
Jagadish, H.V.3
Koudas, N.4
Muthukrishnan, S.5
Srivastava, D.6
-
19
-
-
33745213976
-
Automatic Complex Schema Matching Across Web Query Interfaces: A Correlation Mining Approach
-
B. He and K.C.-C. Chang, "Automatic Complex Schema Matching Across Web Query Interfaces: A Correlation Mining Approach," ACM Trans. Database Systems, vol. 31, no. 1, pp. 346-396, 2006.
-
(2006)
ACM Trans. Database Systems
, vol.31
, Issue.1
, pp. 346-396
-
-
He, B.1
Chang, K.C.-C.2
-
20
-
-
84976856849
-
The Merge/Purge Problem for Large Databases
-
M.A. Hernandez and S.J. Stolfo, "The Merge/Purge Problem for Large Databases," ACM SIGMOD Record, vol. 24, no. 2, pp. 127-138, 1995.
-
(1995)
ACM SIGMOD Record
, vol.24
, Issue.2
, pp. 127-138
-
-
Hernandez, M.A.1
Stolfo, S.J.2
-
21
-
-
84950419860
-
Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida
-
M.A. Jaro, "Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida," J. Am. Statistical Assoc., vol. 84, no. 406, pp. 414-420, 1989.
-
(1989)
J. Am. Statistical Assoc
, vol.84
, Issue.406
, pp. 414-420
-
-
Jaro, M.A.1
-
22
-
-
84880127702
-
Exploiting Relationships for Domain-Independent Data Cleaning
-
D.V. Kalashnikov, S. Mehrotra, and Z. Chen, "Exploiting Relationships for Domain-Independent Data Cleaning," Proc. SIAM Int'l Conf. Data Mining, pp. 262-273, 2005.
-
(2005)
Proc. SIAM Int'l Conf. Data Mining
, pp. 262-273
-
-
Kalashnikov, D.V.1
Mehrotra, S.2
Chen, Z.3
-
23
-
-
34250670467
-
Record Linkage: Similarity Measures and Algorithms (Tutorial)
-
N. Koudas, S. Sarawagi, and D. Srivastava, "Record Linkage: Similarity Measures and Algorithms (Tutorial)," Proc. ACM SIGMOD, pp. 802-803, 2006.
-
(2006)
Proc. ACM SIGMOD
, pp. 802-803
-
-
Koudas, N.1
Sarawagi, S.2
Srivastava, D.3
-
24
-
-
84974661615
-
Learning from Positive and Unlabeled Examples
-
F. Letouzey, F. Denis, and R. Gilleron, "Learning from Positive and Unlabeled Examples," Proc. 11th Int'l Conf. Algorithmic Learning Theory, pp. 71-85, 2000.
-
(2000)
Proc. 11th Int'l Conf. Algorithmic Learning Theory
, pp. 71-85
-
-
Letouzey, F.1
Denis, F.2
Gilleron, R.3
-
25
-
-
85096855936
-
One-Class SVMs for Document Classification
-
L.M. Manevitz and M. Yousef, "One-Class SVMs for Document Classification," J. Machine Learning Research, vol. 2, pp. 139-154, 2001.
-
(2001)
J. Machine Learning Research
, vol.2
, pp. 139-154
-
-
Manevitz, L.M.1
Yousef, M.2
-
26
-
-
85059497637
-
-
A. McCallum, "Cora Citation Matching," http://www.cs.umass.edu/~mccallum/data/cora-refs.tar.gz, 2004.
-
-
-
-
27
-
-
0034592784
-
Efficient Clustering of High-Dimensional Datasets with Application to Reference Matching
-
A. McCallum, K. Nigam, and L.H. Ungar, "Efficient Clustering of High-Dimensional Datasets with Application to Reference Matching," Proc. ACM SIGKDD, pp. 169-178, 2000.
-
(2000)
Proc. ACM SIGKDD
, pp. 169-178
-
-
McCallum, A.1
Nigam, K.2
Ungar, L.H.3
-
28
-
-
0242456811
-
Interactive Deduplication Using Active Learning
-
S. Sarawagi and A. Bhamidipaty, "Interactive Deduplication Using Active Learning," Proc. ACM SIGKDD, pp. 269-278, 2002.
-
(2002)
Proc. ACM SIGKDD
, pp. 269-278
-
-
Sarawagi, S.1
Bhamidipaty, A.2
-
30
-
-
33745595861
-
Holistic Schema Matching for Web Query Interfaces
-
W. Su, J. Wang, and F.H. Lochovsky, "Holistic Schema Matching for Web Query Interfaces," Proc. 10th Int'l Conf. Extending Database Technology, pp. 77-94, 2006.
-
(2006)
Proc. 10th Int'l Conf. Extending Database Technology
, pp. 77-94
-
-
Su, W.1
Wang, J.2
Lochovsky, F.H.3
-
31
-
-
0242456803
-
Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification
-
S. Tejada, C.A. Knoblock, and S. Minton, "Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification," Proc. ACM SIGKDD, pp. 350-359, 2002.
-
(2002)
Proc. ACM SIGKDD
, pp. 350-359
-
-
Tejada, S.1
Knoblock, C.A.2
Minton, S.3
-
32
-
-
1842353812
-
The Discrimination Power of Dependency Structures in Record Linkage
-
Y. Thibaudeau, "The Discrimination Power of Dependency Structures in Record Linkage," Survey Methodology, vol. 19, pp. 31-38, 1993.
-
(1993)
Survey Methodology
, vol.19
, pp. 31-38
-
-
Thibaudeau, Y.1
-
33
-
-
85059500543
-
-
V. Vapnik, The Nature of Statistical Learning Theory, second ed. Springer, 2000.
-
-
-
-
34
-
-
0038208065
-
A Bayesian Decision Model for Cost Optimal Record Matching
-
V.S. Verykios, G.V. Moustakides, and M.G. Elfeky, "A Bayesian Decision Model for Cost Optimal Record Matching," The VLDB J., vol. 12, no. 1, pp. 28-40, 2003.
-
(2003)
The VLDB J
, vol.12
, Issue.1
, pp. 28-40
-
-
Verykios, V.S.1
Moustakides, G.V.2
Elfeky, M.G.3
-
35
-
-
0002940254
-
Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage
-
W.E. Winkler, "Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage," Proc. Section Survey Research Methods, pp. 667-671, 1988.
-
(1988)
Proc. Section Survey Research Methods
, pp. 667-671
-
-
Winkler, W.E.1
-
36
-
-
0742268826
-
PEBL: Web Page Classification without Negative Examples
-
Jan
-
H. Yu, J. Han, and C.C. Chang, "PEBL: Web Page Classification without Negative Examples," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 1, pp. 70-81, Jan. 2004.
-
(2004)
IEEE Trans. Knowledge and Data Eng
, vol.16
, Issue.1
, pp. 70-81
-
-
Yu, H.1
Han, J.2
Chang, C.C.3
-
37
-
-
33750797710
-
Structured Data Extraction from the Web Based on Partial Tree Alignment
-
Dec
-
Y. Zhai and B. Liu, "Structured Data Extraction from the Web Based on Partial Tree Alignment," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 12, pp. 1614-1628, Dec. 2006.
-
(2006)
IEEE Trans. Knowledge and Data Eng
, vol.18
, Issue.12
, pp. 1614-1628
-
-
Zhai, Y.1
Liu, B.2
-
38
-
-
33744899132
-
Fully Automatic Wrapper Generation for Search Engines
-
H. Zhao, W. Meng, A. Wu, V. Raghavan, and C. Yu, "Fully Automatic Wrapper Generation for Search Engines," Proc. 14th World Wide Web Conf., pp. 66-75, 2005.
-
(2005)
Proc. 14th World Wide Web Conf
, pp. 66-75
-
-
Zhao, H.1
Meng, W.2
Wu, A.3
Raghavan, V.4
Yu, C.5