-
1
-
-
12244298488
-
-
In: Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining Seattle Washington USA
-
Agichtein, E., Ganti, V.: Mining Reference Tables for Automatic Text Segmentation. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Seattle, Washington, USA, pp. 20-29 (2004)
-
(2004)
Mining Reference Tables for Automatic Text Segmentation
, pp. 20-29
-
-
Agichtein, E.1
Ganti, V.2
-
2
-
-
2342576574
-
-
Proc. of Int. Conf. on Very Large Databases, Hong Kong China
-
Ananthakrishna, R., Chaudhuri, S., Ganti, V.: Eliminating Fuzzy Duplicates in Data Warehouses. Proc. of Int. Conf. on Very Large Databases, Hong Kong, China, pp. 586-597 (2002)
-
(2002)
Eliminating Fuzzy Duplicates in Data Warehouses
, pp. 586-597
-
-
Ananthakrishna, R.1
Chaudhuri, S.2
Ganti, V.3
-
3
-
-
38749118638
-
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
-
Las Vegas Nevada USA
-
Andoni, A., Indyk, P.: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Proc. of IEEE Symposium on Foundations of Computer Science, Las Vegas, Nevada, USA, pp. 459-468 (2006)
-
(2006)
Proc. of IEEE Symposium on Foundations of Computer Science
, pp. 459-468
-
-
Andoni, A.1
Indyk, P.2
-
4
-
-
37549058056
-
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
-
Andoni, A., Indyk, P.: Near-optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Communications of the ACM 51(1), 117-122 (2008)
-
(2008)
Communications of the ACM
, vol.51
, Issue.1
, pp. 117-122
-
-
Andoni, A.1
Indyk, P.2
-
5
-
-
85104914015
-
-
Seoul Korea
-
Arasu, A., Ganti, V., Kaushik, R.: Efficient Exact Set-Similarity Joins. Proc. of Int. Conf. on Very Large Databases, Seoul, Korea, pp. 918-929 (2006)
-
(2006)
Efficient Exact Set-Similarity Joins. Proc. of Int. Conf. on Very Large Databases
, pp. 918-929
-
-
Arasu, A.1
Ganti, V.2
Kaushik, R.3
-
6
-
-
0001592068
-
Automatic Linkage of Vital Records
-
Axford, S.J., Newcombe, H.B., Kennedy, J.M., James, A.P.: Automatic Linkage of Vital Records. Science 130, 954-959 (1959)
-
(1959)
Science
, vol.130
, pp. 954-959
-
-
Axford, S.J.1
Newcombe, H.B.2
Kennedy, J.M.3
James, A.P.4
-
7
-
-
27944439775
-
Modern information retrieval
-
Addison-Wesley
-
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
-
(1999)
Reading
-
-
Baeza-Yates, R.1
Ribeiro-Neto, B.2
-
8
-
-
3142665421
-
Correlation clustering
-
Bansal, N., Blum, A., Chawla, S.: Correlation Clustering. Machine Learning 56(1-3), 89-113 (2004)
-
(2004)
Machine Learning
, vol.56
, Issue.1-3
, pp. 89-113
-
-
Bansal, N.1
Blum, A.2
Chawla, S.3
-
9
-
-
80455177968
-
-
Chiba Japan
-
Bawa, M., Tyson Condie, S., Ganesan, P.: LSH Forest: Self-Tuning Indexes for Similarity Search. Proc. of Int. Conf. on World Wide Web, Chiba, Japan, pp. 651-660 (2005)
-
(2005)
LSH Forest: Self-Tuning Indexes for Similarity Search. Proc. of Int. Conf. on World Wide Web
, pp. 651-660
-
-
Bawa, M.1
Tyson Condie, S.2
Ganesan, P.3
-
10
-
-
35348849154
-
-
Banff Alberta Canada
-
Bayardo, R.J., Srikant, R., Ma, Y.: Scaling Up All Pairs Similarity Search. Proc. of Int. Conf. on World Wide Web, Banff, Alberta, Canada, pp. 131-140 (2007)
-
(2007)
Scaling Up All Pairs Similarity Search. Proc. of Int. Conf. on World Wide Web
, pp. 131-140
-
-
Bayardo, R.J.1
Srikant, R.2
Ma, Y.3
-
11
-
-
58149472338
-
Swoosh: A generic approach to entity resolution
-
Benjelloun, O., Garcia-Molina, H., Menestrina, D., Su, Q., Whang, S.E., Widom, J.: Swoosh: a generic approach to entity resolution. VLDB Journal 18(1), 255-276 (2009)
-
(2009)
VLDB Journal
, vol.18
, Issue.1
, pp. 255-276
-
-
Benjelloun, O.1
Garcia-Molina, H.2
Menestrina, D.3
Su, Q.4
Whang, S.E.5
Widom, J.6
-
14
-
-
33749549918
-
-
Philadelphia, Pennsylvania USA
-
Bhattacharya, I., Getoor, L., Licamele, L.: Query-Time Entity Resolution. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Philadelphia, Pennsylvania, USA, pp. 529-534 (2006)
-
(2006)
Query-Time Entity Resolution. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
, pp. 529-534
-
-
Bhattacharya, I.1
Getoor, L.2
-
15
-
-
77952372966
-
Adaptive duplicate detection using learnable string similarity measures
-
Washington DC USA
-
Bilenko, M., Mooney, R.J.: Adaptive Duplicate Detection Using Learnable String Similarity Measures. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Washington, DC, USA, pp. 39-48 (2003)
-
(2003)
ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
, pp. 39-48
-
-
Bilenko, M.1
Mooney, R.J.2
-
18
-
-
0034832365
-
-
In: Proc. of ACM SIGMOD Int. Conf. on Management of Data Santa Barbara California USA
-
Borkar, V.R., Deshmukh, K., Sarawagi, S.: Automatic Segmentation of Text into Structured Records. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Santa Barbara, California, USA, pp. 175-186 (2001)
-
(2001)
Automatic Segmentation of Text into Structured Records
, pp. 175-186
-
-
Borkar, V.R.1
Deshmukh, K.2
Sarawagi, S.3
-
19
-
-
0031620041
-
Minwise independent permutations
-
USA
-
Broder, A., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Minwise Independent Permutations. Proc. of ACM Symposium on Theory of Computing, Dallas, Texas, USA, pp. 327-336 (1998)
-
(1998)
Proc. of ACM Symposium on Theory of Computing Dallas Texas
, pp. 327-336
-
-
Broder, A.1
Charikar, M.2
Frieze, A.M.3
Mitzenmacher, M.4
-
20
-
-
0010362121
-
-
Santa Clara California USA
-
Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic Clustering on the Web. Proc. of Int. Conf. on World Wide Web, Santa Clara, California, USA, pp. 1157-1166 (1997)
-
(1997)
Syntactic Clustering on the Web. Proc. of Int. Conf. on World Wide Web
, pp. 1157-1166
-
-
Broder, A.1
Glassman, S.2
Manasse, M.3
Zweig, G.4
-
21
-
-
44649181012
-
-
Cesario, E., Folino, F., Locane, A., Manco, G., Ortale, R.: Boosting Text Segmentation Via Progressive Classification. Knowl. and Inf. Syst. 15(3), 285-320 (2008)
-
(2008)
Boosting Text Segmentation Via Progressive Classification. Knowl. and Inf. Syst.
, vol.15
, Issue.3
, pp. 285-320
-
-
Cesario, E.1
Folino, F.2
Locane, A.3
Manco, G.4
Ortale, R.5
-
22
-
-
1142279457
-
-
Proc. of ACM SIGMOD Conf. on Management of Data, San Diego California USA
-
Chaudhuri, S., Ganjam, K., Ganti, V., Motwani, R.: Robust and Efficient Fuzzy Match for Online Data Cleaning. Proc. of ACM SIGMOD Conf. on Management of Data, San Diego, California, USA, pp. 313-324 (2003)
-
(2003)
Robust and Efficient Fuzzy Match for Online Data Cleaning
, pp. 313-324
-
-
Chaudhuri, S.1
Ganjam, K.2
Ganti, V.3
Motwani, R.4
-
23
-
-
26444550791
-
-
Tokyo Japan
-
Chaudhuri, S., Ganti, V., Motwani, R.: Robust Identification of Fuzzy Duplicates. Proc. of Int. Conf. on Data Engineering, Tokyo, Japan, pp. 865-876 (2005)
-
(2005)
Robust Identification of Fuzzy Duplicates. Proc. of Int. Conf. on Data Engineering
, pp. 865-876
-
-
Chaudhuri, S.1
Ganti, V.2
Motwani, R.3
-
24
-
-
0345043999
-
-
Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Searching in Metric Spaces. ACM Comput. Surv. 33(3), 273-321 (2001)
-
(2001)
Searching in Metric Spaces. ACM Comput. Surv
, vol.33
, Issue.3
, pp. 273-321
-
-
Chavez, E.1
Navarro, G.2
Baeza-Yates, R.3
Marroquin, J.L.4
-
25
-
-
84993661659
-
-
Athens Greece
-
Ciaccia, P., Patella, M., Zezula, P.: M-Tree: An Efficient Access Method for Similarity Search in Metric Spaces. Proc. of Int. Conf. on Very Large Databases, Athens, Greece, pp. 426-435 (1997)
-
(1997)
M-Tree: An Efficient Access Method for Similarity Search in Metric Spaces. Proc. of Int. Conf. on Very Large Databases
, pp. 426-435
-
-
Ciaccia, P.1
Patella, M.2
Zezula, P.3
-
26
-
-
80455148347
-
-
Department of Computer Sciences Purdue University
-
Cochinwala, M., Dalal, S., Elmagarmid, A.K., Verykios, V.S.: Record Matching: Past, Present and Future. Technical Report, number CSD-TR #01-013. Department of Computer Sciences, Purdue University (2001)
-
(2001)
Record Matching: Past, Present and Future Technical Report number CSD-TR #01-013
-
-
Cochinwala, M.1
Dalal, S.2
Elmagarmid, A.K.3
Verykios, V.S.4
-
27
-
-
0035452641
-
Efficient data reconciliation
-
Cochinwala, M., Kurien, V., Lalk, G., Shasha, D.: Efficient Data Reconciliation. Information Sciences 137(1-4), 1-15 (2001)
-
(2001)
Information Sciences
, vol.137
, Issue.1-4
, pp. 1-15
-
-
Cochinwala, M.1
Kurien, V.2
Lalk, G.3
Shasha, D.4
-
28
-
-
0000666461
-
Data Integration using Similarity Joins and a Word-based Information Representation Language
-
Cohen, W.W.: Data Integration using Similarity Joins and a Word-based Information Representation Language. ACM Trans. on Inf. Syst. 18(3), 228-321 (2000)
-
(2000)
ACM Trans. on Inf. Syst.
, vol.18
, Issue.3
, pp. 228-321
-
-
Cohen, W.W.1
-
29
-
-
11144240583
-
A comparison of string distance metrics for name-matching tasks
-
Acapulco Mexico
-
Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A Comparison of String Distance Metrics for Name-Matching Tasks. Proc. of IJCAI Workshop on Information Integration on the Web, Acapulco, Mexico, pp. 73-78 (2003)
-
(2003)
Proc. of IJCAI Workshop on Information Integration on the Web
, pp. 73-78
-
-
Cohen, W.W.1
Ravikumar, P.2
Fienberg, S.E.3
-
30
-
-
0242540438
-
Learning to match and cluster large high-dimensional data sets for data integration
-
Edmonton Alberta Canada
-
Cohen, W.W., Richman, J.: Learning to Match and Cluster Large High-Dimensional Data Sets for Data Integration. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 475-480 (2002)
-
(2002)
Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
, pp. 475-480
-
-
Cohen, W.W.1
Richman, J.2
-
31
-
-
0028424239
-
Improving generalization with active learning
-
Cohn, D.A., Atlas, L., Ladner, R.E.: Improving Generalization with Active Learning. Machine Learning 15(2), 201-221 (1994)
-
(1994)
Machine Learning
, vol.15
, Issue.2
, pp. 201-221
-
-
Cohn, D.A.1
Atlas, L.2
Ladner, R.E.3
-
32
-
-
76749114248
-
An incremental clustering scheme for data deduplication
-
Costa, G., Manco, G., Ortale, R.: An Incremental Clustering Scheme for Data Deduplication. Data Min. and Knowl. Discovery 20(1), 152-187 (2010)
-
(2010)
Data Min. and Knowl. Discovery
, vol.20
, Issue.1
, pp. 152-187
-
-
Costa, G.1
Manco, G.2
Ortale, R.3
-
33
-
-
80455148350
-
-
Database Group Leipzig
-
Database Group Leipzig. Benchmark datasets for entity resolution, http://dbs.uni-leipzig.de/en/research/projects/objectmatching/fever/benchmark datasets for entity resolution
-
Benchmark datasets for entity resolution
-
-
-
34
-
-
80455177960
-
-
Journal of the Royal Statistical Society Series B 39
-
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1-28 (1977)
-
(1977)
Maximum Likelihood from Incomplete Data via the EM Algorithm
, vol.1
, pp. 1-28
-
-
Dempster, A.P.1
Laird, N.M.2
Rubin, D.B.3
-
36
-
-
33845667955
-
Duplicate record detection: A survey
-
Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate Record Detection: A Survey. IEEE Transactions on Knowledge and Data Engineering 19(1), 1-16 (2007)
-
(2007)
IEEE Transactions on Knowledge and Data Engineering
, vol.19
, Issue.1
, pp. 1-16
-
-
Elmagarmid, A.K.1
Ipeirotis, P.G.2
Verykios, V.S.3
-
37
-
-
85170282443
-
-
Portland, Oregon USA
-
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. of Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, USA, pp. 226-231 (1996)
-
(1996)
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. of Int. Conf. on Knowledge Discovery and Data Mining
, pp. 226-231
-
-
Ester, M.1
Kriegel, H.P.2
Sander, J.3
Xu, X.4
-
38
-
-
0030285403
-
The KDD process for extracting useful knowledge from volumes of data
-
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Widener, T.: The KDD Process for Extracting Useful Knowledge from Volumes of Data. Communications of the ACM 39(11), 27-34 (1996)
-
(1996)
Communications of the ACM
, vol.39
, Issue.11
, pp. 27-34
-
-
Fayyad, U.1
Piatetsky-Shapiro, G.2
Smyth, P.3
Widener, T.4
-
40
-
-
0032665257
-
-
Sydney Australia
-
Ganti, V., Ramakrishnan, R., Gehrke, J., Powell, A.: Clustering Large Datasets in Arbitrary Metric Spaces. Proc. of Int. Conf. on Data Engineering, Sydney, Australia, pp. 502-511 (1999)
-
(1999)
Clustering Large Datasets in Arbitrary Metric Spaces Proc. of Int. Conf. on Data Engineering
, pp. 502-511
-
-
Ganti, V.1
Ramakrishnan, R.2
Gehrke, J.3
Powell, A.4
-
41
-
-
84947935707
-
Entity resolution: Overview and challenges
-
Atzeni P., Chu, W. Lu H. Zhou S. Ling T.-W. eds Springer, Heidelberg
-
Garcia-Molina, H.: Entity resolution: Overview and challenges. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, T.-W. (eds.) ER 2004. LNCS, vol. 3288, pp. 1-2. Springer, Heidelberg (2004)
-
(2004)
ER 2004 LNCS
, vol.3288
, pp. 1-2
-
-
Garcia-Molina, H.1
-
43
-
-
0001944742
-
-
Proc. of Int. Conf. on Very Large Databases, Edinburgh Scotland
-
Gionis, A., Indyk, P., Motwani, R.: Similarity Search in High Dimensions via Hashing. Proc. of Int. Conf. on Very Large Databases, Edinburgh, Scotland, pp. 518-529 (1999)
-
(1999)
Similarity Search in High Dimensions via Hashing
, pp. 518-529
-
-
Gionis, A.1
Indyk, P.2
Motwani, R.3
-
45
-
-
0038119396
-
Techniques of cluster algorithms in data mining
-
Grabmeier, J., Rudolph, A.: Techniques of Cluster Algorithms in Data Mining. Data Min. and Knowl. Discovery 6(4), 303-360 (2002)
-
(2002)
Data Min. and Knowl. Discovery
, vol.6
, Issue.4
, pp. 303-360
-
-
Grabmeier, J.1
Rudolph, A.2
-
46
-
-
84944318804
-
-
Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate String Joins in a Database (Almost) for Free. Proc. of Int. Conf. on Very Large Databases, Rome, Italy, pp. 491-500 (2001)
-
(2001)
Approximate String Joins in a Database Almost for Free Proc of Int. Conf. On Very Large Databases Rome Italy
, pp. 491-500
-
-
Gravano, L.1
Ipeirotis, P.G.2
Jagadish, H.V.3
Koudas, N.4
Muthukrishnan, S.5
Srivastava, D.6
-
47
-
-
80455148345
-
Record linkage: Current practice and future directions
-
Gu, L., Baxter, R.A., Vickers, D., Rainsford, C.: Record Linkage: Current Practice and Future Directions. Technical Report, number 03/83. CSIRO Mathematical and Information Sciences (2001)
-
(2001)
Technical Report number 03/83 CSIRO Mathematical and Information Sciences
-
-
Gu, L.1
Baxter, R.A.2
Vickers, D.3
Rainsford, C.4
-
48
-
-
0032091595
-
-
Proc. of ACM SIGMOD Int. Conf. on Management of Data Seattle Washington USA
-
Guha, S., Rastogi, R., Shim, K.: CURE: An Efficient Clustering Algorithm for Large Databases. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, Washington, USA, pp. 73-84 (1998)
-
(1998)
CURE: An Efficient Clustering Algorithm for Large Databases
, pp. 73-84
-
-
Guha, S.1
Rastogi, R.2
Shim, K.3
-
49
-
-
0034228041
-
-
Guha, S., Rastogi, R., Shim, K.: ROCK: A Robust Clustering Algorithm for Categorical Attributes. Inf. Syst. 25(5), 345-366 (2001)
-
(2001)
ROCK: A Robust Clustering Algorithm for Categorical Attributes. Inf. Syst.
, vol.25
, Issue.5
, pp. 345-366
-
-
Guha, S.1
Rastogi, R.2
Shim, K.3
-
51
-
-
72649086387
-
-
Proceedings of VLDB
-
Hassanzadeh, O., Chiang, F., Lee, H.C., Miller, R.J.: Framework for Evaluating Clustering Algorithms in Duplicate Detection. Proceedings of VLDB 2(1), 1282-1293 (2009)
-
(2009)
Framework for Evaluating Clustering Algorithms in Duplicate Detection
, vol.2
, Issue.1
, pp. 1282-1293
-
-
Hassanzadeh, O.1
Chiang, F.2
Lee, H.C.3
Miller, R.J.4
-
52
-
-
70349826301
-
Creating probabilistic databases from duplicated data
-
Hassanzadeh, O., Miller, R.J.: Creating Probabilistic Databases from Duplicated Data. The VLDB Journal 18(5), 1141-1166 (2009)
-
(2009)
VLDB Journal
, vol.18
, Issue.5
, pp. 1141-1166
-
-
Hassanzadeh, O.1
Miller, R.J.2
-
53
-
-
84976856849
-
-
Proc. of ACM SIGMOD Int. Conf. on Management of Data San Jose California USA
-
Hernández, M.A., Stolfo, S.J.: The Merge/Purge Problem for Large Databases. Proc. of ACM SIGMOD Int. Conf. on Management of Data, San Jose, California, USA, pp. 127-138 (1995)
-
(1995)
The Merge/Purge Problem for Large Databases
, pp. 127-138
-
-
Hernández, M.A.1
Stolfo, S.J.2
-
54
-
-
0013331361
-
Real-world data is dirty: Data cleansing and the merge/purge problem
-
Hernández, M.A., Stolfo, J.: Real-world Data is Dirty: Data Cleansing and the Merge/Purge Problem. Data Min. and Knowl. Discovery 2(1), 9-37 (1998)
-
(1998)
Data Min. and Knowl. Discovery
, vol.2
, Issue.1
, pp. 9-37
-
-
Hernández, M.A.1
Stolfo, J.2
-
56
-
-
0041664272
-
Index-driven similarity search in metric spaces
-
Hjaltason, G.R., Samet, H.: Index-Driven Similarity Search in Metric Spaces. ACM Trans. on Database Syst. 28(4), 517-518 (2003)
-
(2003)
ACM Trans. on Database Syst.
, vol.28
, Issue.4
, pp. 517-518
-
-
Hjaltason, G.R.1
Samet, H.2
-
57
-
-
0031644241
-
-
Proc. of Symposium on Theory of Computing Dallas Texas USA
-
Indyk, P., Motwani, R.: Approximate Nearest Neighbor - Towards Removing the Curse of Dimensionality. Proc. of Symposium on Theory of Computing, Dallas, Texas, USA, pp. 604-613 (1998)
-
(1998)
Approximate Nearest Neighbor - Towards Removing the Curse of Dimensionality
, pp. 604-613
-
-
Indyk, P.1
Motwani, R.2
-
58
-
-
33845667955
-
Duplicate Record Detection: A Survey
-
Ipeirotis, P.G., Verykios, V.S., Elmagarmid, A.K.: Duplicate Record Detection: A Survey. IEEE Trans. Knowl. Data Eng. 19(1), 1-16 (2007)
-
(2007)
IEEE Trans. Knowl. Data Eng.
, vol.19
, Issue.1
, pp. 1-16
-
-
Ipeirotis, P.G.1
Verykios, V.S.2
Elmagarmid, A.K.3
-
60
-
-
84893405732
-
Data clustering: A review
-
Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Comput. Surv. 31(3), 264-323 (1999)
-
(1999)
ACM Comput. Surv.
, vol.31
, Issue.3
, pp. 264-323
-
-
Jain, A.K.1
Murty, M.N.2
Flynn, P.J.3
-
61
-
-
84950419860
-
Advances in Record Linkage Methodology as Applied to Matching the 1985 Census of Tampa Florida
-
Jaro, M.A.: Advances in Record Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida. Journal of the American Statistical Society 84, 420-424 (1989)
-
(1989)
Journal of the American Statistical Society
, vol.84
, pp. 420-424
-
-
Jaro, M.A.1
-
63
-
-
80455148340
-
Evaluation of entity resolution approaches on real-world match problems
-
Kopcke, H., Rahm, E.: Frameworks for Entity Matching: A Comparison. Data and Knowl. Engineering 69(2), 197-210 (2010)
Kopcke, H., Thor, A., Rahm, E.: Evaluation of entity resolution approaches on real-world match problems. Proc. of the VLDB Endowment 3(1), 484-493 (2010)
-
(2010)
Proc. of the VLDB Endowment
, vol.3
, Issue.1
, pp. 484-493
-
-
Kopcke, H.1
Thor, A.2
Rahm, E.3
-
64
-
-
77954338155
-
Evaluation of learning-based approaches for matching web data entities
-
Kopcke, H., Thor, A., Rahm, E.: Evaluation of Learning-Based Approaches for Matching Web Data Entities. IEEE Internet Computing 14(4), 23-31 (2010)
-
(2010)
IEEE Internet Computing
, vol.14
, Issue.4
, pp. 23-31
-
-
Kopcke, H.1
Thor, A.2
Rahm, E.3
-
67
-
-
0032652968
-
Autonomous citation matching
-
Lawrence, S., Bollacker, K., Giles, C.L.: Autonomous Citation Matching. Proc. of ACM Int. Conf. on Autonomous Agents, pp. 392-393 (1999)
-
(1999)
Proc. of ACM Int. Conf. on Autonomous Agents
, pp. 392-393
-
-
Lawrence, S.1
Bollacker, K.2
Giles, C.L.3
-
68
-
-
0035545906
-
A knowledge-based approach for duplicate elimination in data cleaning
-
Low, W.L., Lee, M.L., Ling, T.W.: A Knowledge-Based Approach for Duplicate Elimination in Data Cleaning. Information Systems 26(8), 585-606 (2001)
-
(2001)
Information Systems
, vol.26
, Issue.8
, pp. 585-606
-
-
Low, W.L.1
Lee, M.L.2
Ling, T.W.3
-
70
-
-
0000747663
-
-
Proc. of Int. Conf. on Machine Learning Stanford California USA
-
McCallum, A., Freitag, D., Pereira, F.: Maximum Entropy Markov Models for Information Extraction and Segmentation. Proc. of Int. Conf. on Machine Learning, Stanford, California, USA, pp. 591-598 (2000)
-
(2000)
Maximum Entropy Markov Models for Information Extraction and Segmentation
, pp. 591-598
-
-
McCallum, A.1
Freitag, D.2
Pereira, F.3
-
71
-
-
0034592784
-
-
In: Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining Boston Massachusetts USA
-
McCallum, A., Nigam, K., Ungar, L.: Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Boston, Massachusetts, USA, pp. 169-178 (2000)
-
(2000)
Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching
, pp. 169-178
-
-
McCallum, A.1
Nigam, K.2
Ungar, L.3
-
72
-
-
80455148342
-
-
In: Int. VLDB Workshop on Clean Databases Seoul, Korea
-
Menestrina, D., Benjelloun, O., Garcia-Molina, H.: Generic Entity Resolution with Data Confidences. In: Int. VLDB Workshop on Clean Databases, Seoul, Korea (2006)
-
(2006)
Generic Entity Resolution with Data Confidences
-
-
Menestrina, D.1
Benjelloun, O.2
Garcia-Molina, H.3
-
74
-
-
0004043396
-
-
Proc. of SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery Tucson Arizona USA
-
Monge, A.E., Elkan, C.P.: An Efficient Domain-Independent Algorithm For Detecting Approximately Duplicate Database Records. Proc. of SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tucson, Arizona, USA, pp. 23-29 (1997)
-
(1997)
An Efficient Domain-Independent Algorithm For Detecting Approximately Duplicate Database Records
, pp. 23-29
-
-
Monge, A.E.1
Elkan, C.P.2
-
75
-
-
85018108837
-
-
Portland Oregon USA
-
Monge, A.E., Elkan, C.P.: The Field Matching Problem: Algorithms and Applications. Proc. of Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, USA, pp. 267-270 (1996)
-
(1996)
The Field Matching Problem: Algorithms and Applications. Proc. of Int. Conf. on Knowledge Discovery and Data Mining
, pp. 267-270
-
-
Monge, A.E.1
Elkan, C.P.2
-
76
-
-
35048865820
-
-
Proc. of CoopIS/DOA/ODBASE Int. Conf., Agia Napa Cyprus
-
Mukherjee, S., Ramakrishnan, I.V.: Taming the Unstructured: Creating Structured Content from Partially Labeled Schematic Text Sequences. Proc. of CoopIS/DOA/ODBASE Int. Conf., Agia Napa, Cyprus, pp. 909-926 (2004)
-
(2004)
Taming the Unstructured: Creating Structured Content from Partially Labeled Schematic Text Sequences
, pp. 909-926
-
-
Mukherjee, S.1
Ramakrishnan, I.V.2
-
77
-
-
0028959905
-
Evaluating the quality of anonymous record linkage using deterministic procedures with the New York State AIDS registry and a hospital discharge file
-
Muse, A.G., Mikl, J., Smith, P.F.: Evaluating the quality of anonymous record linkage using deterministic procedures with the New York State AIDS registry and a hospital discharge file. Statistics in Medicine 14, 499-509 (1995)
-
(1995)
Statistics in Medicine
, vol.14
, pp. 499-509
-
-
Muse, A.G.1
Mikl, J.2
Smith, P.F.3
-
78
-
-
80455138856
-
-
Proc. KDD Workshop on Data Cleaning Record Linkage and Object Consolidation Washington DC USA
-
Neiling, M., Jurk, S.: The Object Identification Framework. In: Proc. KDD Workshop on Data Cleaning, Record Linkage, and Object Consolidation, Washington, DC, USA, pp. 37-39 (2003)
-
(2003)
Object Identification Framework
, pp. 37-39
-
-
Neiling, M.1
Jurk, S.2
-
79
-
-
33750548434
-
Privacy issues in research using record linkage
-
Neutel, C.I.: Privacy Issues in Research Using Record Linkage. Pharmacoepidemiology and Drug Safety 6, 367-369 (1997)
-
(1997)
Pharmacoepidemiology and Drug Safety
, vol.6
, pp. 367-369
-
-
Neutel, C.I.1
-
80
-
-
0014087577
-
Record linking: The design of efficient systems for linking records into individual and family histories
-
Newcombe, H.B.: Record Linking: The Design of Efficient Systems for Linking Records into Individual and Family Histories. American Journal of Human Genetics 19, 335-359 (1967)
-
(1967)
American Journal of Human Genetics
, vol.19
, pp. 335-359
-
-
Newcombe, H.B.1
-
81
-
-
0001592068
-
Automatic linkage of vital records
-
Newcombe, H.B., Kennedy, J.M., Axford, S.J., James, A.P.: Automatic Linkage of Vital Records. Science 130, 954-959 (1959)
-
(1959)
Science
, vol.130
, pp. 954-959
-
-
Newcombe, H.B.1
Kennedy, J.M.2
Axford, S.J.3
James, A.P.4
-
83
-
-
85156206690
-
-
In: Proc. of Ann. Conf. on Neural Information Processing Systems
-
Pasula, H., Marthi, B., Milch, B., Russell, S.J., Shpitser, I.: Identity Uncertainty and Citation Matching. Proc. of Ann. Conf. on Neural Information Processing Systems, pp. 1401-1408 (2002)
-
(2002)
Identity Uncertainty and Citation Matching
, pp. 1401-1408
-
-
Pasula, H.1
Marthi, B.2
Milch, B.3
Russell, S.J.4
Shpitser, I.5
-
84
-
-
0242456811
-
-
In: Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining Edmonton Alberta Canada
-
Sarawagi, S., Bhamidipaty, A.: Interactive Deduplication using Active Learning. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 269-278 (2002)
-
(2002)
Interactive Deduplication using Active Learning
, pp. 269-278
-
-
Sarawagi, S.1
Bhamidipaty, A.2
-
86
-
-
57049103006
-
Improved approximate detection of duplicates for data streams over sliding windows
-
Shen, H., Zhang, Y.: Improved Approximate Detection of Duplicates for Data Streams over Sliding Windows. Journal of Computer Science and Technology 23(6), 973-987 (2008)
-
(2008)
Journal of Computer Science and Technology
, vol.23
, Issue.6
, pp. 973-987
-
-
Shen, H.1
Zhang, Y.2
-
87
-
-
80455136873
-
-
In: Proc. of ACM Int. Ws. on Multi-Relational Data Mining
-
Singla, P., Domingos, P.: Multi-Relational Record Linkage. Proc. of ACM Int. Ws. on Multi-Relational Data Mining, pp. 31-38 (2004)
-
(2004)
Multi-Relational Record Linkage
, pp. 31-38
-
-
Singla, P.1
Domingos, P.2
-
88
-
-
0019887799
-
Identification of common molecular subsequences
-
Smith, S., Waterman, M.S.: Identification of Common Molecular Subsequences. Journal of Molecular Biology 147(1), 195-197 (1981)
-
(1981)
Journal of Molecular Biology
, vol.147
, Issue.1
, pp. 195-197
-
-
Smith, S.1
Waterman, M.S.2
-
90
-
-
0242456803
-
-
In: Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining Edmonton Alberta Canada
-
Tejada, S., Knoblock, C.A., Minton, S.: Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 350-359 (2002)
-
(2002)
Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification
, pp. 350-359
-
-
Tejada, S.1
Knoblock, C.A.2
Minton, S.3
-
92
-
-
0034228352
-
-
Verykios, V.S., Elmagarmid, A.K., Houstis, E.N.: Automating the approximate record-matching process. Inf. Sci. 126(1-4), 83-98 (2000)
-
(2000)
Automating the approximate record-matching process. Inf. Sci.
, vol.126
, Issue.1-4
, pp. 83-98
-
-
Verykios, V.S.1
Elmagarmid, A.K.2
Houstis, E.N.3
-
93
-
-
0000681228
-
-
Proc. of Int. Conf. on Very Large Databases, New York City USA
-
Weber, R., Schek, H.J., Blott, S.: A Quantitative Analysis and Performance Study for Similarity Search in High-Dimensional Spaces. Proc. of Int. Conf. on Very Large Databases, New York City, USA, pp. 194-205 (1998)
-
(1998)
A Quantitative Analysis and Performance Study for Similarity Search in High-Dimensional Spaces
, pp. 194-205
-
-
Weber, R.1
Schek, H.J.2
Blott, S.3
-
99
-
-
77649244160
-
Duplicate-insensitive order statistics computation over data streams
-
Zhang, Y., Lin, X., Yuan, Y., Kitsuregawa, M., Zhou, X., Yu, J.X.: Duplicate-insensitive Order Statistics Computation over Data Streams. IEEE Transactions on Knowledge and Data Engineering 22(4), 493-507 (2010)
-
(2010)
IEEE Transactions on Knowledge and Data Engineering
, vol.22
, Issue.4
, pp. 493-507
-
-
Zhang, Y.1
Lin, X.2
Yuan, Y.3
Kitsuregawa, M.4
Zhou, X.5
Yu, J.X.6