-
1
-
-
2942709772
-
Methods for evaluating and creating data quality
-
W.E. Winkler, "Methods for Evaluating and Creating Data Quality," Elsevier Information Systems, vol. 29, no. 7, pp. 531-550, 2004.
-
(2004)
Elsevier Information Systems
, vol.29
, Issue.7
, pp. 531-550
-
-
Winkler, W.E.1
-
2
-
-
4344570142
-
Practical introduction to record linkage for injury research
-
D.E. Clark, "Practical Introduction to Record Linkage for Injury Research," Injury Prevention, vol. 10, pp. 186-191, 2004.
-
(2004)
Injury Prevention
, vol.10
, pp. 186-191
-
-
Clark, D.E.1
-
3
-
-
0036450652
-
Research use of linked health data - A best practice protocol
-
C.W. Kelman, J. Bass, and D. Holman, "Research Use of Linked Health Data - A Best Practice Protocol," Australian NZ J. Public Health, vol. 26, pp. 251-255, 2002.
-
(2002)
Australian NZ J. Public Health
, vol.26
, pp. 251-255
-
-
Kelman, C.W.1
Bass, J.2
Holman, D.3
-
5
-
-
45849148052
-
Effective counterterrorism and the limited role of predictive data mining
-
J. Jonas and J. Harper, "Effective Counterterrorism and the Limited Role of Predictive Data Mining," Policy Analysis, no. 584, pp. 1-11, 2006.
-
(2006)
Policy Analysis
, Issue.584
, pp. 1-11
-
-
Jonas, J.1
Harper, J.2
-
6
-
-
77956039068
-
Adaptive near-duplicate detection via similarity learning
-
H. Hajishirzi, W. Yih, and A. Kolcz, "Adaptive Near-Duplicate Detection via Similarity Learning," Proc. 33rd Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '10), pp. 419-426, 2010.
-
(2010)
Proc. 33rd Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '10)
, pp. 419-426
-
-
Hajishirzi, H.1
Yih, W.2
Kolcz, A.3
-
7
-
-
77649261370
-
Record matching over query results from multiple web databases
-
W. Su, J. Wang, and F.H. Lochovsky, "Record Matching over Query Results from Multiple Web Databases," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 4, pp. 578-589, Apr. 2010.
-
(2010)
IEEE Trans. Knowledge and Data Eng.
, vol.22
, Issue.4
, pp. 578-589
-
-
Su, W.1
Wang, J.2
Lochovsky, F.H.3
-
8
-
-
33746054079
-
Adaptive product normalization: Using online learning for record linkage in comparison shopping
-
M. Bilenko, S. Basu, and M. Sahami, "Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping," Proc. IEEE Int'l Conf. Data Mining (ICDM '05), pp. 58-65, 2005.
-
(2005)
Proc. IEEE Int'l Conf. Data Mining (ICDM '05)
, pp. 58-65
-
-
Bilenko, M.1
Basu, S.2
Sahami, M.3
-
9
-
-
33846428121
-
Quality and complexity measures for data linkage and deduplication
-
P. Christen and K. Goiser, "Quality and Complexity Measures for Data Linkage and Deduplication," Quality Measures in Data Mining, ser. Studies in Computational Intelligence, F. Guillet and H. Hamilton, eds., vol. 43, Springer, pp. 127-151, 2007.
-
(2007)
Quality Measures in Data Mining
, vol.43
, pp. 127-151
-
-
Christen, P.1
Goiser, K.2
-
11
-
-
84947399464
-
A theory for record linkage
-
I.P. Fellegi and A.B. Sunter, "A Theory for Record Linkage," J. Am. Statistical Assoc., vol. 64, no. 328, pp. 1183-1210, 1969.
-
(1969)
J. Am. Statistical Assoc.
, vol.64
, Issue.328
, pp. 1183-1210
-
-
Fellegi, I.P.1
Sunter, A.B.2
-
13
-
-
0032091575
-
Integration of heterogeneous databases without common domains using queries based on textual similarity
-
W.W. Cohen, "Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '98), pp. 201-212, 1998.
-
(1998)
Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '98)
, pp. 201-212
-
-
Cohen, W.W.1
-
14
-
-
0033891155
-
An extensible framework for data cleaning
-
H. Galhardas, D. Florescu, D. Shasha, and E. Simon, "An Extensible Framework for Data Cleaning," Proc. 16th Int'l Conf. Data Eng. (ICDE '00), 2000.
-
(2000)
Proc. 16th Int'l Conf. Data Eng. (ICDE '00)
-
-
Galhardas, H.1
Florescu, D.2
Shasha, D.3
Simon, E.4
-
15
-
-
0002490026
-
Data cleaning: Problems and current approaches
-
E. Rahm and H.H. Do, "Data Cleaning: Problems and Current Approaches," IEEE Technical Committee Data Eng. Bull., vol. 23, no. 4, pp. 3-13, Dec. 2000.
-
(2000)
IEEE Technical Committee Data Eng. Bull.
, vol.23
, Issue.4
, pp. 3-13
-
-
Rahm, E.1
Do, H.H.2
-
18
-
-
33845667955
-
Duplicate record detection: A survey
-
A.K. Elmagarmid, P.G. Ipeirotis, and V.S. Verykios, "Duplicate Record Detection: A Survey," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 1, pp. 1-16, Jan. 2007.
-
(2007)
IEEE Trans. Knowledge and Data Eng.
, vol.19
, Issue.1
, pp. 1-16
-
-
Elmagarmid, A.K.1
Ipeirotis, P.G.2
Verykios, V.S.3
-
20
-
-
34248229658
-
Collective entity resolution in relational data
-
I. Bhattacharya and L. Getoor, "Collective Entity Resolution in Relational Data," ACM Trans. Knowledge Discovery from Data, vol. 1, no. 1, pp. 5-es, 2007.
-
(2007)
ACM Trans. Knowledge Discovery from Data
, vol.1
, Issue.1
, pp. 5-es
-
-
Bhattacharya, I.1
Getoor, L.2
-
21
-
-
74549185155
-
Similarity-aware indexing for real-time entity resolution
-
P. Christen, R. Gayler, and D. Hawking, "Similarity-Aware Indexing for Real-Time Entity Resolution," Proc. 18th ACM Conf. Information and Knowledge Management (CIKM '09), pp. 1565-1568, 2009.
-
(2009)
Proc. 18th ACM Conf. Information and Knowledge Management (CIKM '09)
, pp. 1565-1568
-
-
Christen, P.1
Gayler, R.2
Hawking, D.3
-
22
-
-
70849098813
-
Entity resolution with iterative blocking
-
S.E. Whang, D. Menestrina, G. Koutrika, M. Theobald, and H. Garcia-Molina, "Entity Resolution with Iterative Blocking," Proc. 35th ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '09), pp. 219-232, 2009.
-
(2009)
Proc. 35th ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '09)
, pp. 219-232
-
-
Whang, S.E.1
Menestrina, D.2
Koutrika, G.3
Theobald, M.4
Garcia-Molina, H.5
-
23
-
-
29844452555
-
Reference reconciliation in complex information spaces
-
X. Dong, A. Halevy, and J. Madhavan, "Reference Reconciliation in Complex Information Spaces," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '05), pp. 85-96, 2005.
-
(2005)
Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '05)
, pp. 85-96
-
-
Dong, X.1
Halevy, A.2
Madhavan, J.3
-
25
-
-
84884417241
-
Preparation of name and address data for record linkage using hidden Markov models
-
T. Churches, P. Christen, K. Lim, and J.X. Zhu, "Preparation of Name and Address Data for Record Linkage Using Hidden Markov Models," BioMed Central Medical Informatics and Decision Making, vol. 2, no. 9, 2002.
-
(2002)
BioMed Central Medical Informatics and Decision Making
, vol.2
, Issue.9
-
-
Churches, T.1
Christen, P.2
Lim, K.3
Zhu, J.X.4
-
26
-
-
65449178105
-
Febrl: An open source data cleaning, deduplication and record linkage system with a graphical user interface
-
P. Christen, "Febrl: An Open Source Data Cleaning, Deduplication and Record Linkage System With a Graphical User Interface," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '08), pp. 1065-1068, 2008.
-
(2008)
Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '08)
, pp. 1065-1068
-
-
Christen, P.1
-
27
-
-
84870515729
-
Decision models for record linkage
-
L. Gu and R. Baxter, "Decision Models for Record Linkage," Selected Papers from AusDM, LNCS 3755, Springer, 2006.
-
(2006)
Selected Papers from AusDM
-
-
Gu, L.1
Baxter, R.2
-
32
-
-
5444258997
-
A comparison of fast blocking methods for record linkage
-
R. Baxter, P. Christen, and T. Churches, "A Comparison of Fast Blocking Methods for Record Linkage," Proc. ACM Workshop Data Cleaning, Record Linkage and Object Consolidation (SIGKDD '03), pp. 25-27, 2003.
-
(2003)
Proc. ACM Workshop Data Cleaning, Record Linkage and Object Consolidation (SIGKDD '03)
, pp. 25-27
-
-
Baxter, R.1
Christen, P.2
Churches, T.3
-
33
-
-
47949115568
-
On the use of semantic blocking techniques for data cleansing and integration
-
J. Nin, V. Muntes-Mulero, N. Martinez-Bazan, and J.-L. Larriba-Pey, "On the Use of Semantic Blocking Techniques for Data Cleansing and Integration," Proc. 11th Int'l Database Eng. and Applications Symp. (IDEAS '07), 2007.
-
(2007)
Proc. 11th Int'l Database Eng. and Applications Symp. (IDEAS '07)
-
-
Nin, J.1
Muntes-Mulero, V.2
Martinez-Bazan, N.3
Larriba-Pey, J.-L.4
-
34
-
-
0013331361
-
Real-world data is dirty: Data cleansing and the merge/purge problem
-
M.A. Hernandez and S.J. Stolfo, "Real-World Data is Dirty: Data Cleansing and the Merge/Purge Problem," Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9-37, 1998.
-
(1998)
Data Mining and Knowledge Discovery
, vol.2
, Issue.1
, pp. 9-37
-
-
Hernandez, M.A.1
Stolfo, S.J.2
-
38
-
-
4944221285
-
Finding authoritative people from the web
-
M. Harada, S. Sato, and K. Kazama, "Finding Authoritative People from the Web," Proc. ACM/IEEE-CS Joint Conf. Digital Libraries, pp. 306-313, 2004.
-
(2004)
Proc. ACM/IEEE-CS Joint Conf. Digital Libraries
, pp. 306-313
-
-
Harada, M.1
Sato, S.2
Kazama, K.3
-
39
-
-
65449165865
-
Towards parameter-free blocking for scalable record linkage
-
P. Christen, "Towards Parameter-Free Blocking for Scalable Record Linkage," Technical Report TR-CS-07-03, Dept. of Computer Science, The Australian Nat'l Univ., 2007.
-
(2007)
Technical Report TR-CS-07-03, Dept. of Computer Science
-
-
Christen, P.1
-
40
-
-
36348961379
-
Adaptive sorted neighborhood methods for efficient record linkage
-
S. Yan, D. Lee, M.Y. Kan, and L.C. Giles, "Adaptive Sorted Neighborhood Methods for Efficient Record Linkage," Proc. Seventh ACM/IEEE-CS Joint Conf. Digital Libraries (JCDL '07), 2007.
-
(2007)
Proc. Seventh ACM/IEEE-CS Joint Conf. Digital Libraries (JCDL '07)
-
-
Yan, S.1
Lee, D.2
Kan, M.Y.3
Giles, L.C.4
-
41
-
-
84888417083
-
A comparison and generalization of blocking and windowing algorithms for duplicate detection
-
U. Draisbach and F. Naumann, "A Comparison and Generalization of Blocking and Windowing Algorithms for Duplicate Detection," Proc. Workshop Quality in Databases (VLDB '09), 2009.
-
(2009)
Proc. Workshop Quality in Databases (VLDB '09)
-
-
Draisbach, U.1
Naumann, F.2
-
42
-
-
84944318804
-
Approximate string joins in a database (almost) for free
-
L. Gravano, P.G. Ipeirotis, H.V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava, "Approximate String Joins in a Database (Almost) for Free," Proc. 27th Int'l Conf. Very Large Data Bases (VLDB '01), pp. 491-500, 2001.
-
(2001)
Proc. 27th Int'l Conf. Very Large Data Bases (VLDB '01)
, pp. 491-500
-
-
Gravano, L.1
Ipeirotis, P.G.2
Jagadish, H.V.3
Koudas, N.4
Muthukrishnan, S.5
Srivastava, D.6
-
43
-
-
74549152150
-
Robust record linkage blocking using suffix arrays
-
T. de Vries, H. Ke, S. Chawla, and P. Christen, "Robust Record Linkage Blocking Using Suffix Arrays," Proc. ACM Conf. Information and Knowledge Management (CIKM '09), pp. 305-314, 2009.
-
(2009)
Proc. ACM Conf. Information and Knowledge Management (CIKM '09)
, pp. 305-314
-
-
De Vries, T.1
Ke, H.2
Chawla, S.3
Christen, P.4
-
44
-
-
0034592784
-
Efficient clustering of high-dimensional data sets with application to reference matching
-
A. McCallum, K. Nigam, and L.H. Ungar, "Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching," Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '00), pp. 169-178, 2000.
-
(2000)
Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '00)
, pp. 169-178
-
-
McCallum, A.1
Nigam, K.2
Ungar, L.H.3
-
45
-
-
84943425383
-
Efficient record linkage in large data sets
-
L. Jin, C. Li, and S. Mehrotra, "Efficient Record Linkage in Large Data Sets," Proc. Eighth Int'l Conf. Database Systems for Advanced Applications (DASFAA '03), pp. 137-146, 2003.
-
(2003)
Proc. Eighth Int'l Conf. Database Systems for Advanced Applications (DASFAA '03)
, pp. 137-146
-
-
Jin, L.1
Li, C.2
Mehrotra, S.3
-
46
-
-
84976803260
-
Fastmap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets
-
C. Faloutsos and K.-I. Lin, "Fastmap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '95), pp. 163-174, 1995.
-
(1995)
Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '95)
, pp. 163-174
-
-
Faloutsos, C.1
Lin, K.-I.2
-
48
-
-
84920600570
-
Efficient record linkage using a double embedding scheme
-
N. Adly, "Efficient Record Linkage Using a Double Embedding Scheme," Proc. Int'l Conf. Data Mining (DMIN '09), pp. 274-281, 2009.
-
(2009)
Proc. Int'l Conf. Data Mining (DMIN '09)
, pp. 274-281
-
-
Adly, N.1
-
50
-
-
79952543891
-
Robust record linkage blocking using suffix arrays and bloom filters
-
T. de Vries, H. Ke, S. Chawla, and P. Christen, "Robust Record Linkage Blocking Using Suffix Arrays and Bloom Filters," ACM Trans. Knowledge Discovery from Data, vol. 5, no. 2, pp. 1-27, 2011.
-
(2011)
ACM Trans. Knowledge Discovery from Data
, vol.5
, Issue.2
, pp. 1-27
-
-
De Vries, T.1
Ke, H.2
Chawla, S.3
Christen, P.4
-
51
-
-
77956549963
-
Industry-scale duplicate detection
-
M. Weis, F. Naumann, U. Jehle, J. Lufter, and H. Schuster, "Industry-Scale Duplicate Detection," Proc. VLDB Endowment, vol. 1, no. 2, pp. 1253-1264, 2008.
-
(2008)
Proc. VLDB Endowment
, vol.1
, Issue.2
, pp. 1253-1264
-
-
Weis, M.1
Naumann, F.2
Jehle, U.3
Lufter, J.4
Schuster, H.5
-
52
-
-
84878049861
-
Adaptive blocking: Learning to scale up record linkage
-
M. Bilenko, B. Kamath, and R.J. Mooney, "Adaptive Blocking: Learning to Scale up Record Linkage," Proc. Sixth Int'l Conf. Data Mining (ICDM '06), pp. 87-96, 2006.
-
(2006)
Proc. Sixth Int'l Conf. Data Mining (ICDM '06)
, pp. 87-96
-
-
Bilenko, M.1
Kamath, B.2
Mooney, R.J.3
-
54
-
-
79251527493
-
Efficient techniques for online record linkage
-
D. Dey, V. Mookerjee, and D. Liu, "Efficient Techniques for Online Record Linkage," IEEE Trans. Knowledge and Data Eng., vol. 23, no. 3, pp. 373-387, Mar. 2011.
-
(2011)
IEEE Trans. Knowledge and Data Eng.
, vol.23
, Issue.3
, pp. 373-387
-
-
Dey, D.1
Mookerjee, V.2
Liu, D.3
-
56
-
-
67649641448
-
Space-constrained gram-based indexing for efficient approximate string search
-
A. Behm, S. Ji, C. Li, and J. Lu, "Space-Constrained Gram-Based Indexing for Efficient Approximate String Search," Proc. IEEE Int'l Conf. Data Eng. (ICDE '09), pp. 604-615, 2009.
-
(2009)
Proc. IEEE Int'l Conf. Data Eng. (ICDE '09)
, pp. 604-615
-
-
Behm, A.1
Ji, S.2
Li, C.3
Lu, J.4
-
57
-
-
85123004356
-
Flexible string matching against large databases in practice
-
N. Koudas, A. Marathe, and D. Srivastava, "Flexible String Matching against Large Databases in Practice," Proc. 30th Int'l Conf. Very Large Data Bases (VLDB '04), pp. 1086-1094, 2004.
-
(2004)
Proc. 30th Int'l Conf. Very Large Data Bases (VLDB '04)
, pp. 1086-1094
-
-
Koudas, N.1
Marathe, A.2
Srivastava, D.3
-
59
-
-
70849105253
-
Ed-join: An efficient algorithm for similarity joins with edit distance constraints
-
C. Xiao, W. Wang, and X. Lin, "Ed-Join: An Efficient Algorithm for Similarity Joins with Edit Distance Constraints," Proc. VLDB Endowment, vol. 1, no. 1, pp. 933-944, 2008.
-
(2008)
Proc. VLDB Endowment
, vol.1
, Issue.1
, pp. 933-944
-
-
Xiao, C.1
Wang, W.2
Lin, X.3
-
60
-
-
77955171784
-
Effectively indexing the uncertain space
-
Y. Zhang, X. Lin, W. Zhang, J. Wang, and Q. Lin, "Effectively Indexing the Uncertain Space," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 9, pp. 1247-1261, Sept. 2010.
-
(2010)
IEEE Trans. Knowledge and Data Eng.
, vol.22
, Issue.9
, pp. 1247-1261
-
-
Zhang, Y.1
Lin, X.2
Zhang, W.3
Wang, J.4
Lin, Q.5
-
61
-
-
77955134730
-
Scalable probabilistic similarity ranking in uncertain databases
-
T. Bernecker, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Zuefle, "Scalable Probabilistic Similarity Ranking in Uncertain Databases," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 9, pp. 1234-1246, Sept. 2010.
-
(2010)
IEEE Trans. Knowledge and Data Eng.
, vol.22
, Issue.9
, pp. 1234-1246
-
-
Bernecker, T.1
Kriegel, H.-P.2
Mamoulis, N.3
Renz, M.4
Zuefle, A.5