-
1
-
-
33750296887
-
Finding near-duplicate web pages: A large-scale evaluation of algorithms
-
M. R. Henzinger, "Finding near-duplicate web pages: a large-scale evaluation of algorithms," in SIGIR, 2006.
-
(2006)
SIGIR
-
-
Henzinger, M.R.1
-
2
-
-
0032091575
-
Integration of heterogeneous databases without common domains using queries based on textual similarity
-
W. W. Cohen, "Integration of heterogeneous databases without common domains using queries based on textual similarity," in SIGMOD Conference, 1998, pp. 201-212.
-
(1998)
SIGMOD Conference
, pp. 201-212
-
-
Cohen, W.W.1
-
3
-
-
67649644273
-
-
The state of record linkage and current research problems
-
W. E. Winkler, "The state of record linkage and current research problems," U.S. Bureau of the Census, Tech. Rep., 1999.
-
-
-
-
4
-
-
35348849154
-
-
Scaling up all pairs similarity search
-
R. J. Bayardo, Y. Ma, and R. Srikant, "Scaling up all pairs similarity search," in WWW, 2007.
-
-
-
-
5
-
-
33749597967
-
A primitive operator for similarity joins in data cleaning
-
S. Chaudhuri, V. Ganti, and R. Kaushik, "A primitive operator for similarity joins in data cleaning," in ICDE, 2006.
-
(2006)
ICDE
-
-
Chaudhuri, S.1
Ganti, V.2
Kaushik, R.3
-
6
-
-
67649653665
-
-
Efficient similarity joins for near duplicate detection
-
C. Xiao, W. Wang, X. Lin, and J. X. Yu, "Efficient similarity joins for near duplicate detection," in WWW, 2008.
-
-
-
-
7
-
-
85104914015
-
Efficient exact set-similarity joins
-
A. Arasu, V. Ganti, and R. Kaushik, "Efficient exact set-similarity joins," in VLDB, 2006.
-
(2006)
VLDB
-
-
Arasu, A.1
Ganti, V.2
Kaushik, R.3
-
8
-
-
0039253795
-
Closest pair queries in spatial databases
-
A. Corral, Y. Manolopoulos, Y. Theodoridis, and M. Vassilakopoulos, "Closest pair queries in spatial databases," in SIGMOD Conference, 2000, pp. 189-200.
-
(2000)
SIGMOD Conference
, pp. 189-200
-
-
Corral, A.1
Manolopoulos, Y.2
Theodoridis, Y.3
Vassilakopoulos, M.4
-
9
-
-
84948989773
-
The impact of buffering on closest pairs queries using r-trees
-
A. Corral, M. Vassilakopoulos, and Y. Manolopoulos, "The impact of buffering on closest pairs queries using r-trees," in ADBIS, 2001, pp. 41-54.
-
(2001)
ADBIS
, pp. 41-54
-
-
Corral, A.1
Vassilakopoulos, M.2
Manolopoulos, Y.3
-
10
-
-
1642398164
-
Algorithms for processing k-closest-pair queries in spatial databases
-
A. Corral, Y. Manolopoulos, Y. Theodoridis, and M. Vassilakopoulos, "Algorithms for processing k-closest-pair queries in spatial databases," Data Knowl. Eng., vol. 49, no. 1, pp. 67-104, 2004.
-
(2004)
Data Knowl. Eng
, vol.49
, Issue.1
, pp. 67-104
-
-
Corral, A.1
Manolopoulos, Y.2
Theodoridis, Y.3
Vassilakopoulos, M.4
-
12
-
-
85011032600
-
VGRAM: Improving performance of approximate queries on string collections using variable-length grams
-
C. Li, B. Wang, and X. Yang, "VGRAM: Improving performance of approximate queries on string collections using variable-length grams," in VLDB, 2007.
-
(2007)
VLDB
-
-
Li, C.1
Wang, B.2
Yang, X.3
-
13
-
-
85011072445
-
Extending q-grams to estimate selectivity of string matching with low edit distance
-
H. Lee, R. T. Ng, and K. Shim, "Extending q-grams to estimate selectivity of string matching with low edit distance," in VLDB, 2007, pp. 195-206.
-
(2007)
VLDB
, pp. 195-206
-
-
Lee, H.1
Ng, R.T.2
Shim, K.3
-
14
-
-
0013331361
-
Real-world data is dirty: Data cleansing and the merge/purge problem
-
M. A. Hernández and S. J. Stolfo, "Real-world data is dirty: Data cleansing and the merge/purge problem," Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9-37, 1998.
-
(1998)
Data Mining and Knowledge Discovery
, vol.2
, Issue.1
, pp. 9-37
-
-
Hernández, M.A.1
Stolfo, S.J.2
-
15
-
-
0242456811
-
-
Interactive deduplication using active learning
-
S. Sarawagi and A. Bhamidipaty, "Interactive deduplication using active learning," in KDD, 2002.
-
-
-
-
16
-
-
2342447399
-
Adaptive name matching in information integration
-
M. Bilenko, R. J. Mooney, W. W. Cohen, P. Ravikumar, and S. E. Fienberg, "Adaptive name matching in information integration," IEEE Intelligent Sys., vol. 18, no. 5, pp. 16-23, 2003.
-
(2003)
IEEE Intelligent Sys
, vol.18
, Issue.5
, pp. 16-23
-
-
Bilenko, M.1
Mooney, R.J.2
Cohen, W.W.3
Ravikumar, P.4
Fienberg, S.E.5
-
17
-
-
3142777876
-
Efficient set joins on similarity predicates
-
S. Sarawagi and A. Kirpal, "Efficient set joins on similarity predicates," in SIGMOD, 2004.
-
(2004)
SIGMOD
-
-
Sarawagi, S.1
Kirpal, A.2
-
18
-
-
52649137537
-
Transformation-based framework for record matching
-
A. Arasu, S. Chaudhuri, and R. Kaushik, "Transformation-based framework for record matching," in ICDE, 2008, pp. 40-49.
-
(2008)
ICDE
, pp. 40-49
-
-
Arasu, A.1
Chaudhuri, S.2
Kaushik, R.3
-
19
-
-
52649161208
-
A fast similarity join algorithm using graphics processing units
-
M. D. Lieberman, J. Sankaranarayanan, and H. Samet, "A fast similarity join algorithm using graphics processing units," in ICDE, 2008, pp. 1111-1120.
-
(2008)
ICDE
, pp. 1111-1120
-
-
Lieberman, M.D.1
Sankaranarayanan, J.2
Samet, H.3
-
20
-
-
0010362121
-
Syntactic clustering of the web
-
A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, "Syntactic clustering of the web," Computer Networks, vol. 29, no. 8-13, pp. 1157-1166, 1997.
-
(1997)
Computer Networks
, vol.29
, Issue.8-13
, pp. 1157-1166
-
-
Broder, A.Z.1
Glassman, S.C.2
Manasse, M.S.3
Zweig, G.4
-
21
-
-
0013206133
-
Collection statistics for fast duplicate document detection
-
A. Chowdhury, O. Frieder, D. A. Grossman, and M. C. McCabe, "Collection statistics for fast duplicate document detection," ACM Trans. Inf. Syst., vol. 20, no. 2, pp. 171-191, 2002.
-
(2002)
ACM Trans. Inf. Syst
, vol.20
, Issue.2
, pp. 171-191
-
-
Chowdhury, A.1
Frieder, O.2
Grossman, D.A.3
McCabe, M.C.4
-
22
-
-
0036040277
-
Similarity estimation techniques from rounding algorithms
-
M. Charikar, "Similarity estimation techniques from rounding algorithms," in STOC, 2002.
-
(2002)
STOC
-
-
Charikar, M.1
-
23
-
-
15044355327
-
Similarity search in high dimensions via hashing
-
A. Gionis, P. Indyk, and R. Motwani, "Similarity search in high dimensions via hashing," in VLDB, 1999.
-
(1999)
VLDB
-
-
Gionis, A.1
Indyk, P.2
Motwani, R.3
-
24
-
-
1142279457
-
Robust and efficient fuzzy match for online data cleaning
-
S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, "Robust and efficient fuzzy match for online data cleaning," in SIGMOD Conference, 2003, pp. 313-324.
-
(2003)
SIGMOD Conference
, pp. 313-324
-
-
Chaudhuri, S.1
Ganjam, K.2
Ganti, V.3
Motwani, R.4
-
25
-
-
84944318804
-
Approximate string joins in a database (almost) for free
-
L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava, "Approximate string joins in a database (almost) for free," in VLDB, 2001.
-
(2001)
VLDB
-
-
Gravano, L.1
Ipeirotis, P.G.2
Jagadish, H.V.3
Koudas, N.4
Muthukrishnan, S.5
Srivastava, D.6
-
26
-
-
52649086729
-
Efficient merging and filtering algorithms for approximate string searches
-
C. Li, J. Lu, and Y. Lu, "Efficient merging and filtering algorithms for approximate string searches," in ICDE, 2008, pp. 257-266.
-
(2008)
ICDE
, pp. 257-266
-
-
Li, C.1
Lu, J.2
Lu, Y.3
-
27
-
-
52649145249
-
Fast indexes and algorithms for set similarity selection queries
-
M. Hadjieleftheriou, A. Chandel, N. Koudas, and D. Srivastava, "Fast indexes and algorithms for set similarity selection queries," in ICDE, 2008, pp. 267-276.
-
(2008)
ICDE
, pp. 267-276
-
-
Hadjieleftheriou, M.1
Chandel, A.2
Koudas, N.3
Srivastava, D.4
-
28
-
-
67649639190
-
-
On approximate string matching
-
E. Ukkonen, "On approximate string matching," in FCT, 1983.
-
-
-
-
29
-
-
34547618938
-
On the resemblance and containment of documents
-
A. Z. Broder, "On the resemblance and containment of documents," in SEQS, 1997.
-
(1997)
SEQS
-
-
Broder, A.Z.1
-
30
-
-
67649666566
-
-
Index
-
R. C. Russell, "Index," U.S. patent 1,261,167, April 1918.
-
-
-
-
31
-
-
0033075316
-
Combining fuzzy information from multiple systems
-
R. Fagin, "Combining fuzzy information from multiple systems," J. Comput. Syst. Sci., vol. 58, no. 1, pp. 83-99, 1999.
-
(1999)
J. Comput. Syst. Sci
, vol.58
, Issue.1
, pp. 83-99
-
-
Fagin, R.1
-
32
-
-
0038504811
-
Optimal aggregation algorithms for middleware
-
R. Fagin, A. Lotem, and M. Naor, "Optimal aggregation algorithms for middleware," J. Comput. Syst. Sci., vol. 66, no. 4, pp. 614-656, 2003.
-
(2003)
J. Comput. Syst. Sci
, vol.66
, Issue.4
, pp. 614-656
-
-
Fagin, R.1
Lotem, A.2
Naor, M.3
-
33
-
-
29844452789
-
Automated ranking of database query results
-
S. Agrawal, S. Chaudhuri, G. Das, and A. Gionis, "Automated ranking of database query results," in CIDR, 2003.
-
(2003)
CIDR
-
-
Agrawal, S.1
Chaudhuri, S.2
Das, G.3
Gionis, A.4
-
34
-
-
0036372482
-
Minimal probing: Supporting expensive predicates for top-k queries
-
K. C.-C. Chang and S.-W. Hwang, "Minimal probing: supporting expensive predicates for top-k queries," in SIGMOD Conference, 2002, pp. 346-357.
-
(2002)
SIGMOD Conference
, pp. 346-357
-
-
Chang, K.C.-C.1
Hwang, S.-W.2
-
35
-
-
35448984017
-
SPARK: Top-k keyword query in relational databases
-
Y. Luo, X. Lin, W. Wang, and X. Zhou, "SPARK: top-k keyword query in relational databases," in SIGMOD Conference, 2007, pp. 115-126.
-
(2007)
SIGMOD Conference
, pp. 115-126
-
-
Luo, Y.1
Lin, X.2
Wang, W.3
Zhou, X.4
-
36
-
-
17044386180
-
Top-k spatial joins
-
M. Zhu, D. Papadias, J. Zhang, and D. L. Lee, "Top-k spatial joins," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 567-579, 2005.
-
(2005)
IEEE Trans. Knowl. Data Eng
, vol.17
, Issue.4
, pp. 567-579
-
-
Zhu, M.1
Papadias, D.2
Zhang, J.3
Lee, D.L.4