-
1
-
-
0038754128
-
Lower bounds for embedding edit distance into normed spaces
-
A. Andoni, M. Deza, A. Gupta, P. Indyk, and S. Raskhodnikova. Lower bounds for embedding edit distance into normed spaces. In SODA, pages 523-526, 2003.
-
(2003)
SODA
, pp. 523-526
-
-
Andoni, A.1
Deza, M.2
Gupta, A.3
Indyk, P.4
Raskhodnikova, S.5
-
2
-
-
85104914015
-
Efficient exact set-similarity joins
-
A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, 2006.
-
(2006)
VLDB
-
-
Arasu, A.1
Ganti, V.2
Kaushik, R.3
-
3
-
-
35348849154
-
Scaling up all pairs similarity search
-
R. J. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In WWW, 2007.
-
(2007)
WWW
-
-
Bayardo, R.J.1
Ma, Y.2
Srikant, R.3
-
4
-
-
2342447399
-
Adaptive name matching in information integration
-
M. Bilenko, R. J. Mooney, W. W. Cohen, P. Ravikumar, and S. E. Fienberg. Adaptive name matching in information integration. IEEE Intelligent Sys., 18(5):16-23, 2003.
-
(2003)
IEEE Intelligent Sys
, vol.18
, Issue.5
, pp. 16-23
-
-
Bilenko, M.1
Mooney, R.J.2
Cohen, W.W.3
Ravikumar, P.4
Fienberg, S.E.5
-
5
-
-
0034831593
-
Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data
-
C. Böhm, B. Braunmüller, F. Krebs, and H.-P. Kriegel. Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data. In SIGMOD, pages 379-388, 2001.
-
(2001)
SIGMOD
, pp. 379-388
-
-
Böhm, C.1
Braunmüller, B.2
Krebs, F.3
Kriegel, H.-P.4
-
6
-
-
0010362121
-
Syntactic clustering of the web
-
A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157-1166, 1997.
-
(1997)
Computer Networks
, vol.29
, Issue.8-13
, pp. 1157-1166
-
-
Broder, A.Z.1
Glassman, S.C.2
Manasse, M.S.3
Zweig, G.4
-
7
-
-
45249103790
-
One-gapped q-gram filters for Levenshtein distance
-
S. Burkhardt and J. Kärkkäinen. One-gapped q-gram filters for Levenshtein distance. In CPM, pages 225-234, 2002.
-
(2002)
CPM
, pp. 225-234
-
-
Burkhardt, S.1
Kärkkäinen, J.2
-
8
-
-
35448984015
-
Benchmarking declarative approximate selection predicates
-
A. Chandel, O. Hassanzadeh, N. Koudas, M. Sadoghi, and D. Srivastava. Benchmarking declarative approximate selection predicates. In SIGMOD, pages 353-364, 2007.
-
(2007)
SIGMOD
, pp. 353-364
-
-
Chandel, A.1
Hassanzadeh, O.2
Koudas, N.3
Sadoghi, M.4
Srivastava, D.5
-
9
-
-
0036040277
-
Similarity estimation techniques from rounding algorithms
-
M. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, 2002.
-
(2002)
STOC
-
-
Charikar, M.1
-
10
-
-
85011029434
-
Example-driven design of efficient record matching queries
-
S. Chaudhuri, B.-C. Chen, V. Ganti, and R. Kaushik. Example-driven design of efficient record matching queries. In VLDB, pages 327-338, 2007.
-
(2007)
VLDB
, pp. 327-338
-
-
Chaudhuri, S.1
Chen, B.-C.2
Ganti, V.3
Kaushik, R.4
-
11
-
-
84859202692
-
Data debugger: An operator-centric approach for data quality solutions
-
S. Chaudhuri, V. Ganti, and R. Kaushik. Data debugger: An operator-centric approach for data quality solutions. IEEE Data Eng. Bull., 29(2):60-66, 2006.
-
(2006)
IEEE Data Eng. Bull.
, vol.29
, Issue.2
, pp. 60-66
-
-
Chaudhuri, S.1
Ganti, V.2
Kaushik, R.3
-
12
-
-
33749597967
-
A primitive operator for similarity joins in data cleaning
-
S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, 2006.
-
(2006)
ICDE
-
-
Chaudhuri, S.1
Ganti, V.2
Kaushik, R.3
-
13
-
-
15044355327
-
Similarity search in high dimensions via hashing
-
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999.
-
(1999)
VLDB
-
-
Gionis, A.1
Indyk, P.2
Motwani, R.3
-
14
-
-
84944318804
-
Approximate string joins in a database (almost) for free
-
L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, 2001.
-
(2001)
VLDB
-
-
Gravano, L.1
Ipeirotis, P.G.2
Jagadish, H.V.3
Koudas, N.4
Muthukrishnan, S.5
Srivastava, D.6
-
15
-
-
84859169313
-
Approximate string joins in a database (almost) for free (erratum)
-
L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free (erratum). Technical Report CUCS-011-03, Columbia University, 2003.
-
(2003)
Technical Report CUCS-011-03, Columbia University
-
-
Gravano, L.1
Ipeirotis, P.G.2
Jagadish, H.V.3
Koudas, N.4
Muthukrishnan, S.5
Srivastava, D.6
-
16
-
-
0004137004
-
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology
-
D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
-
(1997)
Cambridge University Press
-
-
Gusfield, D.1
-
17
-
-
84994164621
-
Evaluation of main memory join algorithms for joins with set comparison join predicates
-
S. Helmer and G. Moerkotte. Evaluation of main memory join algorithms for joins with set comparison join predicates. In VLDB, pages 386-395, 1997.
-
(1997)
VLDB
, pp. 386-395
-
-
Helmer, S.1
Moerkotte, G.2
-
18
-
-
33750296887
-
Finding near-duplicate web pages: a large-scale evaluation of algorithms
-
M. R. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR, 2006.
-
(2006)
SIGIR
-
-
Henzinger, M.R.1
-
19
-
-
0013331361
-
Real-world data is dirty: Data cleansing and the merge/purge problem
-
M. A. Hernández and S. J. Stolfo. Real-world data is dirty: Data cleansing and the merge/purge problem. Data Mining and Knowledge Discovery, 2(1):9-37, 1998.
-
(1998)
Data Mining and Knowledge Discovery
, vol.2
, Issue.1
, pp. 9-37
-
-
Hernández, M.A.1
Stolfo, S.J.2
-
20
-
-
85011072445
-
Extending q-grams to estimate selectivity of string matching with low edit distance
-
H. Lee, R. T. Ng, and K. Shim. Extending q-grams to estimate selectivity of string matching with low edit distance. In VLDB, pages 195-206, 2007.
-
(2007)
VLDB
, pp. 195-206
-
-
Lee, H.1
Ng, R.T.2
Shim, K.3
-
21
-
-
85011032600
-
VGRAM: Improving performance of approximate queries on string collections using variable-length grams
-
C. Li, B. Wang, and X. Yang. VGRAM: Improving performance of approximate queries on string collections using variable-length grams. In VLDB, 2007.
-
(2007)
VLDB
-
-
Li, C.1
Wang, B.2
Yang, X.3
-
22
-
-
1142267356
-
Efficient processing of joins on set-valued attributes
-
N. Mamoulis. Efficient processing of joins on set-valued attributes. In SIGMOD, pages 157-168, 2003.
-
(2003)
SIGMOD
, pp. 157-168
-
-
Mamoulis, N.1
-
23
-
-
0018985316
-
A faster algorithm computing string edit distances
-
W. J. Masek and M. Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18-31, 1980.
-
(1980)
J. Comput. Syst. Sci.
, vol.20
, Issue.1
, pp. 18-31
-
-
Masek, W.J.1
Paterson, M.2
-
25
-
-
0000541351
-
A fast bit-vector algorithm for approximate string matching based on dynamic programming
-
G. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM, 46(3):395-415, 1999.
-
(1999)
J. ACM
, vol.46
, Issue.3
, pp. 395-415
-
-
Myers, G.1
-
26
-
-
0345566149
-
A guided tour to approximate string matching
-
G. Navarro. A guided tour to approximate string matching. ACM Comput. Surv., 33(1):31-88, 2001.
-
(2001)
ACM Comput. Surv.
, vol.33
, Issue.1
, pp. 31-88
-
-
Navarro, G.1
-
27
-
-
0001670844
-
Performance in practice of string hashing functions
-
M. V. Ramakrishna and J. Zobel. Performance in practice of string hashing functions. In DASFAA, pages 215-224, 1997.
-
(1997)
DASFAA
, pp. 215-224
-
-
Ramakrishna, M.V.1
Zobel, J.2
-
28
-
-
10644238464
-
Set containment joins: The good, the bad and the ugly
-
K. Ramasamy, J. M. Patel, J. F. Naughton, and R. Kaushik. Set containment joins: The good, the bad and the ugly. In VLDB, pages 351-362, 2000.
-
(2000)
VLDB
, pp. 351-362
-
-
Ramasamy, K.1
Patel, J.M.2
Naughton, J.F.3
Kaushik, R.4
-
29
-
-
0242456811
-
Interactive deduplication using active learning
-
S. Sarawagi and A. Bhamidipaty. Interactive deduplication using active learning. In KDD, 2002.
-
(2002)
KDD
-
-
Sarawagi, S.1
Bhamidipaty, A.2
-
30
-
-
3142777876
-
Efficient set joins on similarity predicates
-
S. Sarawagi and A. Kirpal. Efficient set joins on similarity predicates. In SIGMOD, 2004.
-
(2004)
SIGMOD
-
-
Sarawagi, S.1
Kirpal, A.2
-
31
-
-
0004498253
-
On approximate string matching
-
E. Ukkonen. On approximate string matching. In FCT, 1983.
-
(1983)
FCT
-
-
Ukkonen, E.1
-
32
-
-
0015960104
-
The string-to-string correction problem
-
R. A. Wagner and M. J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, 1974.
-
(1974)
J. ACM
, vol.21
, Issue.1
, pp. 168-173
-
-
Wagner, R.A.1
Fischer, M.J.2
-
34
-
-
66249113620
-
Efficient similarity joins for near duplicate detection
-
C. Xiao, W. Wang, X. Lin, and J. X. Yu. Efficient similarity joins for near duplicate detection. In WWW, 2008.
-
(2008)
WWW
-
-
Xiao, C.1
Wang, W.2
Lin, X.3
Yu, J.X.4
-
35
-
-
33747729581
-
Inverted files for text search engines
-
J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv., 38(2), 2006.
-
(2006)
ACM Comput. Surv.
, vol.38
, Issue.2
-
-
Zobel, J.1
Moffat, A.2