[1] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, pp. 459-468, 2006.
[2] R. A. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.
[3] M. Bawa, T. Condie, and P. Ganesan. LSH forest: Self-tuning indexes for similarity search. In WWW, pp. 651-660, 2005.
[4] S. Brin, J. Davis, and H. García-Molina. Copy detection mechanisms for digital documents. In SIGMOD, pp. 398-409, 1995.
[5] A. Z. Broder. Identifying and filtering near-duplicate documents. In COM, pp. 1-10, 2000.
[6] A. Z. Broder, M. Charikar, and M. Mitzenmacher. A derandomization using min-wise independent permutations. J. Discrete Algorithms, 1(1):11-20, 2003.
[7] A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the Web. Computer Networks, 29(8-13):1157-1166, 1997.
[8] C. Buckley, G. Salton, and J. Allan. Automatic retrieval with locality information using SMART. In TREC, pp. 59-72, 1992.
[9] S. Büttcher and C. L. A. Clarke. A document-centric approach to static index pruning in text retrieval systems. In CIKM, pp. 182-189, 2006.
[10] M. S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pp. 380-388, 2002.
[11] J. Cho, H. Garcia-Molina, T. Haveliwala, W. Lam, A. Paepcke, S. Raghavan, and G. Wesley. Stanford WebBase components and applications. ACM Trans. Inter. Tech., 6(2):153-186, 2006.
[12] A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe. Collection statistics for fast duplicate document detection. ACM Trans. Inf. Syst., 20(2):171-191, 2002.
[13] E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. Knowledge and Data Engineering, 13(1):64-78, 2001.
[14] J. G. Conrad, X. S. Guo, and C. P. Schriber. Online duplicate document detection: Signature reliability in a dynamic retrieval environment. In CIKM, pp. 443-452, 2003.
[15] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, pp. 518-529, 1999.
[16] M. Henzinger. Finding near-duplicate Web pages: A large-scale evaluation of algorithms. In SIGIR, pp. 284-291, 2006.
[17] T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. JASIST, 54(3):203-215, 2003.
[18] P. Indyk. A small approximately min-wise independent family of hash functions. J. Algorithms, 38(1):84-90, 2001.
[20] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, pp. 604-613, 1998.
[21] D. Klein, S. D. Kamvar, and C. D. Manning. From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In ICML, pp. 307-314, 2002.
[22] A. Kolcz, A. Chowdhury, and J. Alspector. Improved robustness of signature-based near-replica detection via lexicon randomization. In KDD, pp. 605-610, 2004.
[23] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: Efficient indexing for high-dimensional similarity search. In VLDB, pp. 950-961, 2007.
[24] U. Manber. Finding similar files in a large file system. In WTEC, p. 2, 1994.
[25] G. S. Manku, A. Jain, and A. D. Sarma. Detecting near-duplicates for Web crawling. In WWW, pp. 141-150, 2007.
[26] Web Sociologist's Workbench. http://dbpubs.stanford.edu/~testbed/doc2/WabBase/SGERHighlight.pdf
[28] N. Shivakumar and H. Garcia-Molina. Finding near-replicas of documents and servers on the Web. In WebDB, pp. 204-212, 1998.
[29] K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In AAAI/IAAI, p. 1097, 2000.
[30] H. Yang and J. P. Callan. Near-duplicate detection by instance-level constrained clustering. In SIGIR, pp. 421-428, 2006.