1. A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117-122, 2008.
2. A. Z. Broder. Identifying and filtering near-duplicate documents. In COM '00, pages 1-10. Springer-Verlag, 2000.
3. A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. Journal of Computer and System Sciences, 60:630-659, 2000.
4. A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Comput. Netw. ISDN Syst., 29(8-13):1157-1166, 1997.
6. A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe. Collection statistics for fast duplicate document detection. ACM Trans. Inf. Syst., 20(2):171-191, 2002.
7. D. Fetterly, M. Manasse, and M. Najork. On the evolution of clusters of near-duplicate web pages. In LA-WEB '03, 2003.
8. A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB '99, 1999.
9. R. J. Hall. A countermeasure to duplicate-detecting anti-spam techniques. Technical report, AT&T, 1999.
10. M. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR '06, pages 284-291, New York, NY, USA, 2006. ACM.
12. P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of 30th STOC, pages 604-613, 1998.
13. A. Kolcz and A. Chowdhury. Hardening fingerprinting by context. In CEAS '07, 2007.
14. A. Kolcz and A. Chowdhury. Lexicon randomization for near-duplicate detection with I-match. Journal of Supercomputing, 45(3):255-276, 2008.
15. A. Kolcz, A. Chowdhury, and J. Alspector. Improved robustness of signature-based near-replica detection via lexicon randomization. In KDD '04, 2004.
16. D. Lowd and C. Meek. Good word attacks on statistical spam filters. In CEAS '05, 2005.
18. V. V. Prakash and A. O'Donnell. Fighting spam with reputation systems. Queue, 3(9):36-41, 2005.
20. M. Theobald, J. Siddharth, and A. Paepcke. Spotsigs: robust and efficient near duplicate detection in large web collections. In SIGIR '08, pages 563-570, 2008.
21. W. Yih. Learning term-weighting functions for similarity measures. In Proc. of EMNLP-09, 2009.