[1] BERNSTEIN, Y. and ZOBEL, J. (2004): A scalable system for identifying co-derivative documents, Proc. of SPIRE '04.
[3] BRODER, A. (2000): Identifying and filtering near-duplicate documents, Proc. of COM '00.
[4] BRODER, A., EIRON, N., FONTOURA, M., HERSCOVICI, M., LEMPEL, R., MCPHERSON, J., QI, R. and SHEKITA, E. (2006): Indexing shared content in information retrieval systems, Proc. of EDBT '06.
[5] CHARIKAR, M. (2002): Similarity estimation techniques from rounding algorithms, Proc. of STOC '02.
[6] CHOWDHURY, A., FRIEDER, O., GROSSMAN, D. and MCCABE, M. (2002): Collection statistics for fast duplicate document detection, ACM Trans. Inf. Syst., 20.
[7] CONRAD, J., GUO, X. and SCHRIBER, C. (2003): Online duplicate document detection: Signature reliability in a dynamic retrieval environment, Proc. of CIKM '03.
[8] CONRAD, J. and SCHRIBER, C. (2004): Constructing a text corpus for inexact duplicate detection, Proc. of SIGIR '04.
[9] DATAR, M., IMMORLICA, N., INDYK, P. and MIRROKNI, V. (2004): Locality-sensitive hashing scheme based on p-stable distributions, Proc. of SCG '04.
[13] HENZINGER, M. (2006): Finding near-duplicate web pages: A large-scale evaluation of algorithms, Proc. of SIGIR '06.
[14] HOAD, T. and ZOBEL, J. (2003): Methods for identifying versioned and plagiarised documents, Jour. of ASIST, 54.
[15] INDYK, P. and MOTWANI, R. (1998): Approximate nearest neighbor-towards removing the curse of dimensionality, Proc. of STOC '98.
[16] KOŁCZ, A., CHOWDHURY, A. and ALSPECTOR, J. (2004): Improved robustness of signature-based near-replica detection via lexicon randomization, Proc. of KDD '04.
[17] MANBER, U. (1994): Finding similar files in a large file system, Proc. of USENIX-TC '94.
[19] STEIN, B. (2005): Fuzzy-fingerprints for text-based information retrieval, Proc. of I-KNOW '05.
[20] STEIN, B. (2007): Principles of hash-based text retrieval, Proc. of SIGIR '07.
[21] WEBER, R., SCHEK, H. and BLOTT, S. (1998): A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, Proc. of VLDB '98.
[22] YE, S., WEN, J. and MA, W. (2006): A systematic study of parameter correlations in large scale duplicate document detection, Proc. of PAKDD '06.
[23] ZOBEL, J. and BERNSTEIN, Y. (2006): The case of the duplicate documents: Measurement, search, and science, Proc. of APWeb '06.