SCOPUS Information Retrieval Platform

Studies in Classification, Data Analysis, and Knowledge Organization

Volume , Issue , 2008, Pages 601-609

New issues in near-duplicate detection

(2) Potthast, Martin ᵃ; Stein, Benno ᵃ

ᵃ Bauhaus University Weimar (Germany)

Author keywords

[No Author keywords available]

Indexed keywords

CLASSIFICATION (OF INFORMATION); DATA HANDLING; DIGITAL STORAGE; MACHINE LEARNING;

ANALYSIS AND EVALUATION; CORPUS LINGUISTICS; NEAR-DUPLICATE DETECTION; RETRIEVAL PROPERTIES; STATE-OF-THE-ART ALGORITHMS; STORAGE MAINTENANCE; WEB RETRIEVAL;

INFORMATION RETRIEVAL;

EID: 84867666907 PISSN: 14318814 EISSN: None Source Type: Conference Proceeding
DOI: 10.1007/978-3-540-78246-9_71 Document Type: Conference Paper

Times cited : (13)

References (23)

1
- 33646126481
- A scalable system for identifying co-derivative documents
- BERNSTEIN, Y. and ZOBEL, J. (2004): A scalable system for identifying co-derivative documents, Proc. of SPIRE '04.
- (2004) Proc. of SPIRE '04
- Bernstein, Y.¹ Zobel, J.²

2
- 84976810280
- Copy detection mechanisms for digital documents
- BRIN, S., DAVIS, J. and GARCIA-MOLINA, H. (1995): Copy detection mechanisms for digital documents, Proc. of SIGMOD '95.
- (1995) Proc. of SIGMOD '95
- Brin, S.¹ Davis, J.² Garcia-Molina, H.³

3
- 4944224800
- Identifying and filtering near-duplicate documents
- BRODER, A. (2000): Identifying and filtering near-duplicate documents, Proc. of CPM '00.
- (2000) Proc. of CPM '00
- Broder, A.¹

4
- 34548706568
- Indexing shared content in information retrieval systems
- BRODER, A., EIRON, N., FONTOURA, M., HERSCOVICI, M., LEMPEL, R., MCPHERSON, J., QI, R. and SHEKITA, E. (2006): Indexing Shared Content in Information Retrieval Systems, Proc. of EDBT '06.
- (2006) Proc. of EDBT '06
- Broder, A.¹ Eiron, N.² Fontoura, M.³ Herscovici, M.⁴ Lempel, R.⁵ McPherson, J.⁶ Qi, R.⁷ Shekita, E.⁸

5
- 0037844312
- Similarity estimation techniques from rounding algorithms
- CHARIKAR, M. (2002): Similarity Estimation Techniques from Rounding Algorithms, Proc. of STOC '02.
- (2002) Proc. of STOC '02
- Charikar, M.¹

6
- 0013206133
- Collection statistics for fast duplicate document detection
- CHOWDHURY, A., FRIEDER, O., GROSSMAN, D. and MCCABE, M. (2002): Collection statistics for fast duplicate document detection, ACM Trans. Inf. Syst., 20.
- (2002) ACM Trans. Inf. Syst. , vol.20
- Chowdhury, A.¹ Frieder, O.² Grossman, D.³ McCabe, M.⁴

7
- 12244271239
- Online duplicate document detection: Signature reliability in a dynamic retrieval environment
- CONRAD, J., GUO, X. and SCHRIBER, C. (2003): Online duplicate document detection: signature reliability in a dynamic retrieval environment, Proc. of CIKM '03.
- (2003) Proc. of CIKM '03
- Conrad, J.¹ Guo, X.² Schriber, C.³

8
- 8644227073
- Constructing a text corpus for inexact duplicate detection
- CONRAD, J. and SCHRIBER, C. (2004): Constructing a text corpus for inexact duplicate detection, Proc. of SIGIR '04.
- (2004) Proc. of SIGIR '04
- Conrad, J.¹ Schriber, C.²

9
- 4544259509
- Locality-sensitive hashing scheme based on p-stable distributions
- DATAR, M., IMMORLICA, N., INDYK, P. and MIRROKNI, V. (2004): Locality-Sensitive Hashing Scheme Based on p-Stable Distributions, Proc. of SCG '04.
- (2004) Proc. of SCG '04
- Datar, M.¹ Immorlica, N.² Indyk, P.³ Mirrokni, V.⁴

10
- 84945137687
- On the evolution of clusters of near-duplicate web pages
- FETTERLY, D., MANASSE, M. and NAJORK, M. (2003): On the Evolution of Clusters of Near-Duplicate Web Pages, Proc. of LA-WEB '03.
- (2003) Proc. of LA-WEB '03
- Fetterly, D.¹ Manasse, M.² Najork, M.³

11
- 32344441912
- Finding similar files in large document repositories
- FORMAN, G., ESHGHI, K. and CHIOCCHETTI, S. (2005): Finding similar files in large document repositories, Proc. of KDD '05.
- (2005) Proc. of KDD '05
- Forman, G.¹ Eshghi, K.² Chiocchetti, S.³

12
- 0013207911
- Scalable document fingerprinting
- HEINTZE, N. (1996): Scalable document fingerprinting, Proc. of USENIX-EC '96.
- (1996) Proc. of USENIX-EC '96
- Heintze, N.¹

13
- 33750296887
- Finding near-duplicate web pages: A large-scale evaluation of algorithms
- HENZINGER, M. (2006): Finding Near-Duplicate Web Pages: a Large-Scale Evaluation of Algorithms, Proc. of SIGIR '06.
- (2006) Proc. of SIGIR '06
- Henzinger, M.¹

14
- 0037319544
- Methods for identifying versioned and plagiarised documents
- HOAD, T. and ZOBEL, J. (2003): Methods for Identifying Versioned and Plagiarised Documents, Jour. of ASIST, 54.
- (2003) Jour. of ASIST , vol.54
- Hoad, T.¹ Zobel, J.²

15
- 0001907042
- Approximate nearest neighbors: Towards removing the curse of dimensionality
- INDYK, P. and MOTWANI, R. (1998): Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality, Proc. of STOC '98.
- (1998) Proc. of STOC '98
- Indyk, P.¹ Motwani, R.²

16
- 12244261882
- Improved robustness of signature-based near-replica detection via lexicon randomization
- KOŁCZ, A., CHOWDHURY, A. and ALSPECTOR, J. (2004): Improved robustness of signature-based near-replica detection via lexicon randomization, Proc. of KDD '04.
- (2004) Proc. of KDD '04
- Kołcz, A.¹ Chowdhury, A.² Alspector, J.³

17
- 85043988965
- Finding similar files in a large file system
- MANBER, U. (1994): Finding similar files in a large file system, Proc. of USENIX-TC '94.
- (1994) Proc. of USENIX-TC '94
- Manber, U.¹

18
- 1142267351
- Winnowing: Local algorithms for document fingerprinting
- SCHLEIMER, S., WILKERSON, D. and AIKEN, A. (2003): Winnowing: local algorithms for document fingerprinting, Proc. of SIGMOD '03.
- (2003) Proc. of SIGMOD '03
- Schleimer, S.¹ Wilkerson, D.² Aiken, A.³

19
- 36448989077
- Fuzzy-fingerprints for text-based information retrieval
- STEIN, B. (2005): Fuzzy-Fingerprints for Text-based Information Retrieval, Proc. of I-KNOW '05.
- (2005) Proc. of I-KNOW '05
- Stein, B.¹

20
- 36448954599
- Principles of hash-based text retrieval
- STEIN, B. (2007): Principles of Hash-based Text Retrieval, Proc. of SIGIR '07.
- (2007) Proc. of SIGIR '07
- Stein, B.¹

21
- 0000681228
- A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces
- WEBER, R., SCHEK, H. and BLOTT, S. (1998): A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces, Proc. of VLDB '98.
- (1998) Proc. of VLDB '98
- Weber, R.¹ Schek, H.² Blott, S.³

22
- 84879583318
- A systematic study of parameter correlations in large scale duplicate document detection
- YE, S., WEN, J. and MA, W. (2006): A Systematic Study of Parameter Correlations in Large Scale Duplicate Document Detection, Proc. of PAKDD '06.
- (2006) Proc. of PAKDD '06
- Ye, S.¹ Wen, J.² Ma, W.³

23
- 84879585107
- The case of the duplicate documents: Measurement, search, and science
- ZOBEL, J. and BERNSTEIN, Y. (2006): The case of the duplicate documents: Measurement, search, and science, Proc. of APWeb '06.
- (2006) Proc. of APWeb '06
- Zobel, J.¹ Bernstein, Y.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.