SCOPUS Information Retrieval Platform

SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

2010, Pages 419-426

Adaptive near-duplicate detection via similarity learning

(3) Hajishirzi, Hannaneh (a); Yih, Wen Tau (b); Kołcz, Aleksander (b)

a UNIVERSITY OF ILLINOIS AT URBANA CHAMPAIGN (United States)

b MICROSOFT RESEARCH (United States)

Author keywords

Near duplicate detection; Similarity learning; Spam detection

Indexed keywords

COMMONLY USED; COSINE SIMILARITY; DUPLICATE DOCUMENT DETECTION; EMAIL MESSAGES; EXISTING METHOD; JACCARD COEFFICIENTS; LOCALITY SENSITIVE HASHING; NEAR-DUPLICATE DETECTION; NEWS ARTICLES; SIMILARITY COMPUTATION; SIMILARITY FUNCTIONS; SIMILARITY LEARNING; SIMILARITY MEASURE; SPAM DETECTION; TARGET DOMAIN

BUILDING MATERIALS; INFORMATION RETRIEVAL

INTERNET

EID: 77956039068
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/1835449.1835520
Document Type: Conference Paper

Times cited: 62

References (21)

1
- 37549058056
- Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
- A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117-122, 2008.
- (2008) Communications of the ACM , vol.51 , Issue.1 , pp. 117-122
- Andoni, A.¹ Indyk, P.²

2
- 79956075292
- Identifying and filtering near-duplicate documents
- Springer-Verlag
- A. Z. Broder. Identifying and filtering near-duplicate documents. In COM '00, pages 1-10. Springer-Verlag, 2000.
- (2000) COM '00 , pp. 1-10
- Broder, A.Z.¹

3
- 0034207121
- Min-wise independent permutations
- A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. Journal of Computer and System Sciences, 60:630-659, 2000.
- (2000) Journal of Computer and System Sciences , vol.60 , pp. 630-659
- Broder, A.Z.¹ Charikar, M.² Frieze, A.M.³ Mitzenmacher, M.⁴

4
- 0010362121
- Syntactic clustering of the web
- A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Comput. Netw. ISDN Syst., 29(8-13):1157-1166, 1997.
- (1997) Comput. Netw. ISDN Syst. , vol.29 , Issue.8-13 , pp. 1157-1166
- Broder, A.Z.¹ Glassman, S.C.² Manasse, M.S.³ Zweig, G.⁴

5
- 0036040277
- Similarity estimation techniques from rounding algorithms
- M. S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 2002.
- (2002) Proceedings of the 34th Annual ACM Symposium on Theory of Computing
- Charikar, M.S.¹

6
- 0013206133
- Collection statistics for fast duplicate document detection
- A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe. Collection statistics for fast duplicate document detection. ACM Trans. Inf. Syst., 20(2):171-191, 2002.
- (2002) ACM Trans. Inf. Syst. , vol.20 , Issue.2 , pp. 171-191
- Chowdhury, A.¹ Frieder, O.² Grossman, D.³ McCabe, M.C.⁴

7
- 84945137687
- On the evolution of clusters of near-duplicate web pages
- D. Fetterly, M. Manasse, and M. Najork. On the evolution of clusters of near-duplicate web pages. In LA-WEB '03, 2003.
- (2003) LA-WEB '03
- Fetterly, D.¹ Manasse, M.² Najork, M.³

8
- 15044355327
- Similarity search in high dimensions via hashing
- A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB '99, 1999.
- (1999) VLDB '99
- Gionis, A.¹ Indyk, P.² Motwani, R.³

9
- 49149115880
- A countermeasure to duplicate-detecting anti-spam techniques
- R. J. Hall. A countermeasure to duplicate-detecting anti-spam techniques. Technical report, AT&T, 1999.
- (1999) Technical Report, AT&T
- Hall, R.J.¹

10
- 33750296887
- Finding near-duplicate web pages: A large-scale evaluation of algorithms
- ACM, New York, NY, USA
- M. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR '06, pages 284-291, New York, NY, USA, 2006. ACM.
- (2006) SIGIR '06 , pp. 284-291
- Henzinger, M.¹

11
- 0037319544
- Methods for identifying versioned and plagiarized documents
- T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. Journal of the American Society for Information Science and Technology, 54(3):203-215, 2003.
- (2003) Journal of the American Society for Information Science and Technology , vol.54 , Issue.3 , pp. 203-215
- Hoad, T.C.¹ Zobel, J.²

12
- 0031644241
- Approximate nearest neighbors: Towards removing the curse of dimensionality
- P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of 30th STOC, pages 604-613, 1998.
- (1998) Proceedings of 30th STOC , pp. 604-613
- Indyk, P.¹ Motwani, R.²

13
- 84904813043
- Hardening fingerprinting by context
- A. Kolcz and A. Chowdhury. Hardening fingerprinting by context. In CEAS '07, 2007.
- (2007) CEAS '07
- Kolcz, A.¹ Chowdhury, A.²

14
- 49149127990
- Lexicon randomization for near-duplicate detection with I-match
- A. Kolcz and A. Chowdhury. Lexicon randomization for near-duplicate detection with I-match. Journal of Supercomputing, 45(3):255-276, 2008.
- (2008) Journal of Supercomputing , vol.45 , Issue.3 , pp. 255-276
- Kolcz, A.¹ Chowdhury, A.²

15
- 12244261882
- Improved robustness of signature-based near-replica detection via lexicon randomization
- A. Kolcz, A. Chowdhury, and J. Alspector. Improved robustness of signature-based near-replica detection via lexicon randomization. In KDD '04, 2004.
- (2004) KDD '04
- Kolcz, A.¹ Chowdhury, A.² Alspector, J.³

16
- 65449142381
- Good word attacks on statistical spam filters
- D. Lowd and C. Meek. Good word attacks on statistical spam filters. In CEAS'05, 2005.
- (2005) CEAS'05
- Lowd, D.¹ Meek, C.²

17
- 35348911985
- Detecting near-duplicates for web crawling
- G. S. Manku, A. Jain, and A. Das Sarma. Detecting near-duplicates for web crawling. In WWW '07, 2007.
- (2007) WWW '07
- Manku, G.S.¹ Jain, A.² Das Sarma, A.³

18
- 85009805214
- Fighting spam with reputation systems
- V. V. Prakash and A. O'Donnell. Fighting spam with reputation systems. Queue, 3(9):36-41, 2005.
- (2005) Queue , vol.3 , Issue.9 , pp. 36-41
- Prakash, V.V.¹ O'Donnell, A.²

19
- 0003676885
- Fingerprinting by random polynomials
- M. Rabin. Fingerprinting by random polynomials. Report TR-1581, Harvard University, 1981.
- (1981) Report TR-1581, Harvard University
- Rabin, M.¹

20
- 57349131623
- Spotsigs: Robust and efficient near duplicate detection in large web collections
- M. Theobald, J. Siddharth, and A. Paepcke. Spotsigs: robust and efficient near duplicate detection in large web collections. In SIGIR '08, pages 563-570, 2008.
- (2008) SIGIR '08 , pp. 563-570
- Theobald, M.¹ Siddharth, J.² Paepcke, A.³

21
- 79957966387
- Learning term-weighting functions for similarity measures
- W. Yih. Learning term-weighting functions for similarity measures. In Proc. of EMNLP-09, 2009.
- (2009) Proc. of EMNLP-09
- Yih, W.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.