



2010, pages 419-426 (volume and issue not listed)

Adaptive near-duplicate detection via similarity learning

Author keywords

Near-duplicate detection; Similarity learning; Spam detection

Indexed keywords

COMMONLY USED; COSINE SIMILARITY; DUPLICATE DOCUMENT DETECTION; EMAIL MESSAGES; EXISTING METHOD; JACCARD COEFFICIENTS; LOCALITY SENSITIVE HASHING; NEAR-DUPLICATE DETECTION; NEWS ARTICLES; SIMILARITY COMPUTATION; SIMILARITY FUNCTIONS; SIMILARITY LEARNING; SIMILARITY MEASURE; SPAM DETECTION; TARGET DOMAIN;
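The indexed keywords name cosine similarity and the Jaccard coefficient, two fixed similarity measures commonly used in near-duplicate detection. The snippet below is a minimal sketch of both. It is not the learned similarity function this paper is about. Several simplifications are assumptions made for brevity: lowercase whitespace tokenization, binary token sets for Jaccard, and raw term-frequency vectors for cosine. The helper names `tokens`, `jaccard`, and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; real systems often use shingles.
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    # Jaccard coefficient |A ∩ B| / |A ∪ B| over the two token sets.
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0  # convention: two empty documents count as identical
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    # Cosine of the angle between raw term-frequency vectors.
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    if na == 0 or nb == 0:
        return 0.0  # an empty document has no direction
    return dot / (na * nb)


if __name__ == "__main__":
    d1 = "cheap meds buy now"
    d2 = "cheap meds buy today"
    print(jaccard(d1, d2))  # 3 shared of 5 distinct tokens -> 0.6
    print(cosine(d1, d2))   # dot 3 / (2.0 * 2.0) -> 0.75
```

Both measures treat every token as equally important, which is the limitation a learned term weighting is meant to address. Comparing every document pair this way is also quadratic, so large collections usually rely on signatures such as MinHash or locality-sensitive hashing to find candidate pairs first.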

EID: 77956039068     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1835449.1835520     Document Type: Conference Paper
Times cited: 62

References (21)
  • 1
    • A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117-122, 2008. (EID: 37549058056)
  • 2
    • A. Z. Broder. Identifying and filtering near-duplicate documents. In CPM '00, pages 1-10. Springer-Verlag, 2000. (EID: 79956075292)
  • 6
  • 7
    • D. Fetterly, M. Manasse, and M. Najork. On the evolution of clusters of near-duplicate web pages. In LA-WEB '03, 2003. (EID: 84945137687)
  • 8
    • A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB '99, 1999. (EID: 15044355327)
  • 9
    • R. J. Hall. A countermeasure to duplicate-detecting anti-spam techniques. Technical report, AT&T, 1999. (EID: 49149115880)
  • 10
    • M. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR '06, pages 284-291, New York, NY, USA, 2006. ACM. (EID: 33750296887)
  • 12
    • P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the 30th STOC, pages 604-613, 1998. (EID: 0031644241)
  • 13
    • A. Kolcz and A. Chowdhury. Hardening fingerprinting by context. In CEAS '07, 2007. (EID: 84904813043)
  • 14
    • A. Kolcz and A. Chowdhury. Lexicon randomization for near-duplicate detection with I-Match. Journal of Supercomputing, 45(3):255-276, 2008. (EID: 49149127990)
  • 15
    • A. Kolcz, A. Chowdhury, and J. Alspector. Improved robustness of signature-based near-replica detection via lexicon randomization. In KDD '04, 2004. (EID: 12244261882)
  • 16
    • D. Lowd and C. Meek. Good word attacks on statistical spam filters. In CEAS '05, 2005. (EID: 65449142381)
  • 17
    • G. S. Manku, A. Jain, and A. Das Sarma. Detecting near-duplicates for web crawling. In WWW '07, 2007. (EID: 35348911985)
  • 18
    • V. V. Prakash and A. O'Donnell. Fighting spam with reputation systems. Queue, 3(9):36-41, 2005. (EID: 85009805214)
  • 20
    • M. Theobald, J. Siddharth, and A. Paepcke. SpotSigs: robust and efficient near duplicate detection in large web collections. In SIGIR '08, pages 563-570, 2008. (EID: 57349131623)
  • 21
    • W. Yih. Learning term-weighting functions for similarity measures. In Proc. of EMNLP-09, 2009. (EID: 79957966387)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.