



Volume , Issue , 2008, Pages 186-195

De-duping URLs via rewrite rules

Author keywords

De-duping; Rewrite rules; URL normalization

Indexed keywords

CANONICAL FORMS; DE-DUPING; EFFICIENT ALGORITHMS; LARGE-SCALE EXPERIMENTS; PRINCIPAL FUNCTIONS; REWRITE RULES; TRANSFORMATION RULES; URL NORMALIZATION;

EID: 65449167674     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1401890.1401917     Document Type: Conference Paper
Times cited: 26

References (17)
  • 2
    • 84961379302
    • Finding patterns common to a set of strings
    • D. Angluin. Finding patterns common to a set of strings. In Proc. 11th STOC, pages 130-141, 1979.
    • (1979) Proc. 11th STOC , pp. 130-141
    • Angluin, D.1
  • 3
    • 84976832596
    • Inference of reversible languages
    • D. Angluin. Inference of reversible languages. J. ACM, 29(3):741-765, 1982.
    • (1982) J. ACM , vol.29 , Issue.3 , pp. 741-765
    • Angluin, D.1
  • 4
    • 0020815483
    • Inductive inference: Theory and methods
    • D. Angluin and C. H. Smith. Inductive inference: Theory and methods. ACM Comput. Surv., 15(3):237-269, 1983.
    • (1983) ACM Comput. Surv , vol.15 , Issue.3 , pp. 237-269
    • Angluin, D.1    Smith, C.H.2
  • 5
    • 35348921241
    • Do not crawl in the DUST: Different URLs with similar text
    • Z. Bar-Yossef, I. Keidar, and U. Schonfeld. Do not crawl in the DUST: Different URLs with similar text. In Proc. 16th WWW, pages 111-120, 2007.
    • (2007) Proc. 16th WWW , pp. 111-120
    • Bar-Yossef, Z.1    Keidar, I.2    Schonfeld, U.3
  • 7
    • 0002698813
    • On the resemblance and containment of documents
    • A. Broder. On the resemblance and containment of documents. In SEQS: Sequences '91, 1998.
    • (1998) SEQS: Sequences '91
    • Broder, A.1
  • 9
    • 0036040277
    • Similarity estimation techniques from rounding algorithms
    • M. Charikar. Similarity estimation techniques from rounding algorithms. In Proc. 34th STOC, pages 380-388, 2002.
    • (2002) Proc. 34th STOC , pp. 380-388
    • Charikar, M.1
  • 10
    • 26444550791
    • Robust identification of fuzzy duplicates
    • S. Chaudhuri, V. Ganti, and R. Motwani. Robust identification of fuzzy duplicates. In Proc. 21st ICDE, pages 865-876, 2005.
    • (2005) Proc. 21st ICDE , pp. 865-876
    • Chaudhuri, S.1    Ganti, V.2    Motwani, R.3
  • 11
  • 13
    • 34547625209
    • Pair-wise entity resolution: Overview and challenges
    • H. Garcia-Molina. Pair-wise entity resolution: Overview and challenges. In Proc. CIKM, 2006.
    • (2006) Proc. CIKM
    • Garcia-Molina, H.1
  • 14
    • 33750296887
    • Finding near-duplicate web pages: A large-scale evaluation of algorithms
    • M. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In Proc. 29th SIGIR, pages 284-291, 2006.
    • (2006) Proc. 29th SIGIR , pp. 284-291
    • Henzinger, M.1
  • 15
    • 35348911985
    • Detecting near-duplicates for web crawling
    • G. S. Manku, A. Jain, and A. D. Sarma. Detecting near-duplicates for web crawling. In Proc. 16th WWW, pages 141-150, 2007.
    • (2007) Proc. 16th WWW , pp. 141-150
    • Manku, G.S.1    Jain, A.2    Sarma, A.D.3
  • 16
    • 65449165922
    • M. Najork. Systems and methods for inferring uniform resource locator (URL) normalization rules, 2006. US Patent Application Publication, 2006/0218143.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.