|
Volume , Issue , 2006, Pages 1015-1016
|
Do not crawl in the DUST: Different URLs with similar text
|
Author keywords
Duplicates; Mining; Rules; Similarity
|
Indexed keywords
ALGORITHMS;
DATA MINING;
INDEXING (OF INFORMATION);
SAMPLING;
SEARCH ENGINES;
SERVERS;
INTERNET;
MINING;
WEB SERVICES;
WEBSITES;
CRAWL LOGS;
DUPLICATES;
DUSTBUSTER;
DUST;
CANONICAL FORM;
EXTENDED ABSTRACTS;
NOVEL ALGORITHM;
PAGERANK;
RULES;
SIMILARITY;
WEB PAGE;
WEB SERVER LOGS;
WEB SERVERS;
|
EID: 34250618783
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/1135777.1135992
Document Type: Conference Paper
|
Times cited: 7
|