2010, Pages 381-390
Learning URL patterns for webpage de-duplication
Author keywords
Decision trees; Generalization; MapReduce; Page importance; Search engines; Site specific delimiters; Webpage de-duplication
Indexed keywords
BUILDING BLOCKS;
DELIMITERS;
MACHINE LEARNING TECHNIQUES;
MINE RULES;
RULE EXTRACTION;
SET OF RULES;
SITE-SPECIFIC;
TRANSFORMATION RULES;
WEB SEARCHES;
WEB-PAGE;
DECISION TREES;
INFORMATION RETRIEVAL;
INTERNET;
LEARNING ALGORITHMS;
MINING;
SCALABILITY;
SEARCH ENGINES;
WORLD WIDE WEB;
EID: 77950949494
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/1718487.1718535
Document Type: Conference Paper
Times cited: 41
References (19)