[2] Lorenzo Blanco, Nilesh Dalvi, and Ashwin Machanavajjhala. Highly efficient algorithms for structural clustering of large websites. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, pages 437-446. ACM, 2011.
[5] Isabel F. Cruz, Slava Borisov, Michael A. Marks, and Timothy R. Webbs. Measuring structural similarity among web documents: preliminary results. In EP '98: Proceedings of the 7th International Conference on Electronic Publishing, Artistic Imaging, and Digital Typography, pages 513-524, 1998.
[10] Sachindra Joshi, Neeraj Agrawal, Raghu Krishnapuram, and Sumit Negi. A bag of paths model for measuring structural similarity in web documents. In KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 577-582. ACM Press, 2003.
[11] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10:707-710, 1966.
[12] Gurmeet Singh Manku, Arvind Jain, and Anish Das Sarma. Detecting near-duplicates for web crawling. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 141-150. ACM, 2007.
[13] William M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846-850, 1971.
[14] D. C. Reis, P. B. Golgher, A. S. da Silva, and A. F. Laender. Automatic web news extraction using tree edit distance. In WWW '04: Proceedings of the 13th International Conference on World Wide Web, pages 502-511. ACM Press, 2004.
[16] Lei Shi, Cheng Niu, Ming Zhou, and Jianfeng Gao. A DOM tree alignment model for mining parallel data from the web. In ACL '06: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, pages 489-496. Association for Computational Linguistics, 2006.
[18] Martin Theobald, Jonathan Siddharth, and Andreas Paepcke. SpotSigs: robust and efficient near duplicate detection in large web collections. In SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 563-570. ACM, New York, NY, USA, 2008.
[19] Karane Vieira, André da Costa Carvalho, Klessius Berlt, Edleno de Moura, Altigran da Silva, and Juliana Freire. On finding templates on web collections. World Wide Web, 12:171-211, 2009. doi:10.1007/s11280-009-0059-3.
[20] T. A. Welch. A technique for high-performance data compression. Computer, 17(6):8-19, 1984.