[1] Bar-Yossef, Z., Keidar, I., Schonfeld, U., "Do Not Crawl in the DUST: Different URLs with Similar Text", in Proceedings of the International World Wide Web Conference (WWW 2007), May 2007, pp. 111-120.

[2] Berners-Lee, T., Fielding, R., Masinter, L., "Uniform Resource Identifier (URI): Generic Syntax", RFC 3986, available at http://gbiv.com/protocols/uri/rfc/rfc3986.html.

[3] Chakrabarti, S., Mining the Web: Discovering Knowledge from Hypertext Data, Morgan Kaufmann Publishers, Elsevier, San Francisco, CA, 2003.

[4] Han, J., Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Elsevier, San Francisco, CA, 2006.

[5] Kim, S. J., Jeong, H. S., Lee, S. H., "Reliable Evaluations of URL Normalization", in Proceedings of the 2006 International Conference on Computational Science and its Applications (ICCSA), Glasgow, May 2006, pp. 609-617.

[6] Kim, S. J., Lee, S. H., "Implementation of a Web Root and Statistics on the Korean Web", in Proceedings of the Second International Conference on Human.Society@Internet (HIS), Seoul, June 2003, pp. 341-350.

[7] Lee, S. H., Kim, S. J., Hong, S. H., "On URL Normalization", in Proceedings of the 2005 International Conference on Computational Science and its Applications (ICCSA), Singapore, May 2005, pp. 1076-1085.

[8] Netcraft, "June 2008 Web Server Survey", available at http://news.netcraft.com/archives/web-server-survey.html.

[9] Pant, G., Srinivasan, P., Menczer, F., "Crawling the Web", in Web Dynamics, 2004, pp. 153-178.

[11] Web Data Extractor, available at http://www.webextractor.com/.