1. Çelik, T., Bos, B., Hickson, I., and Lie, H. W. Cascading style sheets level 2 revision 1 (CSS 2.1) specification. Candidate recommendation, W3C, Sept. 2009. http://www.w3.org/TR/2009/CR-CSS2-20090908.
2. Cleaneval home page. http://cleaneval.sigwac.org.uk. [Online; accessed 18-January-2011].
3. DOM CSS Properties - MDC Doc Center. https://developer.mozilla.org/en/DOM/CSS. [Online; accessed 13-April-2011].
4. Gecko. https://developer.mozilla.org/en/Gecko. [Online; accessed 01-September-2010].
5. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11, 1 (2009), 10-18.
6. Hu, Y., Xin, G., Song, R., Hu, G., Shi, S., Cao, Y., and Li, H. Title extraction from bodies of HTML documents and its application to web page retrieval. In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (New York, NY, USA, 2005), ACM, pp. 250-257.
7. Jacobs, I., Raggett, D., and Le Hors, A. HTML 4.01 specification. W3C recommendation, W3C, Dec. 1999. http://www.w3.org/TR/1999/REC-html401-19991224.
8. Laber, E. S., de Souza, C. P., Jabour, I. V., de Amorim, E. C. F., Cardoso, E. T., Rentería, R. P., Tinoco, L. C., and Valentim, C. D. A fast and simple method for extracting relevant content from news webpages. In CIKM (2009), D. W.-L. Cheung, I.-Y. Song, W. W. Chu, X. Hu, and J. J. Lin, Eds., ACM, pp. 1685-1688.
9. Levenshtein, V. I. Binary codes with correction of deletions, insertions and substitution of symbols. Doklady Akademii Nauk SSSR 163, 4 (1965), 845-848.
10. Luo, P., Fan, J., Liu, S., Lin, F., Xiong, Y., and Liu, J. Web article extraction for web printing: a DOM+visual based approach. In Proceedings of the 9th ACM symposium on Document engineering (2009), ACM, pp. 66-69.
11. Marek, M., Pecina, P., and Spousta, M. Web page cleaning with conditional random fields. Cahiers du Cental 5 (2007), 1.
12. MSHTML. http://msdn.microsoft.com/en-us/library/bb508516(v=VS.85).aspx. [Online; accessed 01-September-2010].
13. Nicol, G., Champion, M., Le Hégaret, P., Robie, J., Wood, L., Le Hors, A., and Byrne, S. Document object model (DOM) level 3 core specification. W3C recommendation, W3C, Apr. 2004. http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407.
16. Spengler, A., Bordes, A., and Gallinari, P. A comparison of discriminative classifiers for web news content extraction. In Proceedings of RIAO 2010, 9th Int. Conf. on Adaptivity, Personalization and Fusion of Heterogeneous Information (2010).
20. The WebKit Open Source Project - CSS (Cascading Style Sheets). http://www.webkit.org/projects/css/index.html. [Online; accessed 13-April-2011].
22. Vieira, K., da Silva, A. S., Pinto, N., de Moura, E. S., Cavalcanti, J. M. B., and Freire, J. A fast and robust method for web page template detection and removal. In CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management (New York, NY, USA, 2006), ACM, pp. 258-267.
23. Wang, J., Chen, C., Wang, C., Pei, J., Bu, J., Guan, Z., and Zhang, W. V. Can we learn a template-independent wrapper for news article extraction from a single training site? In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (2009), ACM, pp. 1345-1354.
24. Wang, J., He, X., Wang, C., Pei, J., Bu, J., Chen, C., Guan, Z., and Lu, G. News article extraction with template-independent wrapper. In Proceedings of the 18th international conference on World wide web (2009), ACM, pp. 1085-1086.
25. WebKit. http://webkit.org/. [Online; accessed 01-September-2010].
27. World Wide Web Consortium (W3C). http://www.w3c.org/. [Online; accessed 14-September-2010].
28. Xue, Y., Hu, Y., Xin, G., Song, R., Shi, S., Cao, Y., Lin, C.-Y., and Li, H. Web page title extraction and its application. Information Processing and Management 43, 5 (2007), 1332-1347. DOI 10.1016/j.ipm.2006.11.007.