-
2
-
-
34250670773
-
Browsing on small screens: Recasting web-page segmentation into an efficient machine learning framework
-
New York, NY, USA, ACM
-
S. Baluja. Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 33-42, New York, NY, USA, 2006. ACM.
-
(2006)
WWW '06: Proceedings of the 15th International Conference on World Wide Web
, pp. 33-42
-
-
Baluja, S.1
-
3
-
-
77953052174
-
Template detection via data mining and its applications
-
Z. Bar-Yossef and S. Rajagopalan. Template detection via data mining and its applications. In WWW, pages 580-591, 2002.
-
(2002)
WWW
, pp. 580-591
-
-
Bar-Yossef, Z.1
Rajagopalan, S.2
-
4
-
-
85037352302
-
Cleaneval: A competition for cleaning web pages
-
N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, and D. Tapias, editors
-
M. Baroni, F. Chantree, A. Kilgarriff, and S. Sharoff. Cleaneval: a competition for cleaning web pages. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, and D. Tapias, editors, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), 2008.
-
Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), 2008
-
-
Baroni, M.1
Chantree, F.2
Kilgarriff, A.3
Sharoff, S.4
-
5
-
-
21144444733
-
Extracting content structure for web pages based on visual representation
-
X. Zhou, Y. Zhang, and M. E. Orlowska, editors, Springer
-
D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In X. Zhou, Y. Zhang, and M. E. Orlowska, editors, APWeb, volume 2642 of LNCS, pages 406-417. Springer, 2003.
-
(2003)
APWeb, Volume 2642 of LNCS
, pp. 406-417
-
-
Cai, D.1
Yu, S.2
Wen, J.-R.3
Ma, W.-Y.4
-
6
-
-
35348883378
-
Page-level template detection via isotonic smoothing
-
New York, NY, USA, ACM
-
D. Chakrabarti, R. Kumar, and K. Punera. Page-level template detection via isotonic smoothing. In WWW '07: Proc. of the 16th int. conf. on World Wide Web, pages 61-70, New York, NY, USA, 2007. ACM.
-
(2007)
WWW '07: Proc. of the 16th Int. Conf. on World Wide Web
, pp. 61-70
-
-
Chakrabarti, D.1
Kumar, R.2
Punera, K.3
-
7
-
-
57349119030
-
A graph-theoretic approach to webpage segmentation
-
New York, NY, USA, ACM
-
D. Chakrabarti, R. Kumar, and K. Punera. A graph-theoretic approach to webpage segmentation. In WWW '08: Proceeding of the 17th international conference on World Wide Web, pages 377-386, New York, NY, USA, 2008. ACM.
-
(2008)
WWW '08: Proceeding of the 17th International Conference on World Wide Web
, pp. 377-386
-
-
Chakrabarti, D.1
Kumar, R.2
Punera, K.3
-
8
-
-
84880495193
-
Detecting web page structure for adaptive viewing on small form factor devices
-
New York, NY, USA, ACM
-
Y. Chen, W.-Y. Ma, and H.-J. Zhang. Detecting web page structure for adaptive viewing on small form factor devices. In WWW '03: Proceedings of the 12th international conference on World Wide Web, pages 225-233, New York, NY, USA, 2003. ACM.
-
(2003)
WWW '03: Proceedings of the 12th International Conference on World Wide Web
, pp. 225-233
-
-
Chen, Y.1
Ma, W.-Y.2
Zhang, H.-J.3
-
9
-
-
33947621611
-
Automatic identification of informative sections of web pages
-
S. Debnath, P. Mitra, N. Pal, and C. L. Giles. Automatic identification of informative sections of web pages. IEEE Trans. on Knowledge and Data Engineering, 17(9):1233-1246, 2005.
-
(2005)
IEEE Trans. on Knowledge and Data Engineering
, vol.17
, Issue.9
, pp. 1233-1246
-
-
Debnath, S.1
Mitra, P.2
Pal, N.3
Giles, C.L.4
-
10
-
-
84869471345
-
A lightweight and efficient tool for cleaning web pages
-
N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, and D. Tapias, editors, European Language Resources Association (ELRA).
-
S. Evert. A lightweight and efficient tool for cleaning web pages. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, and D. Tapias, editors, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 2008. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2008/.
-
Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 2008
-
-
Evert, S.1
-
11
-
-
63149183145
-
Computing block importance for searching on web sites
-
D. Fernandes, E. S. de Moura, B. Ribeiro-Neto, A. S. da Silva, and M. A. Gonçalves. Computing block importance for searching on web sites. In CIKM '07, pages 165-174, 2007.
-
(2007)
CIKM '07
, pp. 165-174
-
-
Fernandes, D.1
De Moura, E.S.2
Ribeiro-Neto, B.3
Da Silva, A.S.4
Gonçalves, M.A.5
-
14
-
-
77953053369
-
The volume and evolution of web page templates
-
New York, NY, USA, ACM
-
D. Gibson, K. Punera, and A. Tomkins. The volume and evolution of web page templates. In WWW'05, pages 830-839, New York, NY, USA, 2005. ACM.
-
(2005)
WWW'05
, pp. 830-839
-
-
Gibson, D.1
Punera, K.2
Tomkins, A.3
-
15
-
-
77951129919
-
Adaptive web-page content identification
-
New York, NY, USA
-
J. Gibson, B. Wellner, and S. Lubar. Adaptive web-page content identification. In WIDM '07: Proceedings of the 9th annual ACM international workshop on Web information and data management, pages 105-112, New York, NY, USA, 2007. ACM.
-
(2007)
WIDM '07: Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management
, pp. 105-112
-
-
Gibson, J.1
Wellner, B.2
Lubar, S.3
-
16
-
-
77950893778
-
Web corpus cleaning using content and structure
-
UCL Presses Universitaires de Louvain, September
-
K. Hofmann and W. Weerkamp. Web corpus cleaning using content and structure. In Building and Exploring Web Corpora, pages 145-154. UCL Presses Universitaires de Louvain, September 2007.
-
(2007)
Building and Exploring Web Corpora
, pp. 145-154
-
-
Hofmann, K.1
Weerkamp, W.2
-
17
-
-
19944413623
-
Wisdom: Web intrapage informative structure mining based on document object model
-
May
-
H.-Y. Kao, J.-M. Ho, and M.-S. Chen. Wisdom: Web intrapage informative structure mining based on document object model. Knowledge and Data Engineering, IEEE Transactions on, 17(5):614-627, May 2005.
-
(2005)
Knowledge and Data Engineering, IEEE Transactions on
, vol.17
, Issue.5
, pp. 614-627
-
-
Kao, H.-Y.1
Ho, J.-M.2
Chen, M.-S.3
-
20
-
-
84873529351
-
Overview of the TREC 2006 blog track
-
E. M. Voorhees and L. P. Buckland, editors, National Institute of Standards and Technology (NIST)
-
I. Ounis, C. Macdonald, M. de Rijke, G. Mishne, and I. Soboroff. Overview of the TREC 2006 blog track. In E. M. Voorhees and L. P. Buckland, editors, TREC, volume Special Publication 500-272. National Institute of Standards and Technology (NIST), 2006.
-
(2006)
TREC, volume Special Publication 500-272
-
-
Ounis, I.1
Macdonald, C.2
De Rijke, M.3
Mishne, G.4
Soboroff, I.5
-
21
-
-
84865651487
-
Extracting article text from the web with maximum subsequence segmentation
-
New York, NY, USA, ACM
-
J. Pasternack and D. Roth. Extracting article text from the web with maximum subsequence segmentation. In WWW '09: Proceedings of the 18th international conference on World Wide Web, pages 971-980, New York, NY, USA, 2009. ACM.
-
(2009)
WWW '09: Proceedings of the 18th International Conference on World Wide Web
, pp. 971-980
-
-
Pasternack, J.1
Roth, D.2
-
22
-
-
84856043672
-
A mathematical theory of communication
-
C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27, 1948.
-
(1948)
Bell System Technical Journal
, vol.27
-
-
Shannon, C.E.1
-
24
-
-
34547631600
-
A fast and robust method for web page template detection and removal
-
K. Vieira, A. S. da Silva, N. Pinto, E. S. de Moura, J. M. B. Cavalcanti, and J. Freire. A fast and robust method for web page template detection and removal. In CIKM '06: Proc. 15th ACM int. conf. on Information and knowledge management, pages 258-267, 2006.
-
(2006)
CIKM '06: Proc. 15th ACM Int. Conf. on Information and Knowledge Management
, pp. 258-267
-
-
Vieira, K.1
Da Silva, A.S.2
Pinto, N.3
De Moura, E.S.4
Cavalcanti, J.M.B.5
Freire, J.6