2010, Pages 441-450

Boilerplate detection using shallow text features

Author keywords

Boilerplate removal; Full text extraction; Template detection; Text cleaning; Web document modeling

Indexed keywords

CREATION PROCESS; DETECTION ACCURACY; RETRIEVAL PERFORMANCE; TEMPLATE DETECTION; TEXT ELEMENTS; TEXT EXTRACTION; TEXT FEATURE; WEB DOCUMENT; WEB DOCUMENT MODELING; WEB PAGE

EID: 77950904942; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1145/1718487.1718542; Document Type: Conference Paper
Times cited: 383

References (26)
  • 2
    • 34250670773
    • Browsing on small screens: Recasting web-page segmentation into an efficient machine learning framework
    • S. Baluja. Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 33-42, New York, NY, USA, 2006. ACM.
    • (2006) WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 33-42
    • Baluja, S.
  • 3
    • 77953052174
    • Template detection via data mining and its applications
    • Z. Bar-Yossef and S. Rajagopalan. Template detection via data mining and its applications. In WWW, pages 580-591, 2002.
    • (2002) WWW, pp. 580-591
    • Bar-Yossef, Z.; Rajagopalan, S.
  • 5
    • 21144444733
    • Extracting content structure for web pages based on visual representation
    • D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In X. Zhou, Y. Zhang, and M. E. Orlowska, editors, APWeb, volume 2642 of LNCS, pages 406-417. Springer, 2003.
    • (2003) APWeb, Volume 2642 of LNCS, pp. 406-417
    • Cai, D.; Yu, S.; Wen, J.-R.; Ma, W.-Y.
  • 10
    • 84869471345
    • A lightweight and efficient tool for cleaning web pages
    • S. Evert. A lightweight and efficient tool for cleaning web pages. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, and D. Tapias, editors, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 2008. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2008/.
    • Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 2008
    • Evert, S.
  • 14
    • 77953053369
    • The volume and evolution of web page templates
    • D. Gibson, K. Punera, and A. Tomkins. The volume and evolution of web page templates. In WWW'05, pages 830-839, New York, NY, USA, 2005. ACM.
    • (2005) WWW'05, pp. 830-839
    • Gibson, D.; Punera, K.; Tomkins, A.
  • 16
    • 77950893778
    • Web corpus cleaning using content and structure
    • K. Hofmann and W. Weerkamp. Web corpus cleaning using content and structure. In Building and Exploring Web Corpora, pages 145-154. UCL Presses Universitaires de Louvain, September 2007.
    • (2007) Building and Exploring Web Corpora, pp. 145-154
    • Hofmann, K.; Weerkamp, W.
  • 17
    • 19944413623
    • Wisdom: Web intrapage informative structure mining based on document object model
    • H.-Y. Kao, J.-M. Ho, and M.-S. Chen. Wisdom: Web intrapage informative structure mining based on document object model. Knowledge and Data Engineering, IEEE Transactions on, 17(5):614-627, May 2005.
    • (2005) Knowledge and Data Engineering, IEEE Transactions on, vol. 17, issue 5, pp. 614-627
    • Kao, H.-Y.; Ho, J.-M.; Chen, M.-S.
  • 20
    • 84873529351
    • Overview of the TREC 2006 blog track
    • I. Ounis, C. Macdonald, M. de Rijke, G. Mishne, and I. Soboroff. Overview of the TREC 2006 blog track. In E. M. Voorhees and L. P. Buckland, editors, TREC, volume Special Publication 500-272. National Institute of Standards and Technology (NIST), 2006.
    • (2006) TREC, volume Special Publication 500-272
    • Ounis, I.; Macdonald, C.; De Rijke, M.; Mishne, G.; Soboroff, I.
  • 22
  • 23
    • 80053652639
    • Victor: The web-page cleaning tool
    • M. Spousta, M. Marek, and P. Pecina. Victor: the web-page cleaning tool. In WaC4, 2008.
    • (2008) WaC4
    • Spousta, M.; Marek, M.; Pecina, P.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.