2008, Pages 1173-1182

A densitometric approach to web page segmentation

Author keywords

Full text extraction; Noise removal; Template detection; Web page segmentation

Indexed keywords

FULL-TEXT EXTRACTION; FULL-TEXT SEARCH; HTML PAGES; NEW APPROACHES; NOISE REMOVAL; QUANTITATIVE LINGUISTICS; TEMPLATE DETECTION; TEXT CLASSIFICATION; TEXT SEGMENTS; WEB PAGE; WEB PAGE SEGMENTATION;

EID: 70349243805     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1458082.1458237     Document Type: Conference Paper
Times cited : (97)

References (26)
  • 1
    • 65849329232
    • Verteilungen der Satzlängen (Distribution of Sentence Lengths)
    • K.-P. Schulz, editor, Brockmeyer
    • Gabriel Altmann. Verteilungen der Satzlängen (Distribution of Sentence Lengths). In K.-P. Schulz, editor, Glottometrika 9. Brockmeyer, 1988.
    • (1988) Glottometrika 9
    • Altmann, G.1
  • 3
    • 34250670773
    • Browsing on small screens: Recasting web-page segmentation into an efficient machine learning framework
    • New York, NY, USA, ACM
    • Shumeet Baluja. Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 33-42, New York, NY, USA, 2006. ACM.
    • (2006) WWW '06: Proceedings of the 15th international conference on World Wide Web , pp. 33-42
    • Baluja, S.1
  • 4
    • 77953052174
    • Template detection via data mining and its applications
    • Ziv Bar-Yossef and Sridhar Rajagopalan. Template detection via data mining and its applications. In WWW, pages 580-591, 2002.
    • (2002) WWW , pp. 580-591
    • Bar-Yossef, Z.1    Rajagopalan, S.2
  • 6
    • 70349239684
    • Sprachliche Einheiten in Textblöcken
    • RAM Verlag, Lüdenscheid
    • Karl-Heinz Best. Sprachliche Einheiten in Textblöcken. In Glottometrics 9, pages 1-12. RAM Verlag, Lüdenscheid, 2005.
    • (2005) Glottometrics 9 , pp. 1-12
    • Best, K.-H.1
  • 7
    • 21144444733
    • Extracting content structure for web pages based on visual representation
    • X. Zhou, Y. Zhang, and M. E. Orlowska, editors, APWeb, Springer
    • Deng Cai, Shipeng Yu, Ji-Rong Wen, and Wei-Ying Ma. Extracting content structure for web pages based on visual representation. In X. Zhou, Y. Zhang, and M. E. Orlowska, editors, APWeb, volume 2642 of LNCS, pages 406-417. Springer, 2003.
    • (2003) LNCS , vol.2642 , pp. 406-417
    • Cai, D.1    Yu, S.2    Wen, J.-R.3    Ma, W.-Y.4
  • 14
    • 65849422541
    • Zipf's law against the text size: A half-rational model
    • RAM Verlag, Lüdenscheid
    • Lukasz Debowski. Zipf's law against the text size: a half-rational model. In Glottometrics 4, pages 49-60. RAM Verlag, Lüdenscheid, 2002.
    • (2002) Glottometrics 4 , pp. 49-60
    • Debowski, L.1
  • 15
    • 63149183145
    • David Fernandes, Edleno S. de Moura, Berthier Ribeiro-Neto, Altigran S. da Silva, and Marcos André Gonçalves. Computing block importance for searching on web sites. In CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pages 165-174, New York, NY, USA, 2007. ACM.
  • 16
    • 77953053369
    • The volume and evolution of web page templates
    • Allan Ellis and Tatsuya Hagino, editors, ACM
    • David Gibson, Kunal Punera, and Andrew Tomkins. The volume and evolution of web page templates. In Allan Ellis and Tatsuya Hagino, editors, WWW (Special interest track), pages 830-839. ACM, 2005.
    • (2005) WWW (Special interest track) , pp. 830-839
    • Gibson, D.1    Punera, K.2    Tomkins, A.3
  • 17
    • 70349259571
    • Peter Grzybek. On the systematic and system-based study of grapheme frequencies - a re-analysis of german letter frequencies. In G. Altmann, K.-H. Best, and P. Grzybek et al., editors, Glottometrics 15, pages 82-91. RAM Verlag, Lüdenscheid, 2007.
  • 19
    • 0000008146
    • Comparing partitions
    • December
    • Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193-218, December 1985.
    • (1985) Journal of Classification , vol.2 , Issue.1 , pp. 193-218
    • Hubert, L.1    Arabie, P.2
  • 20
    • 19944413623
    • Wisdom: Web intrapage informative structure mining based on document object model
    • May
    • Hung-Yu Kao, Jan-Ming Ho, and Ming-Syan Chen. Wisdom: Web intrapage informative structure mining based on document object model. Knowledge and Data Engineering, IEEE Transactions on, 17(5):614-627, May 2005.
    • (2005) Knowledge and Data Engineering, IEEE Transactions on , vol.17 , Issue.5 , pp. 614-627
    • Kao, H.-Y.1    Ho, J.-M.2    Chen, M.-S.3
  • 22
    • 0004285133
    • Prentice Hall PTR, Upper Saddle River, NJ, USA
    • George Stockman and Linda G. Shapiro. Computer Vision. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2001.
    • (2001) Computer Vision
    • Stockman, G.1    Shapiro, L.G.2
  • 23
    • 0041965980
    • Cluster ensembles - a knowledge reuse framework for combining multiple partitions
    • Alexander Strehl and Joydeep Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res., 3:583-617, 2003.
    • (2003) J. Mach. Learn. Res , vol.3 , pp. 583-617
    • Strehl, A.1    Ghosh, J.2
  • 24
    • 34547631600
    • Karane Vieira, Altigran S. da Silva, Nick Pinto, Edleno S. de Moura, João M. B. Cavalcanti, and Juliana Freire. A fast and robust method for web page template detection and removal. In CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management, pages 258-267, New York, NY, USA, 2006. ACM.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.