2009, Pages 1345-1353

Can we learn a template-independent wrapper for news article extraction from a single training site?

Author keywords

Classification; Data extraction; Web mining

Indexed keywords

CLASSIFICATION; DATA EXTRACTION; EXISTING METHOD; EXTRACTION METHOD; MACHINE LEARNING PROBLEM; NEWS AGGREGATION; NEWS ARTICLES; PLAIN TEXT; SINGLE SITES; USER FRIENDLY; WEB APPLICATION; WEB MINING; WRAPPER INDUCTION

EID: 71049182378
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/1557019.1557163
Document Type: Conference Paper
Times cited: 45

References (25)
  • [1] A. Arasu and H. Garcia-Molina. Extracting structured data from web pages. In SIGMOD'03, pages 337-348, 2003. (Scopus ID: 1142303684)
  • [2] S. L. Chuang and J. Y. Hsu. Tree-structured template generation for web pages. In Web Intelligence'04, pages 327-333, 2004. (Scopus ID: 15544389985)
  • [3] V. Crescenzi, G. Mecca, and P. Merialdo. RoadRunner: Towards automatic data extraction from large web sites. In VLDB'01, 2001. (Scopus ID: 84944327150)
  • [4] S. Flesca et al. Web wrapper induction: A brief survey. AI Communications, 17(2):57-61, 2004. (Scopus ID: 3142764439)
  • [5] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm. DOM-based content extraction of HTML documents. In WWW'03, pages 207-214, 2003. (Scopus ID: 84880498138)
  • [6] C. N. Hsu and M. T. Dung. Generating finite-state transducers for semi-structured data extraction from the web. Information Systems, 23(8):521-538, 1998. (Scopus ID: 0032309862)
  • [7] Y. Hu, G. Xin, R. Song, G. Hu, S. Shi, Y. Cao, and H. Li. Title extraction from bodies of HTML documents and its application to web page retrieval. In SIGIR'05, pages 250-257, 2005. (Scopus ID: 84885654015)
  • [8] U. Irmak and T. Suel. Interactive wrapper generation with minimal user effort. In WWW'06, pages 553-563, 2006. (Scopus ID: 34250750133)
  • [9] N. Jindal and B. Liu. Review spam detection. In WWW'07, pages 1189-1190, 2007. (Scopus ID: 34548334405)
  • [10] T. Kamba, K. Bharat, and M. C. Albers. An interactive, personalized, newspaper on the WWW. In WWW'95, 1995. (Scopus ID: 71049193836)
  • [11] N. Kushmerick. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence, 118(1-2):15-68, 2000. (Scopus ID: 0034172374)
  • [13] B. Liu. Web content mining (tutorial). In WWW'05, 2005. (Scopus ID: 71049183492)
  • [15] J. C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, 1999. (Scopus ID: 0003243224)
  • [16] D. C. Reis, P. B. Golgher, A. S. Silva, and A. F. Laender. Automatic web news extraction using tree edit distance. In WWW'04, pages 502-511, 2004. (Scopus ID: 4644340823)
  • [17] S. Sarawagi. Automation in information extraction and data integration (tutorial). In VLDB'02, 2002. (Scopus ID: 33845883383)
  • [18] S. M. Selkow. The tree-to-tree editing problem. Information Processing Letters, 6(6):184-186, 1977. (Scopus ID: 0001122858)
  • [20] R. Song, H. Liu, J. R. Wen, and W. Y. Ma. Learning block importance models for web pages. In WWW'04, 2004. (Scopus ID: 18744381159)
  • [21] V. Vapnik. Principles of risk minimization for learning theory. In NIPS'91, pages 831-838, 1991. (Scopus ID: 0040864988)
  • [22] Y. Zhai and B. Liu. Web data extraction based on partial tree alignment. In WWW'05, pages 76-85, 2005. (Scopus ID: 33744821948)
  • [23] H. Zhao, W. Meng, Z. Wu, V. Raghavan, and C. Yu. Fully automatic wrapper generation for search engines. In WWW'05, pages 66-75, 2005. (Scopus ID: 71049171156)
  • [24] H. Zhao, W. Meng, and C. Yu. Mining templates from search result records of search engines. In SIGKDD'07, pages 884-893, 2007. (Scopus ID: 36849073188)
  • [25] S. Zheng, R. Song, and J. R. Wen. Template-independent news extraction based on visual consistency. In AAAI'07, volume 22, pages 1507-1513, 2007. (Scopus ID: 54249088690)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.