Volume , Issue , 2010, Pages 971-980

CETR - Content extraction via tag ratios

Author keywords

content extraction; tag ratio; world wide web

Indexed keywords

ALTERNATIVE METHODS; CLUSTERING TECHNIQUES; CONTENT EXTRACTION; EXISTING METHOD; HTML DOCUMENTS; PRECISION AND RECALL; TWO DIMENSIONAL MODEL; TWO-DIMENSION; WEB CORPORA; WEB DOMAINS; WEB PAGE;

EID: 77954569037     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1772690.1772789     Document Type: Conference Paper
Times cited : (120)

References (33)
  • 2
    • 0032092761
    • Nodose - A tool for semi-automatically extracting semi-structured data from text documents
    • ACM Press
    • B. Adelberg. Nodose - a tool for semi-automatically extracting semi-structured data from text documents. In SIGMOD Conference, pages 283-294. ACM Press, 1998.
    • (1998) SIGMOD Conference , pp. 283-294
    • Adelberg, B.1
  • 3
    • 77953052174
    • Template detection via data mining and its applications
    • Z. Bar-Yossef and S. Rajagopalan. Template detection via data mining and its applications. In WWW, pages 580-591, 2002.
    • (2002) WWW , pp. 580-591
    • Bar-Yossef, Z.1    Rajagopalan, S.2
  • 4
    • 0035029462
    • Accordion summarization for end-game browsing on pdas and cellular phones
    • O. Buyukkokten, H. Garcia-Molina, and A. Paepcke. Accordion summarization for end-game browsing on pdas and cellular phones. In CHI, pages 213-220, 2001.
    • (2001) CHI , pp. 213-220
    • Buyukkokten, O.1    Garcia-Molina, H.2    Paepcke, A.3
  • 5
    • 8644241107
    • Block-level link analysis
    • ACM
    • D. Cai, X. He, J.-R. Wen, and W.-Y. Ma. Block-level link analysis. In SIGIR, pages 440-447. ACM, 2004.
    • (2004) SIGIR , pp. 440-447
    • Cai, D.1    He, X.2    Wen, J.-R.3    Ma, W.-Y.4
  • 6
    • 21144444733
    • Extracting content structure for web pages based on visual representation
    • APWeb, Springer
    • D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In APWeb, volume 2642 of Lecture Notes in Computer Science, pages 406-417. Springer, 2003.
    • (2003) Lecture Notes in Computer Science , vol.2642 , pp. 406-417
    • Cai, D.1    Yu, S.2    Wen, J.-R.3    Ma, W.-Y.4
  • 7
    • 18744372151
    • Misuse detection for information retrieval systems
    • ACM
    • R. Cathey, L. Ma, N. Goharian, and D. A. Grossman. Misuse detection for information retrieval systems. In CIKM, pages 183-190. ACM, 2003.
    • (2003) CIKM , pp. 183-190
    • Cathey, R.1    Ma, L.2    Goharian, N.3    Grossman, D.A.4
  • 9
    • 33751046629
    • Template detection for large scale search engines
    • ACM
    • L. Chen, S. Ye, and X. Li. Template detection for large scale search engines. In SAC, pages 1094-1098. ACM, 2006.
    • (2006) SAC , pp. 1094-1098
    • Chen, L.1    Ye, S.2    Li, X.3
  • 10
    • 4644340823
    • Automatic web news extraction using tree edit distance
    • ACM
    • D. de Castro Reis, P. B. Golgher, A. S. da Silva, and A. H. F. Laender. Automatic web news extraction using tree edit distance. In WWW, pages 502-511. ACM, 2004.
    • (2004) WWW , pp. 502-511
    • De Castro Reis, D.1    Golgher, P.B.2    Da Silva, A.S.3    Laender, A.H.F.4
  • 11
    • 26844469211
    • Automatic extraction of informative blocks from webpages
    • ACM
    • S. Debnath, P. Mitra, and C. L. Giles. Automatic extraction of informative blocks from webpages. In SAC, pages 1722-1726. ACM, 2005.
    • (2005) SAC , pp. 1722-1726
    • Debnath, S.1    Mitra, P.2    Giles, C.L.3
  • 12
    • 26944496810
    • Identifying content blocks from web documents
    • ISMIS, Springer
    • S. Debnath, P. Mitra, and C. L. Giles. Identifying content blocks from web documents. In ISMIS, volume 3488 of Lecture Notes in Computer Science, pages 285-293. Springer, 2005.
    • (2005) Lecture Notes in Computer Science , vol.3488 , pp. 285-293
    • Debnath, S.1    Mitra, P.2    Giles, C.L.3
  • 14
    • 57849154238
    • Evaluating content extraction on html documents
    • T. Gottron. Evaluating content extraction on html documents. In ITA, pages 123-132, 2007.
    • (2007) ITA , pp. 123-132
    • Gottron, T.1
  • 15
    • 70349131454
    • Combining content extraction heuristics: The CombinE system
    • ACM
    • T. Gottron. Combining content extraction heuristics: the CombinE system. In iiWAS, pages 591-595. ACM, 2008.
    • (2008) iiWAS , pp. 591-595
    • Gottron, T.1
  • 16
    • 57849123029
    • Content code blurring: A new approach to content extraction
    • T. Gottron. Content code blurring: A new approach to content extraction. In DEXA Workshops [1], pages 29-33.
    • DEXA Workshops , Issue.1 , pp. 29-33
    • Gottron, T.1
  • 18
    • 84880498138
    • Dom-based content extraction of html documents
    • S. Gupta, G. E. Kaiser, D. Neistadt, and P. Grimm. Dom-based content extraction of html documents. In WWW, pages 207-214, 2003.
    • (2003) WWW , pp. 207-214
    • Gupta, S.1    Kaiser, G.E.2    Neistadt, D.3    Grimm, P.4
  • 20
    • 0002985122
    • Wrapping web data into xml
    • W. Han, D. Buttler, and C. Pu. Wrapping web data into xml. SIGMOD Rec., 30(3):33-38, 2001.
    • (2001) SIGMOD Rec. , vol.30 , Issue.3 , pp. 33-38
    • Han, W.1    Buttler, D.2    Pu, C.3
  • 21
    • 0742268832
    • Mining web informative structures and contents based on entropy analysis
    • H.-Y. Kao, S.-H. Lin, J.-M. Ho, and M.-S. Chen. Mining web informative structures and contents based on entropy analysis. IEEE Trans. Knowl. Data Eng., 16(1):41-55, 2004.
    • (2004) IEEE Trans. Knowl. Data Eng. , vol.16 , Issue.1 , pp. 41-55
    • Kao, H.-Y.1    Lin, S.-H.2    Ho, J.-M.3    Chen, M.-S.4
  • 22
    • 0034172374
    • Wrapper induction: Efficiency and expressiveness
    • N. Kushmerick. Wrapper induction: efficiency and expressiveness. Artificial Intelligence, 118(1-2):15-68, 2000.
    • (2000) Artificial Intelligence , vol.118 , Issue.1-2 , pp. 15-68
    • Kushmerick, N.1
  • 23
    • 0242456776
    • Discovering informative content blocks from web documents
    • ACM
    • S.-H. Lin and J.-M. Ho. Discovering informative content blocks from web documents. In KDD, pages 588-593. ACM, 2002.
    • (2002) KDD , pp. 588-593
    • Lin, S.-H.1    Ho, J.-M.2
  • 25
    • 77950332467
    • Separating xhtml content from navigation clutter using dom-structure block analysis
    • S. Reich and M. Tzagarakis, editors, ACM
    • C. Mantratzis, M. A. Orgun, and S. Cassidy. Separating xhtml content from navigation clutter using dom-structure block analysis. In S. Reich and M. Tzagarakis, editors, Hypertext, pages 145-147. ACM, 2005.
    • (2005) Hypertext , pp. 145-147
    • Mantratzis, C.1    Orgun, M.A.2    Cassidy, S.3
  • 26
    • 84938812620
    • Template detection through conditional random fields
    • M. Marek, P. Pecina, and M. Spousta. Template detection through conditional random fields. In WAC3, 2007.
    • (2007) WAC3
    • Marek, M.1    Pecina, P.2    Spousta, M.3
  • 27
  • 28
    • 84865651487
    • Extracting article text from the web with maximum subsequence segmentation
    • ACM
    • J. Pasternack and D. Roth. Extracting article text from the web with maximum subsequence segmentation. In WWW, pages 971-980. ACM, 2009.
    • (2009) WWW , pp. 971-980
    • Pasternack, J.1    Roth, D.2
  • 29
    • 0036989234
    • Quasm: A system for question answering using semi-structured data
    • ACM
    • D. Pinto, M. Branstein, R. Coleman, W. B. Croft, M. King, W. Li, and X. Wei. Quasm: a system for question answering using semi-structured data. In JCDL, pages 46-55. ACM, 2002.
    • (2002) JCDL , pp. 46-55
    • Pinto, D.1    Branstein, M.2    Coleman, R.3    Croft, W.B.4    King, M.5    Li, W.6    Wei, X.7
  • 30
    • 0038144389
    • Content extraction from html documents
    • A. F. R. Rahman, H. Alam, and R. Hartono. Content extraction from html documents. In WDA, pages 7-10, 2001.
    • (2001) WDA , pp. 7-10
    • Rahman, A.F.R.1    Alam, H.2    Hartono, R.3
  • 31
    • 58849102735
    • Toward 2w, beyond web 2.0
    • T. V. Raman. Toward 2w, beyond web 2.0. Commun. ACM, 52(2):52-59, 2009.
    • (2009) Commun. ACM , vol.52 , Issue.2 , pp. 52-59
    • Raman, T.V.1
  • 32
    • 57849147691
    • Text extraction from the web via text-to-tag ratio
    • T. Weninger and W. H. Hsu. Text extraction from the web via text-to-tag ratio. In DEXA Workshops [1], pages 23-28.
    • DEXA Workshops , Issue.1 , pp. 23-28
    • Weninger, T.1    Hsu, W.H.2
  • 33
    • 77952370025
    • Eliminating noisy information in web pages for data mining
    • ACM
    • L. Yi, B. Liu, and X. Li. Eliminating noisy information in web pages for data mining. In KDD, pages 296-305. ACM, 2003.
    • (2003) KDD , pp. 296-305
    • Yi, L.1    Liu, B.2    Li, X.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.