2009, Pages 971-980

Extracting article text from the Web with maximum subsequence segmentation

Author keywords

Maximum subsequence; Page segmentation; Text extraction

Indexed keywords

EXPERT KNOWLEDGE; HTML DOCUMENTS; LEVENSHTEIN DISTANCE; LINEAR TIME; LOCAL CLASSIFIER; MAXIMUM SUBSEQUENCE; ONLINE NEWS; PAGE SEGMENTATION; PAGE STRUCTURES; SEMI-SUPERVISED ALGORITHM; TEMPLATE DETECTION; TEXT EXTRACTION; TRAINING EXAMPLE; WRAPPER INDUCTION

EID: 84865651487     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1526709.1526840     Document Type: Conference Paper
Times cited: 73

References (24)
  • 1
    • 0032092761
    • NoDoSE - A tool for semi-automatically extracting structured and semistructured data from text documents
    • Adelberg, B. NoDoSE-a tool for semi-automatically extracting structured and semistructured data from text documents. SIGMOD 1998.
    • (1998) SIGMOD
    • Adelberg, B.1
  • 2
    • 17044366983
    • Accordion summarization for end-game browsing on PDAs and cellular phones
    • Buyukkokten, O., Garcia-Molina, H. and Paepcke, A. Accordion summarization for end-game browsing on PDAs and cellular phones. SIGCHI 2001.
    • (2001) SIGCHI
    • Buyukkokten, O.1    Garcia-Molina, H.2    Paepcke, A.3
  • 3
    • 8644284554
    • Extracting content structure for web pages based on visual representation
    • Cai, D., Yu, S., Wen, J.R. and Ma, W.Y. Extracting content structure for web pages based on visual representation. APWeb 2003.
    • (2003) APWeb
    • Cai, D.1    Yu, S.2    Wen, J.R.3    Ma, W.Y.4
  • 8
    • 34247382812
    • Template detection for large scale search engines
    • Chen, L., Ye, S. and Li, X. Template detection for large scale search engines. SAC 2006.
    • (2006) SAC
    • Chen, L.1    Ye, S.2    Li, X.3
  • 9
    • 84944327150
    • Roadrunner: Towards automatic data extraction from large web sites
    • Crescenzi, V., Mecca, G. and Merialdo, P. Roadrunner: Towards automatic data extraction from large web sites. VLDB 2001.
    • (2001) VLDB
    • Crescenzi, V.1    Mecca, G.2    Merialdo, P.3
  • 10
    • 0342770808
    • Experience with Top Gun Wingman: A proxy-based graphical web browser for the 3Com PalmPilot
    • Fox, A., Goldberg, I., Gribble, S.D., Lee, D.C., Polito, A. and Brewer, E.A. Experience With Top Gun Wingman: A Proxy-Based Graphical Web Browser for the 3Com PalmPilot. Middleware 1998.
    • (1998) Middleware
    • Fox, A.1    Goldberg, I.2    Gribble, S.D.3    Lee, D.C.4    Polito, A.5    Brewer, E.A.6
  • 11
    • 0034172374
    • Wrapper induction: Efficiency and expressiveness
    • Kushmerick, N. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence, 118, 2000.
    • (2000) Artificial Intelligence , vol.118
    • Kushmerick, N.1
  • 13
    • 0242456776
    • Discovering informative content blocks from Web documents
    • Lin, S.H. and Ho, J.M. Discovering informative content blocks from Web documents. KDD 2002.
    • (2002) KDD
    • Lin, S.H.1    Ho, J.M.2
  • 14
    • 0033893885
    • XWRAP: An XML-enabled wrapper construction system for Web information sources
    • Liu, L., Pu, C. and Han, W. XWRAP: an XML-enabled wrapper construction system for Web information sources. ICDE 2000.
    • (2000) ICDE
    • Liu, L.1    Pu, C.2    Han, W.3
  • 15
    • 18744404260
    • Extracting unstructured data from template generated web documents
    • Ma, L., Goharian, N., Chowdhury, A. and Chung, M. Extracting unstructured data from template generated web documents. CIKM 2003.
    • (2003) CIKM
    • Ma, L.1    Goharian, N.2    Chowdhury, A.3    Chung, M.4
  • 16
    • 84938812620
    • Web page cleaning with conditional random fields
    • Marek, M., Pecina, P. and Spousta, M. Web page cleaning with conditional random fields. WAC3 2007.
    • (2007) WAC3
    • Marek, M.1    Pecina, P.2    Spousta, M.3
  • 19
    • 84948481845
    • An algorithm for suffix stripping
    • Porter, M.F. An algorithm for suffix stripping. Program, vol. 14, no. 3, pp. 130-137, 1980.
    • (1980) Program , vol.14 , Issue.3 , pp. 130-137
    • Porter, M.F.1
  • 20
    • 56449104958
    • Learning and inference over constrained output
    • Punyakanok, V., Roth, D., Yih, W. and Zimak, D. Learning and inference over constrained output. IJCAI 2005.
    • (2005) IJCAI
    • Punyakanok, V.1    Roth, D.2    Yih, W.3    Zimak, D.4
  • 21
    • 0033288922
    • A linear time algorithm for finding all maximal scoring subsequences
    • Ruzzo, W.L. and Tompa, M. A linear time algorithm for finding all maximal scoring subsequences. ISMB 1999.
    • (1999) ISMB
    • Ruzzo, W.L.1    Tompa, M.2
  • 22
    • 84880811191
    • Web page cleaning for web mining through feature weighting
    • Yi, L. and Liu, B. Web Page Cleaning for Web Mining through Feature Weighting. IJCAI 2003.
    • (2003) IJCAI
    • Yi, L.1    Liu, B.2
  • 23
    • 77952370025
    • Eliminating noisy information in Web pages for data mining
    • Yi, L., Liu, B. and Li, X. Eliminating noisy information in Web pages for data mining. KDD 2003.
    • (2003) KDD
    • Yi, L.1    Liu, B.2    Li, X.3
  • 24
    • 84880475213
    • Improving pseudo-relevance feedback in web information retrieval using web page segmentation
    • Yu, S., Cai, D., Wen, J.R. and Ma, W.Y. Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation. WWW 2003.
    • (2003) WWW
    • Yu, S.1    Cai, D.2    Wen, J.R.3    Ma, W.Y.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.