SCOPUS Information Retrieval Platform

WWW'09 - Proceedings of the 18th International World Wide Web Conference

2009, Pages 971-980

Extracting article text from the Web with maximum subsequence segmentation

(2) Pasternack, Jeff (a); Roth, Dan (a)

(a) University of Illinois at Urbana-Champaign (United States)

Author keywords

Maximum subsequence; Page segmentation; Text extraction

Indexed keywords

EXPERT KNOWLEDGE; HTML DOCUMENTS; LEVENSHTEIN DISTANCE; LINEAR TIME; LOCAL CLASSIFIER; MAXIMUM SUBSEQUENCE; ONLINE NEWS; PAGE SEGMENTATION; PAGE STRUCTURES; SEMI-SUPERVISED ALGORITHM; TEMPLATE DETECTION; TEXT EXTRACTION; TRAINING EXAMPLE; WRAPPER INDUCTION; GLOBAL OPTIMIZATION; WEBSITES

EID: 84865651487 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1526709.1526840 Document Type: Conference Paper

Times cited: 73

References (24)

1
- 0032092761
- NoDoSE - A tool for semi-automatically extracting structured and semistructured data from text documents
- Adelberg, B. NoDoSE - a tool for semi-automatically extracting structured and semistructured data from text documents. SIGMOD 1998.
- (1998) SIGMOD
- Adelberg, B.¹

2
- 17044366983
- Accordion summarization for end-game browsing on PDAs and cellular phones
- Buyukkokten, O., Garcia-Molina, H. and Paepcke, A. Accordion summarization for end-game browsing on PDAs and cellular phones. SIGCHI 2001.
- (2001) SIGCHI
- Buyukkokten, O.¹ Garcia-Molina, H.² Paepcke, A.³

3
- 8644284554
- Extracting content structure for web pages based on visual representation
- Cai, D., Yu, S., Wen, J.R. and Ma, W.Y. Extracting content structure for web pages based on visual representation. APWeb 2003.
- (2003) APWeb
- Cai, D.¹ Yu, S.² Wen, J.R.³ Ma, W.Y.⁴

4
- 8644236286
- VIPS: A vision-based page segmentation algorithm
- Cai, D., Yu, S., Wen, J.R. and Ma, W.Y. VIPS: a Vision-based Page Segmentation Algorithm. Microsoft Technical Report (MSR-TR-2003-79), 2003.
- (2003) Microsoft Technical Report (MSR-TR-2003-79)
- Cai, D.¹ Yu, S.² Wen, J.R.³ Ma, W.Y.⁴

5
- 8644241107
- Block-level link analysis
- Cai, D., He, X., Wen, J.R. and Ma, W.Y. "Block-level Link Analysis", SIGIR 2004.
- (2004) SIGIR
- Cai, D.¹ He, X.² Wen, J.R.³ Ma, W.Y.⁴

6
- 8644267730
- Block-based web search
- Cai, D., Yu, S., Wen, J.R. and Ma, W.Y. "Block-based Web Search", SIGIR 2004.
- (2004) SIGIR
- Cai, D.¹ Yu, S.² Wen, J.R.³ Ma, W.Y.⁴

7
- 84865632343
- Function-based object model towards website adaptation
- Chen, J., Zhou, B., Shi, J., Zhang, H. and Fengwu, Q. Function-based object model towards website adaptation. WWW2001.
- WWW2001
- Chen, J.¹ Zhou, B.² Shi, J.³ Zhang, H.⁴ Fengwu, Q.⁵

8
- 34247382812
- Template detection for large scale search engines
- Chen, L., Ye, S. and Li, X. Template detection for large scale search engines. SAC 2006.
- (2006) SAC
- Chen, L.¹ Ye, S.² Li, X.³

9
- 84944327150
- Roadrunner: Towards automatic data extraction from large web sites
- Crescenzi, V., Mecca, G. and Merialdo, P. Roadrunner: Towards automatic data extraction from large web sites. VLDB 2001.
- (2001) VLDB
- Crescenzi, V.¹ Mecca, G.² Merialdo, P.³

10
- 0342770808
- Experience with Top Gun Wingman: A proxy-based graphical web browser for the 3Com PalmPilot
- Fox, A., Goldberg, I., Gribble, S.D., Lee, D.C., Polito, A. and Brewer, E.A. Experience With Top Gun Wingman: A Proxy-Based Graphical Web Browser for the 3Com PalmPilot. Middleware 1998.
- (1998) Middleware
- Fox, A.¹ Goldberg, I.² Gribble, S.D.³ Lee, D.C.⁴ Polito, A.⁵ Brewer, E.A.⁶

11
- 0034172374
- Wrapper induction: Efficiency and expressiveness
- Kushmerick, N. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence, 118, 2000.
- (2000) Artificial Intelligence , vol.118
- Kushmerick, N.¹

12
- 0037806547
- A brief survey of web data extraction tools
- Laender, A., Ribeiro-Neto, B., Silva, A. and Teixeira, J. A Brief Survey of Web Data Extraction Tools, SIGMOD Record, Volume 31, Number 2, 2002.
- (2002) SIGMOD Record , vol.31 , Issue.2
- Laender, A.¹ Ribeiro-Neto, B.² Silva, A.³ Teixeira, J.⁴

13
- 0242456776
- Discovering informative content blocks from Web documents
- Lin, S.H. and Ho, J.M. Discovering informative content blocks from Web documents. KDD 2002.
- (2002) KDD
- Lin, S.H.¹ Ho, J.M.²

14
- 0033893885
- XWRAP: An XML-enabled wrapper construction system for Web information sources
- Liu, L., Pu, C. and Han, W. XWRAP: an XML-enabled wrapper construction system for Web information sources. ICDE 2000.
- (2000) ICDE
- Liu, L.¹ Pu, C.² Han, W.³

15
- 18744404260
- Extracting unstructured data from template generated web documents
- Ma, L., Goharian, N., Chowdhury, A. and Chung, M. Extracting unstructured data from template generated web documents. CIKM 2003.
- (2003) CIKM
- Ma, L.¹ Goharian, N.² Chowdhury, A.³ Chung, M.⁴

16
- 84938812620
- Web page cleaning with conditional random fields
- Marek, M., Pecina, P., Spousta, M. Web Page cleaning with Conditional Random Fields. WAC3 2007.
- (2007) WAC3
- Marek, M.¹ Pecina, P.² Spousta, M.³

17
- 2942661100
- Tracking and summarizing news on a daily basis with Columbia's Newsblaster
- McKeown, K.R., Barzilay, R., Evans, D., Hatzivassiloglou, V., Klavans, J.L., Nenkova, A., Sable, C., Schiffman, B. and Sigelman, S. Tracking and summarizing news on a daily basis with Columbia's Newsblaster. HLT 2002.
- (2002) HLT
- McKeown, K.R.¹ Barzilay, R.² Evans, D.³ Hatzivassiloglou, V.⁴ Klavans, J.L.⁵ Nenkova, A.⁶ Sable, C.⁷ Schiffman, B.⁸ Sigelman, S.⁹

18
- 0035587215
- Hierarchical wrapper induction for semistructured information sources
- Muslea, I., Minton, S. and Knoblock, C.A. Hierarchical Wrapper Induction for Semistructured Information Sources. Autonomous Agents and Multi-Agent Systems, 2001.
- (2001) Autonomous Agents and Multi-Agent Systems
- Muslea, I.¹ Minton, S.² Knoblock, C.A.³

19
- 84948481845
- An algorithm for suffix stripping
- Porter, M.F. An algorithm for suffix stripping. Program, vol. 14, no. 3, pp. 130-137, 1980.
- (1980) Program , vol.14 , Issue.3 , pp. 130-137
- Porter, M.F.¹

20
- 56449104958
- Learning and inference over constrained output
- Punyakanok, V., Roth, D., Yih, W. and Zimak, D. Learning and inference over constrained output. IJCAI 2005.
- (2005) IJCAI
- Punyakanok, V.¹ Roth, D.² Yih, W.³ Zimak, D.⁴

21
- 0033288922
- A linear time algorithm for finding all maximal scoring subsequences
- Ruzzo, W.L. and Tompa, M. A linear time algorithm for finding all maximal scoring subsequences. ISMB 1999.
- (1999) ISMB
- Ruzzo, W.L.¹ Tompa, M.²

22
- 84880811191
- Web page cleaning for web mining through feature weighting
- Yi, L. and Liu, B. Web Page Cleaning for Web Mining through Feature Weighting. IJCAI-03.
- IJCAI-03
- Yi, L.¹ Liu, B.²

23
- 77952370025
- Eliminating noisy information in Web pages for data mining
- Yi, L., Liu, B. and Li, X. Eliminating noisy information in Web pages for data mining. KDD 2003.
- (2003) KDD
- Yi, L.¹ Liu, B.² Li, X.³

24
- 84880475213
- Improving pseudo-relevance feedback in web information retrieval using web page segmentation
- Yu, S., Cai, D., Wen, J.R. and Ma, W.Y. "Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation", WWW2003
- WWW2003
- Yu, S.¹ Cai, D.² Wen, J.R.³ Ma, W.Y.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.