SCOPUS Information Retrieval Platform

Volume , Issue , 2013, Pages 2267-2272

URL tree: Efficient unsupervised content extraction from streams of web documents

Author keywords

Boilerplate removal; Content extraction; Stream data; Unsupervised learning; Web content

Indexed keywords

CONTENT EXTRACTION; EVALUATION RESULTS; HTML DOCUMENTS; OPEN SOURCES; STREAM DATA; STREAM-BASED; WEB CONTENT; WEB DOCUMENT;

ALGORITHMS; FORESTRY; HTML; KNOWLEDGE MANAGEMENT; TREES (MATHEMATICS); UNSUPERVISED LEARNING; WEBSITES;

INFORMATION RETRIEVAL SYSTEMS;

ALGORITHMS; FORESTRY; INFORMATION RETRIEVAL; INFORMATION SYSTEMS; MATHEMATICS; TREES;

EID: 84889610613 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2505515.2505654 Document Type: Conference Paper

Times cited: 11

References (8)

1
- 77953052174
- Template detection via data mining and its applications
- Z. Bar-Yossef and S. Rajagopalan. Template detection via data mining and its applications. In Proc. of the Int. Conf. on World Wide Web, pages 580-591, 2002.
- (2002) Proc. of the Int. Conf. on World Wide Web, pp. 580-591
- Bar-Yossef, Z.¹ Rajagopalan, S.²

2
- 33751046629
- Template detection for large scale search engines
- L. Chen, S. Ye, and X. Li. Template detection for large scale search engines. In Proc. of the ACM Symposium on Applied Computing, pages 1094-1098, 2006.
- (2006) Proc. of the ACM Symposium on Applied Computing, pp. 1094-1098
- Chen, L.¹ Ye, S.² Li, X.³

3
- 84869471345
- A lightweight and efficient tool for cleaning web pages
- S. Evert. A lightweight and efficient tool for cleaning web pages. In Proc. of the Int. Conf. on Language Resources and Evaluation (LREC), 2008.
- (2008) Proc. of the Int. Conf. on Language Resources and Evaluation (LREC)
- Evert, S.¹

4
- 77950904942
- Boilerplate detection using shallow text features
- C. Kohlschütter, P. Fankhauser, and W. Nejdl. Boilerplate detection using shallow text features. In Proc. of the ACM Int. Conf. on Web Search and Data Mining, pages 441-450, 2010.
- (2010) Proc. of the ACM Int. Conf. on Web Search and Data Mining, pp. 441-450
- Kohlschütter, C.¹ Fankhauser, P.² Nejdl, W.³

5
- 84889593694
- Evaluating Web Content Extraction Algorithms
- T. Kovacic. Evaluating Web Content Extraction Algorithms. Bachelor's Thesis, Faculty of Computer and Information Science, University of Ljubljana, Slovenia, 2012.
- (2012) Bachelor's Thesis, Faculty of Computer and Information Science, University of Ljubljana, Slovenia
- Kovacic, T.¹

6
- 0242456776
- Discovering informative content blocks from web documents
- S.-H. Lin and J.-M. Ho. Discovering informative content blocks from web documents. In Proc. of the ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 588-593, 2002.
- (2002) Proc. of the ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 588-593
- Lin, S.-H.¹ Ho, J.-M.²

7
- 84889592306
- Removing Boilerplate and Duplicate Content from Web Corpora
- J. Pomikalek. Removing Boilerplate and Duplicate Content from Web Corpora. PhD thesis, Faculty of Informatics, Masaryk University, Brno, Czech Republic, 2011.
- (2011) PhD thesis, Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Pomikalek, J.¹

8
- 34547631600
- A fast and robust method for web page template detection and removal
- K. Vieira, A. S. da Silva, N. Pinto, E. S. de Moura, J. M. B. Cavalcanti, and J. Freire. A fast and robust method for web page template detection and removal. In Proc. of the ACM Int. Conf. on Information and Knowledge Management, pages 258-267, 2006.
- (2006) Proc. of the ACM Int. Conf. on Information and Knowledge Management, pp. 258-267
- Vieira, K.¹ Da Silva, A.S.² Pinto, N.³ De Moura, E.S.⁴ Cavalcanti, J.M.B.⁵ Freire, J.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.