Volume 64, Issue 2, 2008, Pages 491-509

Extracting lists of data records from semi-structured web pages

Author keywords

Data extraction; Data mining web based information; Web web based information systems

Indexed keywords

COMPUTER SOFTWARE; DATA MINING; DATABASE SYSTEMS; DIGITAL STORAGE; HTML; PROBLEM SOLVING;

EID: 37349086786     PISSN: 0169023X     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.datak.2007.10.002     Document Type: Article
Times cited: 61

References (34)
  • 2
    • 38049168672
    • M. Álvarez, A. Pan, J. Raposo, F. Cacheda, F. Bellas, V. Carneiro, Crawling the content hidden behind web forms, in: Proceedings of the 2007 International Conference on Computational Science and Its Applications (ICCSA). Lecture Notes in Computer Science, vol. 4706, Part 2, Springer, Berlin/Heidelberg, 2007, pp. 322-333, ISSN: 0302-9743, ISBN-10: 3-540-74475-4, ISBN-13: 978-3-540-74475-7.
  • 3
    • 0344496637
    • A. Arasu, H. Garcia-Molina, Extracting structured data from web pages, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 2003.
  • 4
    • 37349128133
    • L. Arlotta, V. Crescenzi, G. Mecca, P. Merialdo, Automatic annotation of data extracted from large websites, in: Proceedings of the WebDB Workshop, 2003, pp. 7-12.
  • 5
    • 84944318551
    • R. Baumgartner, S. Flesca, G. Gottlob, Visual web information extraction with Lixto, in: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB), 2001.
  • 6
    • 2442546444
    • J. Caverlee, L. Liu, D. Buttler, Probe, cluster, and discover: focused extraction of QA-Pagelets from the Deep Web, in: Proceedings of the 20th International Conference ICDE, 2004, pp. 103-115.
  • 8
    • 37349064740
    • S. Chakrabarti, M. van den Berg, B. Dom, Focused crawling: a new approach to topic-specific web resource discovery, in: Proceedings of the Eighth International World Wide Web Conference, 1999.
  • 9
    • 85042021254
    • C. Chang, S. Lui, IEPAD: information extraction based on pattern discovery, in: Proceedings of 2001 International World Wide Web Conference, 2001, pp. 681-688.
  • 10
    • 37349103015
    • K. Chang, B. He, Z. Zhang, MetaQuerier over the Deep Web: Shallow Integration Across Holistic Sources, in: Proceedings of the VLDB Workshop on Information Integration on the Web (VLDB-IIWeb), 2004.
  • 11
    • 84944327150
    • V. Crescenzi, G. Mecca, P. Merialdo, ROADRUNNER: towards automatic data extraction from large web sites, in: Proceedings of the 2001 International VLDB Conference, 2001, pp. 109-118.
  • 14
    • 37349053378
    • J. Hammer, J. McHugh, H. Garcia-Molina, Semistructured data: the Tsimmis experience, in: Proceedings of the First East-European Symposium on Advances in Databases and Information Systems (ADBIS), 1997, pp. 1-8.
  • 15
    • 37349024612
    • A. Hogue, D. Karger, Thresher: automating the unwrapping of semantic content from the world wide web, in: Proceedings of the 14th International World Wide Web Conference, 2005.
  • 16
    • 0032309862
    • C.N. Hsu, M.T. Dung, Generating finite-state transducers for semi-structured data extraction from the web, Information Systems 23 (8) (1998) 521-538.
  • 17
    • 18844365362
    • V. Kovalev, S. Bhowmick, S. Madria, HW-STALKER: a machine learning-based system for transforming QURE-Pagelets to XML, Data and Knowledge Engineering Journal 54 (2) (2005) 241-276.
  • 18
    • 35248898783
    • Y.J. An, J. Geller, Y. Wu, S.A. Chun, Semantic deep web: automatic attribute extraction from the deep web data sources, in: Proceedings of the International SAC Conference, 2007, pp. 1667-1672.
  • 19
    • 0000298074
    • T. Kistler, H. Marais, WebL: A programming language for the web, in: Proceedings of the Seventh International World Wide Web Conference (WWW7), 1998, pp. 259-270.
  • 20
    • 37349067027
    • N. Kushmerick, D.S. Weld, R.B. Doorenbos, Wrapper induction for information extraction, in: Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI), 1997, pp. 729-737.
  • 22
    • 3142742483
    • K. Lerman, L. Getoor, S. Minton, C. Knoblock, Using the structure of web sites for automatic segmentation of tables, in: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, 2004, pp. 119-130.
  • 23
    • 0001116877
    • V.I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics Doklady 10 (1966) 707-710.
  • 24
    • 84937429789
    • S. Liddle, S. Yau, D. Embley, On the Automatic Extraction of Data from the Hidden Web. ER (Workshops), 2001, pp. 212-226.
  • 26
    • 37349007237
    • C. Notredame, Recent progresses in multiple sequence alignment: a survey, Technical report, Information Genetique et, 2002.
  • 27
    • 37349055686
    • A. Pan et al., Semi-automatic wrapper generation for commercial web sources, in: Proceedings of IFIP WG8.1 Conference on Engineering Information Systems in the Internet Context (EISIC), 2002.
  • 28
    • 37349003023
    • S. Raghavan, H. García-Molina, Crawling the hidden web, in: Proceedings of the 27th International Conference on Very Large Databases (VLDB), 2001.
  • 30
    • 0343725648
    • A. Sahuguet, F. Azavant, Building intelligent web applications using lightweight wrappers, Data and Knowledge Engineering Journal 36 (3) (2001) 283-316.
  • 31
    • 37349011090
    • The W3 Consortium. The Document Object Model. http://www.w3.org/DOM/.
  • 32
    • 84880476173
    • J. Wang, F. Lochovsky, Data extraction and label assignment for web databases, in: Proceedings of the 12th International World Wide Web Conference (WWW12), 2003.
  • 33
    • 33744810070
    • Y. Zhai, B. Liu, Extracting web data using instance-based learning, in: Proceedings of Web Information Systems Engineering Conference (WISE), 2005, pp. 318-331.
  • 34
    • 33750797710
    • Y. Zhai, B. Liu, Structured data extraction from the web based on partial tree alignment, IEEE Transactions on Knowledge and Data Engineering 18 (12) (2006) 1614-1628.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.