Volume 35, Issue 1, 2003, Pages 129-147

Automatic information extraction from semi-structured Web pages by pattern discovery

Author keywords

Information extraction; Multiple string alignment; PAT trees; Semi-structured data; Wrapper generation

Indexed keywords

DATA STRUCTURES; INFORMATION RETRIEVAL; QUERY LANGUAGES; TREES (MATHEMATICS)

EID: 0037375290     PISSN: 01679236     EISSN: None     Source Type: Journal    
DOI: 10.1016/S0167-9236(02)00100-8     Document Type: Article
Times cited: 90

References (30)
  • 3
    • 0003945547
    • July
    • BrightPlanet.com LLC. The deep web: surfacing hidden value, http://www.completeplanet.com/tutorials/deepweb/index.asp July, 2000.
    • (2000) The deep web: Surfacing hidden value
  • 4
    • 2342568689
    • IEPAD: Information extraction based on pattern discovery
    • Proceedings of the 10th International Conference on World Wide Web, Hong-Kong, Springer
    • Chang C.-H., Lui S.-C. IEPAD: information extraction based on pattern discovery. Proceedings of the 10th International Conference on World Wide Web, Hong-Kong. Lecture Notes in Artificial Intelligence. vol. 2336:2001;223-231 Springer.
    • (2001) Lecture Notes in Artificial Intelligence , vol.2336 , pp. 223-231
    • Chang, C.-H.1    Lui, S.-C.2
  • 8
    • 23044520054
    • Automatic wrapper generation for web search engines
    • Proceedings of the 1st International Conference on Web-Age Information Management (WAIM'2000), Shanghai, China
    • Chidlovskii B., Ragetli J., Rijke M. Automatic wrapper generation for web search engines. Proceedings of the 1st International Conference on Web-Age Information Management (WAIM'2000), LNCS Series, Shanghai, China. 2000.
    • (2000) LNCS Series
    • Chidlovskii, B.1    Ragetli, J.2    Rijke, M.3
  • 14
    • 84947290129
    • In search of the lost schema
    • Database Theory-ICDT '99, 7th International Conference, Proceedings, C. Beeri, & P. Buneman. Jerusalem, Israel: Springer
    • Grumbach S., Mecca G. In search of the lost schema. Beeri C., Buneman P. Database Theory-ICDT '99, 7th International Conference, Proceedings. Lecture Notes in Computer Science. vol. 1540:1999;314-331 Springer, Jerusalem, Israel.
    • (1999) Lecture Notes in Computer Science , vol.1540 , pp. 314-331
    • Grumbach, S.1    Mecca, G.2
  • 16
    • 0032309862
    • Generating finite-state transducers for semi-structured data extraction from the web
    • Hsu C.-N., Dung M.-T. Generating finite-state transducers for semi-structured data extraction from the web. Information Systems. 23(8):1998;521-538.
    • (1998) Information Systems , vol.23 , Issue.8 , pp. 521-538
    • Hsu, C.-N.1    Dung, M.-T.2
  • 20
    • 0033066718
    • REPuter: Fast computation of maximal repeats in complete genomes
    • Kurtz S., Schleiermacher C. REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics. 15(5):1999;426-427.
    • (1999) Bioinformatics , vol.15 , Issue.5 , pp. 426-427
    • Kurtz, S.1    Schleiermacher, C.2
  • 21
    • 0002516969
    • Gleaning the web
    • March/April
    • Kushmerick N. Gleaning the web. IEEE Intelligent Systems. 14(2):March/April 1999;20-22.
    • (1999) IEEE Intelligent Systems , vol.14 , Issue.2 , pp. 20-22
    • Kushmerick, N.1
  • 23
    • 38149018071
    • PATRICIA - Practical algorithm to retrieve information coded in alphanumeric
    • Jan
    • Morrison D.R. PATRICIA - practical algorithm to retrieve information coded in alphanumeric. Journal of ACM. 15(4):Jan 1968;514-534.
    • (1968) Journal of ACM , vol.15 , Issue.4 , pp. 514-534
    • Morrison, D.R.1
  • 26
    • 84859139671
    • Multi-engine search and comparison using the metacrawler
    • Boston, USA
    • Selberg E., Etzioni O. Multi-Engine Search and Comparison Using the MetaCrawler. Proc. of the Fourth Intl. WWW Conference, Boston, USA. 1995. http://www.w3.org/Conferences/wwww4/.
    • (1995) Proc. of the Fourth Intl. WWW Conference
    • Selberg, E.1    Etzioni, O.2
  • 27
    • 0032624184
    • Learning information extraction rules for semi-structured and free text
    • Soderland S. Learning information extraction rules for semi-structured and free text. Machine Learning. 34(1-3):1996;233-272.
    • (1996) Machine Learning , vol.34 , Issue.1-3 , pp. 233-272
    • Soderland, S.1
  • 28
    • 0004135793
    • Extensible markup language (XML)
    • The World-Wide Web Consortium (W3C), Extensible markup language (XML), http://www.w3.org/XML/, 1997.
    • (1997) The World-Wide Web Consortium (W3C)
  • 29
    • 0012213309
    • Searching the deep web-directed Query engine applications at the department of energy
    • Jan
    • Warnick W.L., Lederman A., Scott R.L. et al. Searching the deep web-directed Query engine applications at the department of energy. DLib Magazine. 7(1):Jan 2001.
    • (2001) DLib Magazine , vol.7 , Issue.1
    • Warnick, W.L.1    Lederman, A.2    Scott, R.L.3
  • 30
    • 0012213310
    • Web Design Group, Wilbur-HTML 3.2, http://www.htmlhelp.com/reference/wilbur/, 1997.
    • (1997) Wilbur-HTML 3.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.