



Volume 69, Issue 2, 2010, Pages 169-196

Information extraction for search engines using fast heuristic techniques

Author keywords

Automatic wrapper; Information extraction; Search engine

Indexed keywords

AUTOMATIC WRAPPER; DATA ALIGNMENTS; DATA EXTRACTION; DATA ITEMS; DATA MERGING; DATA RECORDS; DATA REGIONS; FILTERING RULES; FREQUENCY MEASURES; HEURISTIC TECHNIQUES; INFORMATION EXTRACTION; META SEARCH ENGINES; NOVEL TECHNIQUES; NUMBER AND SIZE; PARTITIONING METHODS; STATE OF THE ART; TREE MATCHING ALGORITHM; VISUAL CUES; WEB PAGE;

EID: 72649103406     PISSN: 0169023X     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.datak.2009.10.002     Document Type: Article
Times cited: 42

References (58)
  • 4
    • EID: 0023586274
    • The longest common subsequence problem revisited
    • Apostolico A., and Guerra C. The longest common subsequence problem revisited. Algorithmica 2 (1987)
    • (1987) Algorithmica, vol. 2
    • Apostolico, A.; Guerra, C.
  • 5
    • EID: 0343725648
    • Building intelligent web applications using lightweight wrappers
    • Sahuguet A., and Azavant F. Building intelligent web applications using lightweight wrappers. Data Knowl. Eng. 36 (2001) 283-316
    • (2001) Data Knowl. Eng., vol. 36, pp. 283-316
    • Sahuguet, A.; Azavant, F.
  • 8
    • EID: 33744786780
    • NET - a system for extracting web data from flat and nested data records
    • WISE
    • Bing Liu, Yanhong Zhai, NET - a system for extracting web data from flat and nested data records, in: Web Information Systems Engineering - WISE 2005, 2005, pp. 487-495.
    • (2005) Web Information Systems Engineering, pp. 487-495
    • Liu, B.; Zhai, Y.
  • 9
    • EID: 0032092761
    • NoDoSE - a tool for semi-automatically extracting structured and semistructured data from text documents
    • ACM, Washington, United States
    • Brad Adelberg, NoDoSE - a tool for semi-automatically extracting structured and semistructured data from text documents, in: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, Washington, United States, ACM, 1998.
    • (1998) Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle
    • Adelberg, B.
  • 12
    • EID: 10944234975
    • Olera: semisupervised Web-data extraction with visual support
    • Chang C.-H., and Kuo S.-C. Olera: semisupervised Web-data extraction with visual support. IEEE Intell. Syst. 19 (2004) 56-64
    • (2004) IEEE Intell. Syst., vol. 19, pp. 56-64
    • Chang, C.-H.; Kuo, S.-C.
  • 13
    • EID: 0032309862
    • Generating finite-state transducers for semi-structured data extraction from the Web
    • Hsu C.-N., and Dung M.-T. Generating finite-state transducers for semi-structured data extraction from the Web. Inf. Syst. 23 (1998) 521-538
    • (1998) Inf. Syst., vol. 23, pp. 521-538
    • Hsu, C.-N.; Dung, M.-T.
  • 14
    • EID: 67349276460
    • Automatic hidden-web table interpretation, conceptualization, and semantic annotation
    • Tao C., and Embley D.W. Automatic hidden-web table interpretation, conceptualization, and semantic annotation. Data Knowl. Eng. 68 (2009) 683-703
    • (2009) Data Knowl. Eng., vol. 68, pp. 683-703
    • Tao, C.; Embley, D.W.
  • 23
    • EID: 0345566149
    • A guided tour to approximate string matching
    • Navarro G. A guided tour to approximate string matching. ACM Comput. Surv. 33 (2001) 31-88
    • (2001) ACM Comput. Surv., vol. 33, pp. 31-88
    • Navarro, G.
  • 28
    • EID: 0742268832
    • Mining web informative structures and contents based on entropy analysis
    • Kao H.-Y., Lin S.-H., Ho J.-M., and Chen M.-S. Mining web informative structures and contents based on entropy analysis. IEEE T. Knowl. Data Eng. 16 (2004) 41-55
    • (2004) IEEE T. Knowl. Data Eng., vol. 16, pp. 41-55
    • Kao, H.-Y.; Lin, S.-H.; Ho, J.-M.; Chen, M.-S.
  • 35
    • EID: 0018491659
    • The tree-to-tree correction problem
    • Tai K.-C. The tree-to-tree correction problem. J. ACM 26 (1979) 422-433
    • (1979) J. ACM, vol. 26, pp. 422-433
    • Tai, K.-C.
  • 37
    • EID: 47949107901
    • Longzhuang Li, Yonghuai Liu, Abel Obregon, Matt Weatherston, Visual segmentation-based data record extraction from web documents, in: IEEE International Conference on Information Reuse and Integration, IRI 2007, 2007, pp. 502-507.
  • 38
    • EID: 37349086786
    • Extracting lists of data records from semi-structured web pages
    • Álvarez M., Pan A., Raposo J., Bellas F., and Cacheda F. Extracting lists of data records from semi-structured web pages. Data Knowl. Eng. 64 (2008) 491-509
    • (2008) Data Knowl. Eng., vol. 64, pp. 491-509
    • Álvarez, M.; Pan, A.; Raposo, J.; Bellas, F.; Cacheda, F.
  • 40
    • EID: 34250196851
    • Integration of association rules and ontologies for semantic query expansion
    • Song M., Song I.-Y., Hu X., and Allen R.B. Integration of association rules and ontologies for semantic query expansion. Data Knowl. Eng. 63 (2007) 63-75
    • (2007) Data Knowl. Eng., vol. 63, pp. 63-75
    • Song, M.; Song, I.-Y.; Hu, X.; Allen, R.B.
  • 42
    • EID: 84976669911
    • Algorithms for string searching
    • Baeza-Yates R.A. Algorithms for string searching. SIGIR Forum 23 (1989) 34-58
    • (1989) SIGIR Forum, vol. 23, pp. 34-58
    • Baeza-Yates, R.A.
  • 43
    • EID: 0014757386
    • A general method applicable to the search for similarities in the amino acid sequences of two proteins
    • Needleman S.B., and Wünsch C.D. A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. (1970)
    • (1970) J. Mol. Biol.
    • Needleman, S.B.; Wünsch, C.D.
  • 45
    • EID: 0032307936
    • Grammars have exceptions
    • Crescenzi V., and Mecca G. Grammars have exceptions. Inf. Syst. 23 (1998) 539-565
    • (1998) Inf. Syst., vol. 23, pp. 539-565
    • Crescenzi, V.; Mecca, G.
  • 47
    • EID: 0001116877
    • Binary codes capable of correcting deletions, insertions, and reversals
    • Levenshtein V.I. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10 (1966) 707
    • (1966) Soviet Physics Doklady, vol. 10, pp. 707
    • Levenshtein, V.I.
  • 49
    • EID: 72649105417
    • Wei Liu, Xiaofeng Meng, Weiyi Meng, ViDE: a vision-based approach for deep web data extraction, IEEE T. Knowl. Data Eng. (2009).
  • 51
    • EID: 0026185673
    • Identifying syntactic differences between two programs
    • Yang W. Identifying syntactic differences between two programs. Softw. Pract. Exper. 21 (1991) 739-755
    • (1991) Softw. Pract. Exper., vol. 21, pp. 739-755
    • Yang, W.
  • 53
    • EID: 33750797710
    • Structured data extraction from the web based on partial tree alignment
    • Zhai Y., and Liu B. Structured data extraction from the web based on partial tree alignment. IEEE T. Knowl. Data Eng. 18 (2006) 1614-1628
    • (2006) IEEE T. Knowl. Data Eng., vol. 18, pp. 1614-1628
    • Zhai, Y.; Liu, B.
  • 54
    • EID: 34247869740
    • Extracting web data using instance-based learning
    • Zhai Y., and Liu B. Extracting web data using instance-based learning. World Wide Web 10 (2007) 113-132
    • (2007) World Wide Web, vol. 10, pp. 113-132
    • Zhai, Y.; Liu, B.
  • 56
    • EID: 72649098560
    • Yuan Kui Shen, Automatic record extraction for the World Wide Web, Master of Science thesis in Computer Science and Engineering, Department of Electrical Engineering and Computer Science, MIT, 2006.
  • 58
    • EID: 72649102832
    • http://en.wikipedia.org/wiki/Web_template.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.