Data & Knowledge Engineering, Volume 58, Issue 2, 2006, Pages 129-158

Information extraction from structured documents using k-testable tree automaton inference

Author keywords

Information extraction; Machine learning; Tree automata; Wrapper induction

Indexed keywords

AUTOMATA THEORY; DATA STRUCTURES; HTML; LEARNING SYSTEMS; TREES (MATHEMATICS); XML

EID: 33747058044
PISSN: 0169-023X
EISSN: None
Source Type: Journal
DOI: 10.1016/j.datak.2005.05.002
Document Type: Article
Times cited: 27

References (57)
  • 1
    • D. Angluin, Queries and concept learning, Machine Learning 2 (4) (1988) 319-342.
  • 2
    • D. Angluin, C.H. Smith, Inductive inference: Theory and methods, ACM Computing Surveys 15 (3) (1983) 237-269.
  • 4
    • R. Baumgartner, S. Flesca, G. Gottlob, Visual web information extraction with Lixto, in: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), 2001, pp. 119-128.
  • 5
    • F. Bry, S. Schaffert, Towards a declarative query and transformation language for XML and semistructured data: Simulation unification, in: Proceedings of the international conference on logic programming, 2002.
  • 6
    • C. Cardie, Empirical methods in information extraction, AI Magazine 18 (4) (1997) 65-79.
  • 7
    • R.C. Carrasco, J. Oncina, J. Calera-Rubio, Stochastic inference of regular tree languages, in: Proceedings of the 3rd International Colloquium on Grammatical Inference, Lecture Notes in Artificial Intelligence, vol. 1433, 1998, pp. 187-198.
  • 8
    • C.-H. Chang, S.-C. Lui, IEPAD: Information extraction based on pattern discovery, in: Proceedings of the Tenth International Conference on World Wide Web, 2001, pp. 681-688.
  • 9
    • S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, J. Widom, The TSIMMIS project: Integration of heterogeneous information sources, in: Proceedings of the 10th Meeting of the Information Processing Society of Japan, 1994, pp. 7-18.
  • 10
    • B. Chidlovskii, J. Ragetli, M. de Rijke, Wrapper generation via grammar induction, in: 11th European Conference on Machine Learning, ECML'00, 2000, pp. 96-108.
  • 11
    • W. Cohen, WHIRL: A word-based information representation language, Artificial Intelligence 118 (2000) 163-196.
  • 12
    • W. Cohen, M. Hurst, L.S. Jensen, A flexible learning system for wrapping tables and lists in HTML documents, in: The Eleventh International World Wide Web Conference (WWW2002), 2002.
  • 13
    • W.W. Cohen, Recognizing structure in web pages using similarity queries, in: Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, 1999, pp. 59-66.
  • 14
    • H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison, M. Tommasi, Tree Automata Techniques and Applications, Available from: , 1999.
  • 16
    • D. Freitag, Using grammatical inference to improve precision in information extraction, in: ICML-97 Workshop on Automata Induction, Grammatical Inference, and Language Acquisition, 1997.
  • 17
    • D. Freitag, Information extraction from HTML: Application of a general learning approach, in: Proceedings of the Fifteenth Conference on Artificial Intelligence (AAAI-98), 1998, pp. 517-523.
  • 18
    • D. Freitag, Machine learning for information extraction in informal domains, Machine Learning 39 (2-3) (2000) 169-202.
  • 20
    • D. Freitag, A. McCallum, Information extraction with HMMs and shrinkage, in: AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
  • 21
    • P. Garcia, Learning k-testable tree sets from positive data, Technical Report DSIC-ii-1993-46, DSIC, Universidad Politécnica de Valencia, 1993.
  • 22
    • E.M. Gold, Language identification in the limit, Information and Control 10 (5) (1967) 447-474.
  • 23
    • G. Gottlob, K. Koch, Monadic datalog over trees and the expressive power of languages for web information extraction, in: 21st ACM Symposium on Principles of Database Systems, 2002, pp. 17-28.
  • 24
    • J. Hammer, H. Garcia-Molina, J. Cho, A. Crespo, R. Aranha, Extracting semistructured information from the Web, in: Proceedings of the Workshop on Management of Semistructured Data, 1997, pp. 18-25.
  • 25
    • A. Hemnani, S. Bressan, Information extraction-tree alignment approach to pattern discovery in web documents, in: Database and Expert Systems Applications, 13th International Conference, DEXA 2002, 2002, pp. 789-798.
  • 26
    • C.-N. Hsu, C.-C. Chang, Finite-state transducers for semi-structured text mining, in: Proceedings of IJCAI-99 Workshop on Text Mining: Foundations, Techniques and Applications, 1999.
  • 27
    • C.-N. Hsu, M.-T. Dung, Generating finite-state transducers for semi-structured data extraction from the web, Information Systems 23 (8) (1998) 521-538.
  • 28
    • G. Huck, P. Fankhauser, K. Aberer, E.J. Neuhold, Jedi: Extracting and synthesizing information from the web, in: Conference on Cooperative Information Systems, 1998, pp. 32-43.
  • 30
    • R. Kosala, M. Bruynooghe, H. Blockeel, J. Van den Bussche, Information extraction by means of a generalized k-testable tree automata inference algorithm, in: Proceedings of the Fourth International Conference on Information Integration and Web-based Applications and Services (IIWAS), 2002, pp. 105-109.
  • 31
    • R. Kosala, J. Van den Bussche, M. Bruynooghe, H. Blockeel, Information extraction in structured documents using tree automata induction, in: Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 2002, pp. 299-310.
  • 32
    • N. Kushmerick, Wrapper induction for information extraction, Ph.D. thesis, University of Washington, 1997.
  • 33
    • N. Kushmerick, Wrapper induction: Efficiency and expressiveness, Artificial Intelligence 118 (2000) 15-68.
  • 34
    • N. Kushmerick, D. Weld, R. Doorenbos, Wrapper induction for information extraction, in: Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-97, 1997, pp. 729-737.
  • 38
    • K. Murphy, Learning finite automata, Technical Report 96-04-017, Santa Fe Institute, 1996.
  • 39
    • I. Muslea, Extraction patterns for information extraction tasks: A survey, in: AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
  • 40
    • I. Muslea, S. Minton, C. Knoblock, A hierarchical approach to wrapper induction, in: Proceedings of the 3rd International Conference on Autonomous Agents, 1999.
  • 43
    • C. Pair, A. Quere, Définition et étude des bilangages réguliers, Information and Control 13 (6) (1968) 565-593.
  • 45
    • J.R. Quinlan, Learning logical definitions from relations, Machine Learning 5 (1990) 239-266.
  • 46
    • J. Rico-Juan, J. Calera-Rubio, R. Carrasco, Probabilistic k-testable tree languages, in: A. Oliveira (Ed.), Proceedings of the 5th International Colloquium, ICGI 2000, Lisbon, Portugal, Lecture Notes in Computer Science, vol. 1891, Springer, 2000, pp. 221-228.
  • 47
    • A. Sahuguet, F. Azavant, Looking at the web through XML glasses, in: Proceedings of the Fourth IFCIS International Conference on Cooperative Information Systems, 1999, pp. 148-159.
  • 48
    • Y. Sakakibara, Efficient learning of context-free grammars from positive structural examples, Information and Computation 97 (1) (1992) 23-60.
  • 49
    • Y. Sakakibara, Recent advances of grammatical inference, Theoretical Computer Science 185 (1) (1997) 15-45.
  • 50
    • H. Sakamoto, H. Arimura, S. Arikawa, Knowledge discovery from semistructured texts, in: S. Arikawa, A. Shinohara (Eds.), Progress in Discovery Science: Final Report of the Japanese Discovery Science Project, Lecture Notes in Artificial Intelligence, vol. 2281, Springer, 2002, pp. 586-599.
  • 51
    • S. Soderland, Learning information extraction rules for semi-structured and free text, Machine Learning 34 (1-3) (1999) 233-272.
  • 52
    • M. Takahashi, Generalizations of regular sets and their application to a study of context-free languages, Information and Control 27 (1975) 1-36.
  • 53
    • L. Valiant, A theory of the learnable, Communications of the ACM 27 (11) (1984) 1134-1142.
  • 56
    • XML. Extensible markup language (XML) 1.0, second ed., W3C Recommendation 6 October 2000, Available from: , 2000.
  • 57
    • XQL. XQuery 1.0: An XML query language. W3C Working Draft 16 August 2002, Available from: .


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.