



Volume 68, Issue 1, 2009, Pages 107-125

An unsupervised method for joint information extraction and feature mining across different Web sites

Author keywords

Graphical models; Machine learning; Text mining; Web mining

Indexed keywords

GRAPHIC METHODS; INFERENCE ENGINES; INFORMATION ANALYSIS; LEARNING ALGORITHMS; LEARNING SYSTEMS; MINING; SPEECH RECOGNITION; WEBSITES; WORLD WIDE WEB

EID: 56249148476
PISSN: 0169023X
EISSN: None
Source Type: Journal
DOI: 10.1016/j.datak.2008.08.009
Document Type: Article
Times cited: 30

References (49)
  • 1
    • 12244298488
    • E. Agichtein, V. Ganti, Mining reference tables for automatic text segmentation, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2004, pp. 20-29.
  • 2
    • 84880902141
    • M. Banko, M. Cafarella, S. Soderland, M. Broadhead, O. Etzioni, Open information extraction from the web, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007, pp. 2670-2676.
  • 3
    • 56249130480
    • R. Bunescu, R. Mooney, Collective information extraction with relational Markov networks, in: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 2004, pp. 439-446.
  • 4
    • 85042021254
    • C. Chang, S.C. Lui, IEPAD: information extraction based on pattern discovery, in: Proceedings of the 10th International Conference on World Wide Web (WWW), 2001, pp. 681-688.
  • 6
    • 8644243246
    • S. Chapman, A. Dingli, F. Ciravegna, Armadillo: harvesting information for the semantic web, in: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2004, p. 598.
  • 7
    • 85011016482
    • S.-L. Chuang, K. Chang, C. Zhai, Context-aware wrapping: synchronized data extraction, in: Proceedings of the 33rd Very Large Databases Conference (VLDB), 2007, pp. 699-710.
  • 8
    • 84880859303
    • 2 an adaptive algorithm for information extraction from web-related texts, in: Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), 2001, pp. 1251-1256.
  • 9
    • 77953046656
    • W. Cohen, M. Hurst, L. Jensen, A flexible learning system for wrapping tables and lists in HTML documents, in: Proceedings of the 11th International World Wide Web Conference (WWW), 2002, pp. 232-241.
  • 10
    • 12344333240
    • V. Crescenzi, G. Mecca, Automatic information extraction from large websites, Journal of the ACM 51 (5) (2004) 731-779.
  • 11
    • 84944327150
    • V. Crescenzi, G. Mecca, P. Merialdo, ROADRUNNER: towards automatic data extraction from large web sites, in: Proceedings of the 27th Very Large Databases Conference (VLDB), 2001, pp. 109-118.
  • 13
    • 84858373635
    • A. Culotta, A. McCallum, J. Betz, Integrating probabilistic extraction models and data mining to discover relations and patterns in text, in: Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006, pp. 296-303.
  • 16
    • 14544291427
    • F. Facca, P. Lanzi, Mining interesting knowledge from weblogs: a survey, Data and Knowledge Engineering 53 (3) (2005) 225-241.
  • 18
    • 56249115624
    • D. Freitag, A. McCallum, Information extraction with HMM structures learned by stochastic optimization, in: Proceedings of the 17th National Conference on Artificial Intelligence (AAAI), 2000, pp. 584-589.
  • 19
    • 32344439113
    • R. Ghani, Price prediction and insurance for online auctions, in: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2005, pp. 411-418.
  • 20
    • 56249108129
    • R. Ghani, H. Simmons, Predicting the end-price of online auctions, in: Proceedings of the International Workshop on Data Mining and Adaptive Modeling Methods for Economics and Management, 2004.
  • 21
    • 84859881704
    • T. Grenager, D. Klein, C. Manning, Unsupervised learning of field segmentation models for information extraction, in: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 2005, pp. 371-378.
  • 22
    • 33748195920
    • Y.-L. Hedley, M. Younas, A. James, M. Sanderson, Sampling, information extraction and summarisation of hidden web databases, Data and Knowledge Engineering 59 (2) (2006) 213-230.
  • 23
    • 12244305149
    • M. Hu, B. Liu, Mining and summarizing customer reviews, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2004, pp. 168-177.
  • 25
    • 33747058044
    • R. Kosala, H. Blockeel, M. Bruynooghe, J. Van den Bussche, Information extraction from structured documents using k-testable tree automaton inference, Data and Knowledge Engineering 58 (2) (2006) 129-158.
  • 27
    • 56249093535
    • N. Kushmerick, B. Grace, The wrapper induction environment, in: Proceedings of the Workshop on Software Tools for Developing Agents (AAAI), 1998, pp. 131-132.
  • 28
    • 23144437876
    • N. Kushmerick, B. Thomas, Adaptive information extraction: core technologies for information agents, in: Intelligent Information Agents R&D in Europe: An AgentLink Perspective, 2002, pp. 79-103.
  • 29
    • 56249087289
    • J. Lafferty, A. McCallum, F. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in: Proceedings of the 18th International Conference on Machine Learning (ICML), 2001, pp. 282-289.
  • 30
    • 77952333945
    • B. Liu, R. Grossman, Y. Zhai, Mining data records in web pages, in: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2003, pp. 601-606.
  • 31
    • 56249088348
    • B. Liu, M. Hu, J. Cheng, Opinion observer: analyzing and comparing opinions on the web, in: Proceedings of the 14th International World Wide Web Conference (WWW), 2005, pp. 342-351.
  • 32
    • 56249136619
    • A. McCallum, D. Jensen, A note on the unification of information extraction and data mining using conditional-probability, relational models, in: Proceedings of the IJCAI Workshop on Learning Statistical Models from Relational Data, 2003.
  • 33
    • 56249089022
    • A. McCallum, B. Wellner, Toward conditional models of identity uncertainty with application to proper noun coreference, in: Proceedings of the IJCAI Workshop on Information Integration on the Web, 2003.
  • 34
    • 0242540451
    • S. Morinaga, K. Yamanishi, K. Tateishi, T. Fukushima, Mining product reputation on the Web, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2002, pp. 341-349.
  • 35
    • 56249092847
    • K. Murphy, Y. Weiss, M. Jordan, Loopy belief propagation for approximate inference: an empirical study, in: Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI), 1999, pp. 467-475.
  • 37
    • 80053270803
    • A. Popescu, O. Etzioni, Extracting product features and opinions from reviews, in: Proceedings of the Human Language Technology Conference on Empirical Methods in Natural Language Processing, 2005, pp. 339-346.
  • 38
    • 84880915291
    • K. Probst, R. Ghani, M. Krema, A. Fano, Y. Liu, Semi-supervised learning of attribute-value pairs from product descriptions, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007, pp. 2838-2843.
  • 39
    • 56249091145
    • F. Sha, F. Pereira, Shallow parsing with conditional random fields, in: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2003, pp. 213-220.
  • 40
    • 27544470262
    • G. Sigletos, G. Paliouras, C. Spyropoulos, Combining information extraction systems using voting and stacked generalization, Journal of Machine Learning Research 6 (2005) 1751-1782.
  • 41
    • 84939181118
    • S. Tatikonda, Convergence of the sum-product algorithm, in: Proceedings of the 2003 IEEE Information Theory Workshop, 2003, pp. 222-225.
  • 42
    • 84885677547
    • P. Viola, M. Narasimhan, Learning to extract information from semi-structured text using a discriminative context free grammar, in: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2005, pp. 330-337.
  • 43
    • 56249146748
    • B. Wellner, A. McCallum, F. Peng, M. Hay, An integrated, conditional model of information extraction and coreference with application to citation matching, in: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI), 2004, pp. 593-601.
  • 44
    • 19544378318
    • T.L. Wong, W. Lam, A probabilistic approach for adapting information extraction wrappers and discovering new attributes, in: Proceedings of the 2004 IEEE International Conference on Data Mining (ICDM), 2004, pp. 257-264.
  • 45
    • 2942587187
    • T.L. Wong, W. Lam, Text mining from site invariant and dependent features for information extraction knowledge adaptation, in: Proceedings of the 2004 SIAM International Conference on Data Mining (SDM), 2004, pp. 45-56.
  • 46
    • 33745441264
    • T.L. Wong, W. Lam, Hot item mining and summarization from multiple auction web sites, in: Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM), 2005, pp. 797-800.
  • 47
    • 33745454259
    • T.L. Wong, W. Lam, S.K. Chan, Collaborative information extraction and mining from multiple web documents, in: Proceedings of the 2006 SIAM International Conference on Data Mining (SDM), 2006, pp. 440-450.
  • 48
    • 33745773725
    • T.L. Wong, W. Lam, S.K. Chan, Extracting and summarizing hot item features across different auction web sites, in: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2006, pp. 334-345.
  • 49
    • 56249097304
    • World Wide Web Consortium (W3C), Semantic web, 2001.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.