Volume 15, Issue 4, 2008, Pages 297-311

Detecting data records in semi-structured web sites based on text token clustering

Author keywords

Automatic data detection; HTML tags; Semi structured web sites; Text token clustering; Web information extraction

Indexed keywords

CLUSTER ANALYSIS; CLUSTERING ALGORITHMS; HTML; INFORMATION RETRIEVAL; WEBSITES;

EID: 50249132381     PISSN: 10692509     EISSN: None     Source Type: Journal    
DOI: 10.3233/ica-2008-15403     Document Type: Conference Paper
Times cited: 13

References (31)
  • 1
    • 37349086786
    • Extracting lists of data records from semi-structured web pages
    • Manuel Álvarez, Alberto Pan, Juan Raposo, Fernando Bellas and Fidel Cacheda, Extracting lists of data records from semi-structured web pages, Data Knowl Eng 64(2) (2008), 491-509.
    • (2008) Data Knowl Eng , vol.64 , Issue.2 , pp. 491-509
    • Álvarez, M.1    Pan, A.2    Raposo, J.3    Bellas, F.4    Cacheda, F.5
  • 2
    • 33746744370
    • The Lixto project: Exploring new frontiers of web data extraction
    • Springer-Verlag Berlin Heidelberg
    • Julien Carme, Michal Ceresna, Oliver Frölich, Georg Gottlob, Tamir Hassan, Marcus Herzog, Wolfgang Holzinger and Bernhard Krüpl, The Lixto project: Exploring new frontiers of web data extraction, in: BNCOD 2006, Springer-Verlag Berlin Heidelberg, 2006, pp. 1-15.
    • (2006) BNCOD 2006 , pp. 1-15
    • Carme, J.1    Ceresna, M.2    Frölich, O.3    Gottlob, G.4    Hassan, T.5    Herzog, M.6    Holzinger, W.7    Krüpl, B.8
  • 3
    • 2442546444
    • Probe, cluster, and discover: Focused extraction of QA-pagelets from the deep web
    • James Caverlee, Ling Liu and David Buttler, Probe, cluster, and discover: Focused extraction of QA-pagelets from the deep web, in: ICDE, 2004, pp. 103-115.
    • (2004) ICDE , pp. 103-115
    • Caverlee, J.1    Liu, L.2    Buttler, D.3
  • 5
    • 0002607026
    • Bayesian classification (AutoClass): Theory and results
    • American Association for Artificial Intelligence, USA
    • Peter Cheeseman and John Stutz, Bayesian classification (AutoClass): Theory and results, in: Advances in Knowledge Discovery and Data Mining, American Association for Artificial Intelligence, USA, 1996, pp. 153-180.
    • (1996) Advances in Knowledge Discovery and Data Mining , pp. 153-180
    • Cheeseman, P.1    Stutz, J.2
  • 7
    • 4644340823
    • Automatic web news extraction using tree edit distance
    • Davi de Castro Reis, Paulo Braz Golgher, Altigran Soares da Silva and Alberto H.F. Laender, Automatic web news extraction using tree edit distance, in: WWW, 2004, pp. 502-511.
  • 9
    • 38149070921
    • Automatic data record detection in web pages
    • Xiaoying Gao, Le Phong Bao Vuong and Mengjie Zhang, Automatic data record detection in web pages, in: KSEM, 2007, pp. 349-361.
    • (2007) KSEM , pp. 349-361
    • Gao, X.1    Bao Vuong, L.2    Zhang, M.3
  • 11
    • 50249174308
    • Overview of autofeed: An unsupervised learning system for generating webfeeds
    • Bora Gazen and Steven Minton, Overview of autofeed: An unsupervised learning system for generating webfeeds, in: AAAI, 2006.
    • (2006) AAAI
    • Gazen, B.1    Minton, S.2
  • 15
    • 0037806547
    • A brief survey of web data extraction tools
    • Alberto H.F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva and Juliana S. Teixeira, A brief survey of web data extraction tools, SIGMOD Record 31(2) (2002).
  • 19
    • 50249091388
    • Repository of online information sources used in information extraction tasks
    • Ion Muslea, Repository of online information sources used in information extraction tasks. www.isi.edu/info-agents/rise/repository.html, 2005.
    • Ion Muslea
  • 21
    • 0032684968
    • A hierarchical approach to wrapper induction
    • ACM Press, Seattle, WA, USA
    • Ion Muslea, Steve Minton and Craig Knoblock, A hierarchical approach to wrapper induction, in: Oren Etzioni, Jörg P. Müller and Jeffrey M. Bradshaw, eds, Proceedings of the Third International Conference on Autonomous Agents (Agents'99), Seattle, WA, USA, 1999. ACM Press, pp. 190-197.
    • (1999) Proceedings of the Third International Conference on Autonomous Agents (Agents'99) , pp. 190-197
    • Muslea, I.1    Minton, S.2    Knoblock, C.3
  • 23
    • 30344457312
    • Stavies: A system for information extraction from unknown web data sources through automatic web wrapper generation using clustering techniques
    • Nikolaos Papadakis, Dimitrios Skoutas, Konstantinos Raftopoulos and Theodora A. Varvarigou, Stavies: A system for information extraction from unknown web data sources through automatic web wrapper generation using clustering techniques, IEEE Trans Knowl Data Eng 17(12) (2005), 1638-1652.
    • (2005) IEEE Trans Knowl Data Eng , vol.17 , Issue.12 , pp. 1638-1652
    • Papadakis, N.1    Skoutas, D.2    Raftopoulos, K.3    Varvarigou, T.A.4
  • 25
    • 0016572913
    • A vector space model for automatic indexing
    • G. Salton, A. Wong and C.S. Yang, A vector space model for automatic indexing, Commun ACM 18(11) (1975), 613-620.
    • (1975) Commun ACM , vol.18 , Issue.11 , pp. 613-620
    • Salton, G.1    Wong, A.2    Yang, C.S.3
  • 26
    • 0019887799
    • Identification of common molecular subsequences
    • T.F. Smith and M.S. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology 147 (1981), 195-197.
    • (1981) Journal of Molecular Biology , vol.147 , pp. 195-197
    • Smith, T.F.1    Waterman, M.S.2
  • 27
    • 33747656968
    • Adaptive information extraction
    • Jordi Turmo, Alicia Ageno and Neus Català, Adaptive information extraction, ACM Computing Surveys 38(2) (2006).
  • 29
    • 42549093993
    • Data extraction from semi-structured web pages by clustering
    • Le Phong Bao Vuong, Xiaoying Gao and Mengjie Zhang, Data extraction from semi-structured web pages by clustering, in: Web Intelligence, 2006, pp. 374-377.
    • (2006) Web Intelligence , pp. 374-377
    • Bao Vuong, L.1    Gao, X.2    Zhang, M.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.