Volume , Issue , 2009, Pages 335-347

Robust web extraction: An approach based on a probabilistic tree-edit model

Author keywords

Probabilistic tree edit model; Wrappers; Xpath

Indexed keywords

EVOLUTION PROBLEM; HIGH COSTS; HTML DOCUMENTS; PROBABILISTIC MODELS; PROBABILISTIC TREE-EDIT MODEL; PROBABILISTIC TREES; QUADRATIC TIME; SOURCE TREE; TRAINING EXAMPLE; TREE STRUCTURES; WEB EXTRACTION; WEB-PAGE;

EID: 70849104261     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1559845.1559882     Document Type: Conference Paper
Times cited: 63

References (32)
  • 2
    • 70350652136
    • Xpath-wrapper induction by generating tree traversal patterns
    • Tobias Anton. Xpath-wrapper induction by generating tree traversal patterns. In LWA, pages 126-133, 2005.
    • (2005) LWA , pp. 126-133
    • Anton, T.1
  • 3
    • 70849135620
    • Internet archive
    • Internet archive: http://www.archive.org/.
  • 4
    • 84944318551
    • Visual web information extraction with lixto
    • Robert Baumgartner, Sergio Flesca, and Georg Gottlob. Visual web information extraction with lixto. In VLDB, pages 119-128, 2001.
    • (2001) VLDB , pp. 119-128
    • Baumgartner, R.1    Flesca, S.2    Gottlob, G.3
  • 5
    • 33750334601
    • Learning stochastic tree edit distance
    • Marc Bernard, Amaury Habrard, and Marc Sebban. Learning stochastic tree edit distance. In ECML, pages 42-53, 2006.
    • (2006) ECML , pp. 42-53
    • Bernard, M.1    Habrard, A.2    Sebban, M.3
  • 6
    • 18444373554
    • A survey on tree edit distance and related problems
    • Philip Bille. A survey on tree edit distance and related problems. Theor. Comput. Sci., 337(1-3):217-239, 2005.
    • (2005) Theor. Comput. Sci. , vol.337 , Issue.1-3 , pp. 217-239
    • Bille, P.1
  • 7
    • 38149042707
    • Learning metrics between tree structured data: Application to image recognition
    • Laurent Boyer, Amaury Habrard, and Marc Sebban. Learning metrics between tree structured data: Application to image recognition. In ECML, pages 54-66, 2007.
    • (2007) ECML , pp. 54-66
    • Boyer, L.1    Habrard, A.2    Sebban, M.3
  • 8
    • 34250670477
    • Documentum eci self-repairing wrappers: Performance analysis
    • Boris Chidlovskii, Bruno Roustant, and Marc Brette. Documentum eci self-repairing wrappers: performance analysis. In SIGMOD, pages 708-717, 2006.
    • (2006) SIGMOD , pp. 708-717
    • Chidlovskii, B.1    Roustant, B.2    Brette, M.3
  • 9
    • 77953046656
    • A flexible learning system for wrapping tables and lists in html documents
    • New York, NY, USA, ACM
    • William W. Cohen, Matthew Hurst, and Lee S. Jensen. A flexible learning system for wrapping tables and lists in html documents. In WWW, pages 232-241, New York, NY, USA, 2002. ACM.
    • (2002) WWW , pp. 232-241
    • Cohen, W.W.1    Hurst, M.2    Jensen, L.S.3
  • 10
    • 84944327150
    • Roadrunner: Towards automatic data extraction from large web sites
    • Valter Crescenzi, Giansalvatore Mecca, and Paolo Merialdo. Roadrunner: Towards automatic data extraction from large web sites. In VLDB, pages 109-118, 2001.
    • (2001) VLDB , pp. 109-118
    • Crescenzi, V.1    Mecca, G.2    Merialdo, P.3
  • 11
    • 33746264524
    • Training tree transducers
    • J. Graehl and K. Knight. Training tree transducers. In HLT-NAACL, pages 105-112, 2004.
    • (2004) HLT-NAACL , pp. 105-112
    • Graehl, J.1    Knight, K.2
  • 12
    • 0002985122
    • Wrapping web data into XML
    • Wei Han, David Buttler, and Calton Pu. Wrapping web data into XML. SIGMOD Record, 30(3):33-38, 2001.
    • (2001) SIGMOD Record , vol.30 , Issue.3 , pp. 33-38
    • Han, W.1    Buttler, D.2    Pu, C.3
  • 13
    • 0032309862
    • Generating finite-state transducers for semi-structured data extraction from the web
    • Chun-Nan Hsu and Ming-Tzung Dung. Generating finite-state transducers for semi-structured data extraction from the web. Information Systems, 23(8):521-538, 1998.
    • (1998) Information Systems , vol.23 , Issue.8 , pp. 521-538
    • Hsu, C.-N.1    Dung, M.-T.2
  • 14
    • 70849085944
    • Internet movie database
    • Internet movie database: http://www.imdb.com/.
  • 15
    • 34848824805
    • An overview of probabilistic tree transducers for natural language processing
    • Kevin Knight and Jonathan Graehl. An overview of probabilistic tree transducers for natural language processing. In Proceedings of the CICLing, 2005.
    • (2005) Proceedings of the CICLing
    • Knight, K.1    Graehl, J.2
  • 16
    • 62749172221
    • Myportal: Robust extraction and aggregation of web content
    • VLDB Endowment
    • Marek Kowalkiewicz, Tomasz Kaczmarek, and Witold Abramowicz. Myportal: Robust extraction and aggregation of web content. In VLDB, pages 1219-1222. VLDB Endowment, 2006.
    • (2006) VLDB , pp. 1219-1222
    • Kowalkiewicz, M.1    Kaczmarek, T.2    Abramowicz, W.3
  • 18
    • 0001776223
    • Wrapper induction for information extraction
    • Nickolas Kushmerick, Daniel S. Weld, and Robert B. Doorenbos. Wrapper induction for information extraction. In IJCAI, pages 729-737, 1997.
    • (1997) IJCAI , pp. 729-737
    • Kushmerick, N.1    Weld, D.S.2    Doorenbos, R.B.3
  • 20
    • 44849098451
    • A conditional random field for discriminatively-trained finite-state string edit distance
    • A. McCallum, K. Bellare, and P. Pereira. A conditional random field for discriminatively-trained finite-state string edit distance. In UAI, 2005.
    • (2005) UAI
    • McCallum, A.1    Bellare, K.2    Pereira, P.3
  • 21
    • 33745626486
    • Mapping maintenance for data integration systems
    • Robert McCann, Bedoor AlShebli, Quoc Le, Hoa Nguyen, Long Vu, and AnHai Doan. Mapping maintenance for data integration systems. In VLDB, pages 1018-1029, 2005.
    • (2005) VLDB , pp. 1018-1029
    • McCann, R.1    AlShebli, B.2    Le, Q.3    Nguyen, H.4    Vu, L.5    Doan, A.6
  • 24
    • 59549087165
    • On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes
    • Andrew Ng and Michael Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In NIPS, 2002.
    • (2002) NIPS
    • Ng, A.1    Jordan, M.2
  • 26
    • 33744981648
    • Learning stochastic edit distance: Application in handwritten character recognition
    • Jose Oncina and Marc Sebban. Learning stochastic edit distance: Application in handwritten character recognition. Pattern Recogn., 39(9):1575-1587, 2006.
    • (2006) Pattern Recogn. , vol.39 , Issue.9 , pp. 1575-1587
    • Oncina, J.1    Sebban, M.2
  • 27
    • 0000166629
    • The complexity of counting cuts and of computing the probability that a graph is connected
    • J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4):777-788, 1983.
    • (1983) SIAM J. Comput. , vol.12 , Issue.4 , pp. 777-788
    • Provan, J.S.1    Ball, M.O.2
  • 28
    • 0008634293
    • Learning string edit distance
    • Eric Sven Ristad and Peter N. Yianilos. Learning string edit distance. In ICML, pages 287-295, 1997.
    • (1997) ICML , pp. 287-295
    • Ristad, E.S.1    Yianilos, P.N.2
  • 29
    • 0008749998
    • Building light-weight wrappers for legacy web data-sources using w4f
    • Arnaud Sahuguet and Fabien Azavant. Building light-weight wrappers for legacy web data-sources using w4f. In VLDB, pages 738-741, 1999.
    • (1999) VLDB , pp. 738-741
    • Sahuguet, A.1    Azavant, F.2
  • 30
    • 70849123715
    • http://sourceforge.net/projects/jtidy.
  • 31
    • 0000142982
    • The complexity of enumeration and reliability problems
    • L. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comput., 8:410-421, 1979.
    • (1979) SIAM J. Comput. , vol.8 , pp. 410-421
    • Valiant, L.1
  • 32
    • 0024889169
    • Simple fast algorithms for the editing distance between trees and related problems
    • K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18(6):1245-1262, 1989.
    • (1989) SIAM J. Comput. , vol.18 , Issue.6 , pp. 1245-1262
    • Zhang, K.1    Shasha, D.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.