-
2
-
-
70350652136
-
Xpath-wrapper induction by generating tree traversal patterns
-
Tobias Anton. Xpath-wrapper induction by generating tree traversal patterns. In LWA, pages 126-133, 2005.
-
(2005)
LWA
, pp. 126-133
-
-
Anton, T.1
-
3
-
-
70849135620
-
-
Internet archive
-
Internet archive: http://www.archive.org/.
-
-
-
-
4
-
-
84944318551
-
Visual web information extraction with lixto
-
Robert Baumgartner, Sergio Flesca, and Georg Gottlob. Visual web information extraction with lixto. In VLDB, pages 119-128, 2001.
-
(2001)
VLDB
, pp. 119-128
-
-
Baumgartner, R.1
Flesca, S.2
Gottlob, G.3
-
5
-
-
33750334601
-
Learning stochastic tree edit distance
-
Marc Bernard, Amaury Habrard, and Marc Sebban. Learning stochastic tree edit distance. In ECML, pages 42-53, 2006.
-
(2006)
ECML
, pp. 42-53
-
-
Bernard, M.1
Habrard, A.2
Sebban, M.3
-
6
-
-
18444373554
-
A survey on tree edit distance and related problems
-
Philip Bille. A survey on tree edit distance and related problems. Theor. Comput. Sci., 337(1-3):217-239, 2005.
-
(2005)
Theor. Comput. Sci.
, vol.337
, Issue.1-3
, pp. 217-239
-
-
Bille, P.1
-
7
-
-
38149042707
-
Learning metrics between tree structured data: Application to image recognition
-
Laurent Boyer, Amaury Habrard, and Marc Sebban. Learning metrics between tree structured data: Application to image recognition. In ECML, pages 54-66, 2007.
-
(2007)
ECML
, pp. 54-66
-
-
Boyer, L.1
Habrard, A.2
Sebban, M.3
-
8
-
-
34250670477
-
Documentum eci self-repairing wrappers: Performance analysis
-
Boris Chidlovskii, Bruno Roustant, and Marc Brette. Documentum eci self-repairing wrappers: performance analysis. In SIGMOD, pages 708-717, 2006.
-
(2006)
SIGMOD
, pp. 708-717
-
-
Chidlovskii, B.1
Roustant, B.2
Brette, M.3
-
9
-
-
77953046656
-
A flexible learning system for wrapping tables and lists in html documents
-
New York, NY, USA, ACM
-
William W. Cohen, Matthew Hurst, and Lee S. Jensen. A flexible learning system for wrapping tables and lists in html documents. In WWW, pages 232-241, New York, NY, USA, 2002. ACM.
-
(2002)
WWW
, pp. 232-241
-
-
Cohen, W.W.1
Hurst, M.2
Jensen, L.S.3
-
10
-
-
84944327150
-
Roadrunner: Towards automatic data extraction from large web sites
-
Valter Crescenzi, Giansalvatore Mecca, and Paolo Merialdo. Roadrunner: Towards automatic data extraction from large web sites. In VLDB, pages 109-118, 2001.
-
(2001)
VLDB
, pp. 109-118
-
-
Crescenzi, V.1
Mecca, G.2
Merialdo, P.3
-
11
-
-
33746264524
-
Training tree transducers
-
J. Graehl and K. Knight. Training tree transducers. In HLT-NAACL, pages 105-112, 2004.
-
(2004)
HLT-NAACL
, pp. 105-112
-
-
Graehl, J.1
Knight, K.2
-
12
-
-
0002985122
-
Wrapping web data into XML
-
Wei Han, David Buttler, and Calton Pu. Wrapping web data into XML. SIGMOD Record, 30(3):33-38, 2001.
-
(2001)
SIGMOD Record
, vol.30
, Issue.3
, pp. 33-38
-
-
Han, W.1
Buttler, D.2
Pu, C.3
-
13
-
-
0032309862
-
Generating finite-state transducers for semi-structured data extraction from the web
-
Chun-Nan Hsu and Ming-Tzung Dung. Generating finite-state transducers for semi-structured data extraction from the web. Information Systems, 23(8):521-538, 1998.
-
(1998)
Information Systems
, vol.23
, Issue.8
, pp. 521-538
-
-
Hsu, C.-N.1
Dung, M.-T.2
-
14
-
-
70849085944
-
-
Internet movie database
-
Internet movie database: http://www.imdb.com/.
-
-
-
-
15
-
-
34848824805
-
An overview of probabilistic tree transducers for natural language processing
-
Kevin Knight and Jonathan Graehl. An overview of probabilistic tree transducers for natural language processing. In Proceedings of the CICLing, 2005.
-
(2005)
Proceedings of the CICLing
-
-
Knight, K.1
Graehl, J.2
-
16
-
-
62749172221
-
Myportal: Robust extraction and aggregation of web content
-
VLDB Endowment
-
Marek Kowalkiewicz, Tomasz Kaczmarek, and Witold Abramowicz. Myportal: Robust extraction and aggregation of web content. In VLDB, pages 1219-1222. VLDB Endowment, 2006.
-
(2006)
VLDB
, pp. 1219-1222
-
-
Kowalkiewicz, M.1
Kaczmarek, T.2
Abramowicz, W.3
-
17
-
-
34250684614
-
Robust web content extraction
-
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kaczmarek, and Witold Abramowicz. Robust web content extraction. In WWW, pages 887-888, 2006.
-
(2006)
WWW
, pp. 887-888
-
-
Kowalkiewicz, M.1
Orlowska, M.E.2
Kaczmarek, T.3
Abramowicz, W.4
-
18
-
-
0001776223
-
Wrapper induction for information extraction
-
Nicholas Kushmerick, Daniel S. Weld, and Robert B. Doorenbos. Wrapper induction for information extraction. In IJCAI, pages 729-737, 1997.
-
(1997)
IJCAI
, pp. 729-737
-
-
Kushmerick, N.1
Weld, D.S.2
Doorenbos, R.B.3
-
20
-
-
44849098451
-
A conditional random field for discriminatively-trained finite-state string edit distance
-
A. McCallum, K. Bellare, and F. Pereira. A conditional random field for discriminatively-trained finite-state string edit distance. In UAI, 2005.
-
(2005)
UAI
-
-
McCallum, A.1
Bellare, K.2
Pereira, F.3
-
21
-
-
33745626486
-
Mapping maintenance for data integration systems
-
Robert McCann, Bedoor AlShebli, Quoc Le, Hoa Nguyen, Long Vu, and AnHai Doan. Mapping maintenance for data integration systems. In VLDB, pages 1018-1029, 2005.
-
(2005)
VLDB
, pp. 1018-1029
-
-
McCann, R.1
AlShebli, B.2
Le, Q.3
Nguyen, H.4
Vu, L.5
Doan, A.6
-
24
-
-
59549087165
-
On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes
-
Andrew Ng and Michael Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In NIPS, 2002.
-
(2002)
NIPS
-
-
Ng, A.1
Jordan, M.2
-
26
-
-
33744981648
-
Learning stochastic edit distance: Application in handwritten character recognition
-
Jose Oncina and Marc Sebban. Learning stochastic edit distance: Application in handwritten character recognition. Pattern Recogn., 39(9):1575-1587, 2006.
-
(2006)
Pattern Recogn.
, vol.39
, Issue.9
, pp. 1575-1587
-
-
Oncina, J.1
Sebban, M.2
-
27
-
-
0000166629
-
The complexity of counting cuts and of computing the probability that a graph is connected
-
J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4):777-788, 1983.
-
(1983)
SIAM J. Comput.
, vol.12
, Issue.4
, pp. 777-788
-
-
Provan, J.S.1
Ball, M.O.2
-
28
-
-
0008634293
-
Learning string edit distance
-
Eric Sven Ristad and Peter N. Yianilos. Learning string edit distance. In ICML, pages 287-295, 1997.
-
(1997)
ICML
, pp. 287-295
-
-
Ristad, E.S.1
Yianilos, P.N.2
-
29
-
-
0008749998
-
Building light-weight wrappers for legacy web data-sources using w4f
-
Arnaud Sahuguet and Fabien Azavant. Building light-weight wrappers for legacy web data-sources using w4f. In VLDB, pages 738-741, 1999.
-
(1999)
VLDB
, pp. 738-741
-
-
Sahuguet, A.1
Azavant, F.2
-
30
-
-
70849123715
-
-
http://sourceforge.net/projects/jtidy.
-
-
-
-
31
-
-
0000142982
-
The complexity of enumeration and reliability problems
-
L. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comput., 8:410-421, 1979.
-
(1979)
SIAM J. Comput.
, vol.8
, pp. 410-421
-
-
Valiant, L.1
-
32
-
-
0024889169
-
Simple fast algorithms for the editing distance between trees and related problems
-
K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18(6):1245-1262, 1989.
-
(1989)
SIAM J. Comput.
, vol.18
, Issue.6
, pp. 1245-1262
-
-
Zhang, K.1
Shasha, D.2