-
1
-
-
1142303684
-
Extracting structured data from web pages
-
A. Arasu and H. Garcia-Molina. Extracting structured data from web pages. In SIGMOD'03, pages 337-348, 2003.
-
(2003)
SIGMOD'03
, pp. 337-348
-
-
Arasu, A.1
Garcia-Molina, H.2
-
2
-
-
15544389985
-
Tree-structured template generation for web pages
-
S. L. Chuang and J. Y. Hsu. Tree-structured template generation for web pages. In Web Intelligence'04, pages 327-333, 2004.
-
(2004)
Web Intelligence'04
, pp. 327-333
-
-
Chuang, S.L.1
Hsu, J.Y.2
-
3
-
-
84944327150
-
Roadrunner: Towards automatic data extraction from large web sites
-
V. Crescenzi, G. Mecca, and P. Merialdo. Roadrunner: Towards automatic data extraction from large web sites. In VLDB'01, 2001.
-
(2001)
VLDB'01
-
-
Crescenzi, V.1
Mecca, G.2
Merialdo, P.3
-
4
-
-
3142764439
-
Web wrapper induction: A brief survey
-
S. Flesca. Web wrapper induction: a brief survey. AI Communications, 17(2):57-61, 2004.
-
(2004)
AI Communications
, vol.17
, Issue.2
, pp. 57-61
-
-
Flesca, S.1
-
5
-
-
84880498138
-
-
S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm. DOM-based content extraction of HTML documents. In WWW'03, pages 207-214, 2003.
-
-
-
-
-
6
-
-
0032309862
-
Generating finite-state transducers for semi-structured data extraction from the web
-
C. N. Hsu and M. T. Dung. Generating finite-state transducers for semi-structured data extraction from the web. Information Systems, 23(8):521-538, 1998.
-
(1998)
Information Systems
, vol.23
, Issue.8
, pp. 521-538
-
-
Hsu, C.N.1
Dung, M.T.2
-
7
-
-
84885654015
-
Title extraction from bodies of HTML documents and its application to web page retrieval
-
Y. Hu, G. Xin, R. Song, G. Hu, S. Shi, Y. Cao, and H. Li. Title extraction from bodies of HTML documents and its application to web page retrieval. In SIGIR'05, pages 250-257, 2005.
-
(2005)
SIGIR'05
, pp. 250-257
-
-
Hu, Y.1
Xin, G.2
Song, R.3
Hu, G.4
Shi, S.5
Cao, Y.6
Li, H.7
-
8
-
-
34250750133
-
Interactive wrapper generation with minimal user effort
-
U. Irmak and T. Suel. Interactive wrapper generation with minimal user effort. In WWW'06, pages 553-563, 2006.
-
(2006)
WWW'06
, pp. 553-563
-
-
Irmak, U.1
Suel, T.2
-
9
-
-
34548334405
-
-
N. Jindal and B. Liu. Review spam detection. In WWW'07, pages 1189-1190, 2007.
-
-
-
-
-
10
-
-
71049193836
-
An interactive, personalized, newspaper on the WWW
-
T. Kamba, K. Bharat, and M. C. Albers. An interactive, personalized, newspaper on the WWW. In WWW'95, 1995.
-
(1995)
WWW'95
-
-
Kamba, T.1
Bharat, K.2
Albers, M.C.3
-
11
-
-
0034172374
-
Wrapper induction: Efficiency and expressiveness
-
N. Kushmerick. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence, 118(1-2):15-68, 2000.
-
(2000)
Artificial Intelligence
, vol.118
, Issue.1-2
, pp. 15-68
-
-
Kushmerick, N.1
-
13
-
-
71049183492
-
Web content mining (tutorial)
-
B. Liu. Web content mining (tutorial). In WWW'05, 2005.
-
(2005)
WWW'05
-
-
Liu, B.1
-
15
-
-
0003243224
-
Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
-
J. C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, 1999.
-
(1999)
Advances in Large Margin Classifiers
-
-
Platt, J.C.1
-
16
-
-
4644340823
-
Automatic web news extraction using tree edit distance
-
D. C. Reis, P. B. Golgher, A. S. Silva, and A. F. Laender. Automatic web news extraction using tree edit distance. In WWW'04, pages 502-511, 2004.
-
(2004)
WWW'04
, pp. 502-511
-
-
Reis, D.C.1
Golgher, P.B.2
Silva, A.S.3
Laender, A.F.4
-
17
-
-
33845883383
-
Automation in information extraction and data integration (tutorial)
-
S. Sarawagi. Automation in information extraction and data integration (tutorial). In VLDB'02, 2002.
-
(2002)
VLDB'02
-
-
Sarawagi, S.1
-
18
-
-
0001122858
-
The tree-to-tree editing problem
-
S. M. Selkow. The tree-to-tree editing problem. Information Processing Letters, 6(6):184-186, 1977.
-
(1977)
Information Processing Letters
, vol.6
, Issue.6
, pp. 184-186
-
-
Selkow, S.M.1
-
21
-
-
0040864988
-
Principles of risk minimization for learning theory
-
V. Vapnik. Principles of risk minimization for learning theory. In NIPS'91, pages 831-838, 1991.
-
(1991)
NIPS'91
, pp. 831-838
-
-
Vapnik, V.1
-
22
-
-
33744821948
-
Web data extraction based on partial tree alignment
-
Y. Zhai and B. Liu. Web data extraction based on partial tree alignment. In WWW'05, pages 76-85, 2005.
-
(2005)
WWW'05
, pp. 76-85
-
-
Zhai, Y.1
Liu, B.2
-
23
-
-
71049171156
-
-
H. Zhao, W. Meng, Z. Wu, V. Raghavan, and C. Yu. Fully automatic wrapper generation for search engines. In WWW'05, pages 66-75, 2005.
-
-
-
-
-
24
-
-
36849073188
-
Mining templates from search result records of search engines
-
H. Zhao, W. Meng, and C. Yu. Mining templates from search result records of search engines. In SIGKDD'07, pages 884-893, 2007.
-
(2007)
SIGKDD'07
, pp. 884-893
-
-
Zhao, H.1
Meng, W.2
Yu, C.3
-
25
-
-
54249088690
-
Template-independent news extraction based on visual consistency
-
S. Zheng, R. Song, and J. Wen. Template-independent news extraction based on visual consistency. In AAAI'07, volume 22, pages 1507-1513, 2007.
-
(2007)
AAAI'07
, vol.22
, pp. 1507-1513
-
-
Zheng, S.1
Song, R.2
Wen, J.3