-
2
-
-
0032092761
-
Nodose - A tool for semi-automatically extracting semi-structured data from text documents
-
ACM Press
-
B. Adelberg. Nodose - a tool for semi-automatically extracting semi-structured data from text documents. In SIGMOD Conference, pages 283-294. ACM Press, 1998.
-
(1998)
SIGMOD Conference
, pp. 283-294
-
-
Adelberg, B.1
-
3
-
-
77953052174
-
Template detection via data mining and its applications
-
Z. Bar-Yossef and S. Rajagopalan. Template detection via data mining and its applications. In WWW, pages 580-591, 2002.
-
(2002)
WWW
, pp. 580-591
-
-
Bar-Yossef, Z.1
Rajagopalan, S.2
-
4
-
-
0035029462
-
Accordion summarization for end-game browsing on pdas and cellular phones
-
O. Buyukkokten, H. Garcia-Molina, and A. Paepcke. Accordion summarization for end-game browsing on pdas and cellular phones. In CHI, pages 213-220, 2001.
-
(2001)
CHI
, pp. 213-220
-
-
Buyukkokten, O.1
Garcia-Molina, H.2
Paepcke, A.3
-
5
-
-
8644241107
-
Block-level link analysis
-
ACM
-
D. Cai, X. He, J.-R. Wen, and W.-Y. Ma. Block-level link analysis. In SIGIR, pages 440-447. ACM, 2004.
-
(2004)
SIGIR
, pp. 440-447
-
-
Cai, D.1
He, X.2
Wen, J.-R.3
Ma, W.-Y.4
-
6
-
-
21144444733
-
Extracting content structure for web pages based on visual representation
-
APWeb, Springer
-
D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In APWeb, volume 2642 of Lecture Notes in Computer Science, pages 406-417. Springer, 2003.
-
(2003)
Lecture Notes in Computer Science
, vol.2642
, pp. 406-417
-
-
Cai, D.1
Yu, S.2
Wen, J.-R.3
Ma, W.-Y.4
-
7
-
-
18744372151
-
Misuse detection for information retrieval systems
-
ACM
-
R. Cathey, L. Ma, N. Goharian, and D. A. Grossman. Misuse detection for information retrieval systems. In CIKM, pages 183-190. ACM, 2003.
-
(2003)
CIKM
, pp. 183-190
-
-
Cathey, R.1
Ma, L.2
Goharian, N.3
Grossman, D.A.4
-
9
-
-
33751046629
-
Template detection for large scale search engines
-
ACM
-
L. Chen, S. Ye, and X. Li. Template detection for large scale search engines. In SAC, pages 1094-1098. ACM, 2006.
-
(2006)
SAC
, pp. 1094-1098
-
-
Chen, L.1
Ye, S.2
Li, X.3
-
10
-
-
4644340823
-
Automatic web news extraction using tree edit distance
-
ACM
-
D. de Castro Reis, P. B. Golgher, A. S. da Silva, and A. H. F. Laender. Automatic web news extraction using tree edit distance. In WWW, pages 502-511. ACM, 2004.
-
(2004)
WWW
, pp. 502-511
-
-
De Castro Reis, D.1
Golgher, P.B.2
Da Silva, A.S.3
Laender, A.H.F.4
-
11
-
-
26844469211
-
Automatic extraction of informative blocks from webpages
-
ACM
-
S. Debnath, P. Mitra, and C. L. Giles. Automatic extraction of informative blocks from webpages. In SAC, pages 1722-1726. ACM, 2005.
-
(2005)
SAC
, pp. 1722-1726
-
-
Debnath, S.1
Mitra, P.2
Giles, C.L.3
-
12
-
-
26944496810
-
Identifying content blocks from web documents
-
ISMIS, Springer
-
S. Debnath, P. Mitra, and C. L. Giles. Identifying content blocks from web documents. In ISMIS, volume 3488 of Lecture Notes in Computer Science, pages 285-293. Springer, 2005.
-
(2005)
Lecture Notes in Computer Science
, vol.3488
, pp. 285-293
-
-
Debnath, S.1
Mitra, P.2
Giles, C.L.3
-
14
-
-
57849154238
-
Evaluating content extraction on html documents
-
T. Gottron. Evaluating content extraction on html documents. In ITA, pages 123-132, 2007.
-
(2007)
ITA
, pp. 123-132
-
-
Gottron, T.1
-
15
-
-
70349131454
-
Combining content extraction heuristics: The CombinE system
-
ACM
-
T. Gottron. Combining content extraction heuristics: the CombinE system. In iiWAS, pages 591-595. ACM, 2008.
-
(2008)
iiWAS
, pp. 591-595
-
-
Gottron, T.1
-
16
-
-
57849123029
-
Content code blurring: A new approach to content extraction
-
T. Gottron. Content code blurring: A new approach to content extraction. In DEXA Workshops [1], pages 29-33.
-
DEXA Workshops
, Issue.1
, pp. 29-33
-
-
Gottron, T.1
-
17
-
-
14844363192
-
Automating content extraction of html documents
-
S. Gupta, G. E. Kaiser, P. Grimm, M. F. Chiang, and J. Starren. Automating content extraction of html documents. World Wide Web, 8(2):179-224, 2005.
-
(2005)
World Wide Web
, vol.8
, Issue.2
, pp. 179-224
-
-
Gupta, S.1
Kaiser, G.E.2
Grimm, P.3
Chiang, M.F.4
Starren, J.5
-
18
-
-
84880498138
-
Dom-based content extraction of html documents
-
S. Gupta, G. E. Kaiser, D. Neistadt, and P. Grimm. Dom-based content extraction of html documents. In WWW, pages 207-214, 2003.
-
(2003)
WWW
, pp. 207-214
-
-
Gupta, S.1
Kaiser, G.E.2
Neistadt, D.3
Grimm, P.4
-
20
-
-
0002985122
-
Wrapping web data into xml
-
W. Han, D. Buttler, and C. Pu. Wrapping web data into xml. SIGMOD Rec., 30(3):33-38, 2001.
-
(2001)
SIGMOD Rec.
, vol.30
, Issue.3
, pp. 33-38
-
-
Han, W.1
Buttler, D.2
Pu, C.3
-
21
-
-
0742268832
-
Mining web informative structures and contents based on entropy analysis
-
H.-Y. Kao, S.-H. Lin, J.-M. Ho, and M.-S. Chen. Mining web informative structures and contents based on entropy analysis. IEEE Trans. Knowl. Data Eng., 16(1):41-55, 2004.
-
(2004)
IEEE Trans. Knowl. Data Eng.
, vol.16
, Issue.1
, pp. 41-55
-
-
Kao, H.-Y.1
Lin, S.-H.2
Ho, J.-M.3
Chen, M.-S.4
-
22
-
-
0034172374
-
Wrapper induction: Efficiency and expressiveness
-
N. Kushmerick. Wrapper induction: efficiency and expressiveness. Artificial Intelligence, 118(1-2):15-68, 2000.
-
(2000)
Artificial Intelligence
, vol.118
, Issue.1-2
, pp. 15-68
-
-
Kushmerick, N.1
-
23
-
-
0242456776
-
Discovering informative content blocks from web documents
-
ACM
-
S.-H. Lin and J.-M. Ho. Discovering informative content blocks from web documents. In KDD, pages 588-593. ACM, 2002.
-
(2002)
KDD
, pp. 588-593
-
-
Lin, S.-H.1
Ho, J.-M.2
-
25
-
-
77950332467
-
Separating xhtml content from navigation clutter using dom-structure block analysis
-
S. Reich and M. Tzagarakis, editors, ACM
-
C. Mantratzis, M. A. Orgun, and S. Cassidy. Separating xhtml content from navigation clutter using dom-structure block analysis. In S. Reich and M. Tzagarakis, editors, Hypertext, pages 145-147. ACM, 2005.
-
(2005)
Hypertext
, pp. 145-147
-
-
Mantratzis, C.1
Orgun, M.A.2
Cassidy, S.3
-
26
-
-
84938812620
-
Template detection through conditional random fields
-
M. Marek, P. Pecina, and M. Spousta. Template detection through conditional random fields. In WAC3, 2007.
-
(2007)
WAC3
-
-
Marek, M.1
Pecina, P.2
Spousta, M.3
-
27
-
-
0035587215
-
Hierarchical wrapper induction for semistructured information sources
-
I. Muslea, S. Minton, and C. A. Knoblock. Hierarchical wrapper induction for semistructured information sources. Autonomous Agents and Multi-Agent Systems, 4(1-2):93-114, 2001.
-
(2001)
Autonomous Agents and Multi-Agent Systems
, vol.4
, Issue.1-2
, pp. 93-114
-
-
Muslea, I.1
Minton, S.2
Knoblock, C.A.3
-
28
-
-
84865651487
-
Extracting article text from the web with maximum subsequence segmentation
-
ACM
-
J. Pasternack and D. Roth. Extracting article text from the web with maximum subsequence segmentation. In WWW, pages 971-980. ACM, 2009.
-
(2009)
WWW
, pp. 971-980
-
-
Pasternack, J.1
Roth, D.2
-
29
-
-
0036989234
-
Quasm: A system for question answering using semi-structured data
-
ACM
-
D. Pinto, M. Branstein, R. Coleman, W. B. Croft, M. King, W. Li, and X. Wei. Quasm: a system for question answering using semi-structured data. In JCDL, pages 46-55. ACM, 2002.
-
(2002)
JCDL
, pp. 46-55
-
-
Pinto, D.1
Branstein, M.2
Coleman, R.3
Croft, W.B.4
King, M.5
Li, W.6
Wei, X.7
-
30
-
-
0038144389
-
Content extraction from html documents
-
A. F. R. Rahman, H. Alam, and R. Hartono. Content extraction from html documents. In WDA, pages 7-10, 2001.
-
(2001)
WDA
, pp. 7-10
-
-
Rahman, A.F.R.1
Alam, H.2
Hartono, R.3
-
31
-
-
58849102735
-
Toward 2w, beyond web 2.0
-
T. V. Raman. Toward 2w, beyond web 2.0. Commun. ACM, 52(2):52-59, 2009.
-
(2009)
Commun. ACM
, vol.52
, Issue.2
, pp. 52-59
-
-
Raman, T.V.1
-
32
-
-
57849147691
-
Text extraction from the web via text-to-tag ratio
-
T. Weninger and W. H. Hsu. Text extraction from the web via text-to-tag ratio. In DEXA Workshops [1], pages 23-28.
-
DEXA Workshops
, Issue.1
, pp. 23-28
-
-
Weninger, T.1
Hsu, W.H.2
-
33
-
-
77952370025
-
Eliminating noisy information in web pages for data mining
-
ACM
-
L. Yi, B. Liu, and X. Li. Eliminating noisy information in web pages for data mining. In KDD, pages 296-305. ACM, 2003.
-
(2003)
KDD
, pp. 296-305
-
-
Yi, L.1
Liu, B.2
Li, X.3