-
1
-
-
0032092761
-
NoDoSE - A tool for semi-automatically extracting structured and semistructured data from text documents
-
Adelberg, B. NoDoSE-a tool for semi-automatically extracting structured and semistructured data from text documents. SIGMOD 1998.
-
(1998)
SIGMOD
-
-
Adelberg, B.1
-
2
-
-
17044366983
-
Accordion summarization for end-game browsing on PDAs and cellular phones
-
Buyukkokten, O., Garcia-Molina, H. and Paepcke, A. Accordion summarization for end-game browsing on PDAs and cellular phones. SIGCHI 2001.
-
(2001)
SIGCHI
-
-
Buyukkokten, O.1
Garcia-Molina, H.2
Paepcke, A.3
-
3
-
-
8644284554
-
Extracting content structure for web pages based on visual representation
-
Cai, D., Yu, S., Wen, J.R. and Ma, W.Y. Extracting content structure for web pages based on visual representation. APWeb 2003.
-
(2003)
APWeb
-
-
Cai, D.1
Yu, S.2
Wen, J.R.3
Ma, W.Y.4
-
4
-
-
8644236286
-
VIPS: A vision-based page segmentation algorithm
-
Cai, D., Yu, S., Wen, J.R. and Ma, W.Y. VIPS: a Vision-based Page Segmentation Algorithm. Microsoft Technical Report (MSR-TR-2003-79), 2003.
-
(2003)
Microsoft Technical Report (MSR-TR-2003-79)
-
-
Cai, D.1
Yu, S.2
Wen, J.R.3
Ma, W.Y.4
-
5
-
-
8644241107
-
Block-level link analysis
-
Cai, D., He, X., Wen, J.R. and Ma, W.Y. Block-level Link Analysis. SIGIR 2004.
-
(2004)
SIGIR
-
-
Cai, D.1
He, X.2
Wen, J.R.3
Ma, W.Y.4
-
6
-
-
8644267730
-
Block-based web search
-
Cai, D., Yu, S., Wen, J.R. and Ma, W.Y. Block-based Web Search. SIGIR 2004.
-
(2004)
SIGIR
-
-
Cai, D.1
Yu, S.2
Wen, J.R.3
Ma, W.Y.4
-
7
-
-
84865632343
-
Function-based object model towards website adaptation
-
Chen, J., Zhou, B., Shi, J., Zhang, H. and Fengwu, Q. Function-based object model towards website adaptation. WWW 2001.
-
(2001)
WWW
-
-
Chen, J.1
Zhou, B.2
Shi, J.3
Zhang, H.4
Fengwu, Q.5
-
8
-
-
34247382812
-
Template detection for large scale search engines
-
Chen, L., Ye, S. and Li, X. Template detection for large scale search engines. SAC 2006.
-
(2006)
SAC
-
-
Chen, L.1
Ye, S.2
Li, X.3
-
9
-
-
84944327150
-
Roadrunner: Towards automatic data extraction from large web sites
-
Crescenzi, V., Mecca, G. and Merialdo, P. Roadrunner: Towards automatic data extraction from large web sites. VLDB 2001.
-
(2001)
VLDB
-
-
Crescenzi, V.1
Mecca, G.2
Merialdo, P.3
-
10
-
-
0342770808
-
Experience with Top Gun Wingman: A proxy-based graphical web browser for the 3Com PalmPilot
-
Fox, A., Goldberg, I., Gribble, S.D., Lee, D.C., Polito, A. and Brewer, E.A. Experience With Top Gun Wingman: A Proxy-Based Graphical Web Browser for the 3Com PalmPilot. Middleware 1998.
-
(1998)
Middleware
-
-
Fox, A.1
Goldberg, I.2
Gribble, S.D.3
Lee, D.C.4
Polito, A.5
Brewer, E.A.6
-
11
-
-
0034172374
-
Wrapper induction: Efficiency and expressiveness
-
Kushmerick, N. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence, 118, 2000.
-
(2000)
Artificial Intelligence
, vol.118
-
-
Kushmerick, N.1
-
12
-
-
0037806547
-
A brief survey of web data extraction tools
-
Laender, A., Ribeiro-Neto, B., Silva, A. and Teixeira, J. A Brief Survey of Web Data Extraction Tools. SIGMOD Record, vol. 31, no. 2, 2002.
-
(2002)
SIGMOD Record
, vol.31
, Issue.2
-
-
Laender, A.1
Ribeiro-Neto, B.2
Silva, A.3
Teixeira, J.4
-
13
-
-
0242456776
-
Discovering informative content blocks from Web documents
-
Lin, S.H. and Ho, J.M. Discovering informative content blocks from Web documents. KDD 2002.
-
(2002)
KDD
-
-
Lin, S.H.1
Ho, J.M.2
-
14
-
-
0033893885
-
XWRAP: An XML-enabled wrapper construction system for Web information sources
-
Liu, L., Pu, C. and Han, W. XWRAP: an XML-enabled wrapper construction system for Web information sources. ICDE 2000.
-
(2000)
ICDE
-
-
Liu, L.1
Pu, C.2
Han, W.3
-
15
-
-
18744404260
-
Extracting unstructured data from template generated web documents
-
Ma, L., Goharian, N., Chowdhury, A. and Chung, M. Extracting unstructured data from template generated web documents. CIKM 2003.
-
(2003)
CIKM
-
-
Ma, L.1
Goharian, N.2
Chowdhury, A.3
Chung, M.4
-
16
-
-
84938812620
-
Web page cleaning with conditional random fields
-
Marek, M., Pecina, P. and Spousta, M. Web Page Cleaning with Conditional Random Fields. WAC3 2007.
-
(2007)
WAC3
-
-
Marek, M.1
Pecina, P.2
Spousta, M.3
-
17
-
-
2942661100
-
Tracking and summarizing news on a daily basis with Columbia's Newsblaster
-
McKeown, K.R., Barzilay, R., Evans, D., Hatzivassiloglou, V., Klavans, J.L., Nenkova, A., Sable, C., Schiffman, B. and Sigelman, S. Tracking and summarizing news on a daily basis with Columbia's Newsblaster. HLT 2002.
-
(2002)
HLT
-
-
McKeown, K.R.1
Barzilay, R.2
Evans, D.3
Hatzivassiloglou, V.4
Klavans, J.L.5
Nenkova, A.6
Sable, C.7
Schiffman, B.8
Sigelman, S.9
-
19
-
-
84948481845
-
An algorithm for suffix stripping
-
Porter, M.F. An algorithm for suffix stripping. Program, vol. 14, no. 3, pp. 130-137, 1980.
-
(1980)
Program
, vol.14
, Issue.3
, pp. 130-137
-
-
Porter, M.F.1
-
20
-
-
56449104958
-
Learning and inference over constrained output
-
Punyakanok, V., Roth, D., Yih, W. and Zimak, D. Learning and inference over constrained output. IJCAI 2005.
-
(2005)
IJCAI
-
-
Punyakanok, V.1
Roth, D.2
Yih, W.3
Zimak, D.4
-
21
-
-
0033288922
-
A linear time algorithm for finding all maximal scoring subsequences
-
Ruzzo, W.L. and Tompa, M. A linear time algorithm for finding all maximal scoring subsequences. ISMB 1999.
-
(1999)
ISMB
-
-
Ruzzo, W.L.1
Tompa, M.2
-
22
-
-
84880811191
-
Web page cleaning for web mining through feature weighting
-
Yi, L. and Liu, B. Web Page Cleaning for Web Mining through Feature Weighting. IJCAI-03.
-
(2003)
IJCAI
-
-
Yi, L.1
Liu, B.2
-
23
-
-
77952370025
-
Eliminating noisy information in Web pages for data mining
-
Yi, L., Liu, B. and Li, X. Eliminating noisy information in Web pages for data mining. KDD 2003.
-
(2003)
KDD
-
-
Yi, L.1
Liu, B.2
Li, X.3
-
24
-
-
84880475213
-
Improving pseudo-relevance feedback in web information retrieval using web page segmentation
-
Yu, S., Cai, D., Wen, J.R. and Ma, W.Y. Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation. WWW 2003.
-
(2003)
WWW
-
-
Yu, S.1
Cai, D.2
Wen, J.R.3
Ma, W.Y.4