Amitay, E., Carmel, D., Darlow, A., Lempel, R., & Soffer, A. (2002). Topic distillation with knowledge agents. In Proceedings of the eleventh text retrieval conference.
Breuel, T. M. (2003). Information extraction from HTML documents by structural matching. In Proceedings of the second international workshop on web document analysis.
Cai, D., Yu, S., Wen, J.-R., & Ma, W.-Y. (2003). VIPS: a vision-based page segmentation algorithm. Microsoft technical report (MSR-TR-2003-79).
Chang, C. H., & Lui, S. C. (2001). IEPAD: information extraction based on pattern discovery. In Proceedings of the tenth international conference on World Wide Web (pp. 681-688).
Chaudhuri, B. B., & Garain, U. (1999). Extraction of type style-based meta-information from imaged documents. In Proceedings of the fifth international conference on document analysis and recognition (pp. 138-149).
Chidlovskii, B., Ragetli, J., & de Rijke, M. (2000). Wrapper generation via grammar induction. In Proceedings of the eleventh european conference on machine learning (pp. 96-108).
Collins-Thompson, K., Ogilvie, P., Zhang, Y., & Callan, J. (2002). Information filtering, novelty detection, and named-page finding. In Proceedings of the eleventh text retrieval conference.
Craswell, N., & Hawking, D. (2003). Overview of the TREC 2003 web track. In Proceedings of the twelfth text retrieval conference (pp. 78-93).
Craven, T. C. (2003). HTML tags as extraction cues for web page description construction. Informing Science Journal, 6, 1-12.
Crescenzi, V., Mecca, G., & Merialdo, P. (2001). Roadrunner: towards automatic data extraction from large web sites. In Proceedings of the twenty-seventh international conference on very large databases (pp. 109-118).
Crescenzi, V., Mecca, G., & Merialdo, P. (2002). Wrapping-oriented classification of web pages. In Proceedings of the 2002 ACM symposium on applied computing (pp. 1108-1112).
Cutler, M., Shih, T., & Meng, Y. (1997). Using the structure of HTML documents to improve retrieval. In Proceedings of the USENIX symposium on internet technologies and systems (pp. 241-251).
Eikvil, L. (1999). Information extraction from world wide web - a survey. Technical Report 945, Norwegian Computing Center, Oslo, Norway.
Evans, D. K., Klavans, J. L., & McKeown, K. R. (2004). Columbia newsblaster: multilingual news summarization on the web. In Proceedings of human language technology conference/North American chapter of the Association for computational linguistics annual meeting (pp. 1-4).
Freitag, D. (2000). Machine learning for information extraction in informal domains. Machine Learning, 39(2/3), 169-202.
Freitag, D., & McCallum, A. (1999). Information extraction with HMMs and shrinkage. In Proceedings of the AAAI'99 workshop on machine learning for information extraction, AAAI Technical Report WS-99-11 (pp. 31-36).
Giuffrida, G., Shek, E.C., & Yang, J. (2000). Knowledge-based metadata extraction from PostScript files. In Proceedings of the fifth ACM conference on digital libraries (pp. 77-84).
Han, H., Giles, C. L., Manavoglu, E., Zha, H., Zhang, Z., & Fox, E. A. (2003). Automatic document metadata extraction using support vector machines. In Proceedings of the third ACM/IEEE-CS joint conference on digital libraries (pp. 37-48).
Hu, Y., Xin, G., Song, R., Hu, G., Shi, S., Cao, Y., et al. (2005). Title extraction from bodies of HTML documents and its application to web page retrieval. In Proceedings of the twenty-eighth annual international ACM SIGIR conference (pp. 250-257).
Hurst, M. (2002). Classifying TABLE elements in HTML. WhizBang!Labs. http://www2002.org/CDROM/poster/115/.
Joachims, T. (2001). A statistical learning model of text classification with support vector machines. In Proceedings of the 24th ACM international conference on research and development in information retrieval (pp. 128-136).
Kosala, R., Bruynooghe, M., Bussche, J. V., & Blockeel, H. (2003). Information extraction from web documents based on local unranked tree automaton inference. In Proceedings of the eighteenth international joint conference on artificial intelligence (pp. 403-408).
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the eighteenth international conference on machine learning (pp. 282-289).
Li, Y., Zaragoza, H., Herbrich, R., Shawe-Taylor, J., & Kandola, J. S. (2002). The perceptron algorithm with uneven margins. In Proceedings of the nineteenth international conference on machine learning (pp. 379-386).
Liu, B., Grossman, R., & Zhai, Y. (2003). Mining data records in web pages. In Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 601-606).
MSHTML reference. http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/reference.asp.
Muslea, I., Minton, S., & Knoblock, C. (1999). A hierarchical approach to wrapper induction. In Proceedings of the third international conference on autonomous agents (pp. 190-197).
Ogilvie, P., & Callan, J. (2003a). Combining structural information and the use of priors in mixed named-page and homepage finding. In Proceedings of the twelfth text retrieval conference (pp. 177-184).
Ogilvie, P., & Callan, J. (2003b). Combining document representations for known-item search. In Proceedings of the twenty-sixth annual international ACM SIGIR conference on research and development in information retrieval (pp. 143-150).
Pinto, D., McCallum, A., Wei, X., & Croft, W. B. (2003). Table extraction using conditional random fields. In Proceedings of the 26th ACM SIGIR conference (pp. 235-242).
Reis, D., Golgher, P., Silva, A., & Laender, A. (2004). Automatic web news extraction using tree edit distance. In Proceedings of international WWW conference (pp. 502-511).
Robertson, S., Zaragoza, H., & Taylor, M. (2004). Simple BM25 extension to multiple weighted fields. In Proceedings of ACM thirteenth conference on information and knowledge management (pp. 42-49).
Song, R., Liu, H., Wen, J.-R., & Ma, W.-Y. (2004). Learning block importance models for web pages. In Proceedings of international WWW conference (pp. 203-211).
W3C DOM Technical Committee. (2003). Document object model technical reports. http://www.w3.org/DOM/DOMTR.
Wang, X. (1996). Tabular abstraction, editing, and formatting. Ph.D. thesis. University of Waterloo, Ont., Canada.
Yau, H. S., & Hawker, J. S. (2004). SA_MetaMatch: relevant document discovery through document metadata and indexing. In Proceedings of ACM southeast regional conference (pp. 385-390).
Zhang, M., Song, R., Lin, C., Ma, L., Jiang, Z., Jin, Y., et al. (2002). THU at TREC 2002: novelty, web, and filtering. In Proceedings of the eleventh text retrieval conference.
Zhang, M., Song, R., & Ma, S. (2003). DF or IDF? On the use of HTML primary feature fields for Web IR. In Proceedings of the twelfth international world wide web conference, poster presented.