Volume 43, Issue 5, 2007, Pages 1332-1347

Web page title extraction and its application

Author keywords

HTML document; Information retrieval; Metadata extraction

Indexed keywords

HTML; METADATA; PROBLEM SOLVING; SUPERVISED LEARNING; WEBSITES

EID: 34247363313     PISSN: 03064573     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.ipm.2006.11.007     Document Type: Article
Times cited: 36

References (39)
  • 1
    • 34247354933
    • Amitay, E., Carmel, D., Darlow, A., Lempel, R., & Soffer, A. (2002). Topic distillation with knowledge agents. In Proceedings of the eleventh text retrieval conference.
  • 3
    • 34247376881
    • Breuel, T. M. (2003). Information extraction from HTML documents by structural matching. In Proceedings of the second international workshop on web document analysis.
  • 4
    • 34247348589
    • Cai, D., Yu, S., Wen, J.-R., & Ma, W.-Y. (2003). VIPS: a vision-based page segmentation algorithm. Microsoft technical report (MSR-TR-2003-79).
  • 5
    • 85042021254
    • Chang, C. H. & Lui, S. C. (2001). IEPAD: information extraction based on pattern discovery. In Proceedings of the tenth international conference on World Wide Web (pp. 681-688).
  • 6
    • 34247387540
    • Chaudhuri, B. B. & Garain, U. (1999). Extraction of type style-based meta-information from imaged documents. In Proceedings of the fifth international conference on document analysis and recognition (pp. 138-149).
  • 7
    • 84974668178
    • Chidlovskii, B., Ragetli, J., & de Rijke, M. (2000). Wrapper generation via grammar induction. In Proceedings of the eleventh european conference on machine learning (pp. 96-108).
  • 8
    • 34247375161
    • Collins-Thompson, K., Ogilvie, P., Zhang, Y., & Callan, J. (2002). Information filtering, novelty detection, and named-page finding. In Proceedings of the eleventh text retrieval conference.
  • 9
    • 34247400666
    • Craswell, N., & Hawking, D. (2003). Overview of the TREC 2003 web track. In Proceedings of the twelfth text retrieval conference (pp. 78-93).
  • 10
    • 18344370996
    • Craven, T. C. (2003). HTML tags as extraction cues for web page description construction. Informing Science Journal, 6, 1-12.
  • 11
    • 84944327150
    • Crescenzi, V., Mecca, G. & Merialdo, P. (2001). Roadrunner: towards automatic data extraction from large web sites. In Proceedings of the twenty-seventh international conference on very large databases (pp. 109-118).
  • 12
    • 0036039589
    • Crescenzi, V., Mecca, G., & Merialdo, P. (2002). Wrapping-oriented classification of web pages. In Proceedings of the 2002 ACM symposium on applied computing (pp. 1108-1112).
  • 13
    • 84921693565
    • Cutler, M., Shih, T., & Meng, Y. (1997). Using the structure of HTML documents to improve retrieval. In Proceedings of the USENIX symposium on internet technologies and systems (pp. 241-251).
  • 14
    • 34247353240
    • Eikvil, L. (1999). Information extraction from world wide web - a survey. Technical Report 945, Norwegian Computing Center, Oslo, Norway.
  • 15
    • 34247344859
    • Evans, D. K., Klavans, J. L., & McKeown, K. R. (2004). Columbia newsblaster: multilingual news summarization on the web. In Proceedings of human language technology conference/North American chapter of the Association for computational linguistics annual meeting (pp. 1-4).
  • 16
    • 0033907729
    • Freitag, D. (2000). Machine learning for information extraction in informal domains. Machine Learning, 39(2-3), 169-202.
  • 17
    • 34247325981
    • Freitag, D., & McCallum, A. (1999). Information extraction with HMMs and shrinkage. In Proceedings of the AAAI'99 workshop on machine learning for information extraction, AAAI Technical Report WS-99-11 (pp. 31-36).
  • 18
    • 0033650832
    • Giuffrida, G., Shek, E. C., & Yang, J. (2000). Knowledge-based metadata extraction from PostScript files. In Proceedings of the fifth ACM conference on digital libraries (pp. 77-84).
  • 19
    • 84941274546
    • Han, H., Giles, C. L., Manavoglu, E., Zha, H., Zhang, Z., & Fox, E. A. (2003). Automatic document metadata extraction using support vector machines. In Proceedings of the third ACM/IEEE-CS joint conference on digital libraries (pp. 37-48).
  • 20
    • 84885654015
    • Hu, Y., Xin, G., Song, R., Hu, G., Shi, S., Cao, Y., et al. (2005). Title extraction from bodies of HTML documents and its application to web page retrieval. In Proceedings of the twenty-eighth annual international ACM SIGIR conference (pp. 250-257).
  • 21
    • 34247337234
    • Hurst, M. (2002). Classifying TABLE elements in HTML. WhizBang!Labs. http://www2002.org/CDROM/poster/115/.
  • 22
    • 0034785267
    • Joachims, T. (2001). A statistical learning model of text classification with support vector machines. In Proceedings of the 24th ACM international conference on research and development in information retrieval (pp. 128-136).
  • 23
    • 84880812554
    • Kosala, R., Bruynooghe, M., Bussche, J. V., & Blockeel, H. (2003). Information extraction from web documents based on local unranked tree automaton inference. In Proceedings of the eighteenth international joint conference on artificial intelligence (pp. 403-408).
  • 24
    • 34247371810
    • Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the eighteenth international conference on machine learning (pp. 282-289).
  • 25
    • 34247395767
    • Li, Y., Zaragoza, H., Herbrich, R., Shawe-Taylor, J., & Kandola, J. S. (2002). The perceptron algorithm with uneven margins. In Proceedings of the nineteenth international conference on machine learning (pp. 379-386).
  • 26
    • 77952333945
    • Liu, B., Grossman, R., & Zhai, Y. (2003). Mining data records in web pages. In Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 601-606).
  • 27
    • 34247337623
    • MSHTML reference. http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/reference.asp.
  • 28
    • 0032684968
    • Muslea, I., Minton, S., & Knoblock, C. (1999). A hierarchical approach to wrapper induction. In Proceedings of the third international conference on autonomous agents (pp. 190-197).
  • 29
    • 34247368436
    • Ogilvie, P., & Callan, J. (2003a). Combining structural information and the use of priors in mixed named-page and homepage finding. In Proceedings of the twelfth text retrieval conference (pp. 177-184).
  • 30
    • 1542287497
    • Ogilvie, P., & Callan, J. (2003b). Combining document representations for known-item search. In Proceedings of the twenty-sixth annual international ACM SIGIR conference on research and development in information retrieval (pp. 143-150).
  • 31
    • 1542287488
    • Pinto, D., McCallum, A., Wei, X., & Croft, W. B. (2003). Table extraction using conditional random fields. In Proceedings of the 26th ACM SIGIR conference (pp. 235-242).
  • 32
    • 4644340823
    • Reis, D., Golgher, P., Silva, A., & Laender, A. (2004). Automatic web news extraction using tree edit distance. In Proceedings of international WWW conference (pp. 502-511).
  • 33
    • 18744388867
    • Robertson, S., Zaragoza, H., & Taylor, M. (2004). Simple BM25 extension to multiple weighted fields. In Proceedings of ACM thirteenth conference on information and knowledge management (pp. 42-49).
  • 34
    • 18744381159
    • Song, R., Liu, H., Wen, J.-R., & Ma, W.-Y. (2004). Learning block importance models for web pages. In Proceedings of international WWW conference (pp. 203-211).
  • 35
    • 34247326425
    • W3C DOM Technical Committee. (2003). Document object model technical reports. http://www.w3.org/DOM/DOMTR.
  • 36
    • 34247388417
    • Wang, X. (1996). Tabular abstraction, editing, and formatting. Ph.D. thesis. University of Waterloo, Ont., Canada.
  • 37
    • 34548498700
    • Yau, H. S., & Hawker, J. S. (2004). SA_MetaMatch: relevant document discovery through document metadata and indexing. In Proceedings of ACM southeast regional conference (pp. 385-390).
  • 38
    • 34247402076
    • Zhang, M., Song, R., Lin, C., Ma, L., Jiang, Z., Jin, Y., et al. (2002). THU at TREC 2002: novelty, web, and filtering. In Proceedings of the eleventh text retrieval conference.
  • 39
    • 34247367159
    • Zhang, M., Song, R., & Ma, S. (2003). DF or IDF? On the use of HTML primary feature fields for Web IR. In Proceedings of the twelfth international world wide web conference, poster presented.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.