



Volume 25, Issue 2, 2010, Pages 303-326

xCrawl: A high-recall crawling method for Web mining

Author keywords

Information extraction; Information retrieval; Web crawling; Web mining

Indexed keywords

INFORMATION RETRIEVAL; MINING MACHINERY; WEB CRAWLER; WEBSITES

EID: 78049437739     PISSN: 02191377     EISSN: 02193116     Source Type: Journal    
DOI: 10.1007/s10115-009-0266-3     Document Type: Article
Times cited: 13

References (32)
  • 1
    • 84874371227
    • Intelligent crawling on the World Wide Web with arbitrary predicates
    • V. Y. Shen, N. Saito, R. M. Lyu, and M. E. Zurko (Eds.), New York: ACM
    • Aggarwal CC, Al-Garawi F, Yu PS (2001) Intelligent crawling on the World Wide Web with arbitrary predicates. In: Shen VY, Saito N, Lyu RM, Zurko ME (eds) Proceedings of the 10th international world wide web conference. ACM, New York, pp 96-105.
    • (2001) Proceedings of the 10th International World Wide Web Conference , pp. 96-105
    • Aggarwal, C.C.1    Al-Garawi, F.2    Yu, P.S.3
  • 2
    • 0344065593
    • Querying text databases for efficient information extraction
    • U. Dayal, K. Ramamritham, and T. M. Vijayaraman (Eds.), Los Alamitos: IEEE Computer Society
    • Agichtein E, Gravano L (2003) Querying text databases for efficient information extraction. In: Dayal U, Ramamritham K, Vijayaraman TM (eds) Proceedings of the 19th IEEE international conference on data engineering. IEEE Computer Society, Los Alamitos, pp 113-124.
    • (2003) Proceedings of the 19th IEEE International Conference on Data Engineering , pp. 113-124
    • Agichtein, E.1    Gravano, L.2
  • 3
    • 34548764453
    • Crawling for domain-specific Hidden Web resources
    • T. Catarci, M. Mecella, J. Mylopoulos, and M. E. Orlowska (Eds.), Los Alamitos: IEEE Computer Society
    • Bergholz A, Chidlovskii B (2003) Crawling for domain-specific Hidden Web resources. In: Catarci T, Mecella M, Mylopoulos J, Orlowska ME (eds) Proceedings of the fourth international conference on web information systems engineering. IEEE Computer Society, Los Alamitos, pp 125-133.
    • (2003) Proceedings of the Fourth International Conference on Web Information Systems Engineering , pp. 125-133
    • Bergholz, A.1    Chidlovskii, B.2
  • 5
    • 0033294474
    • Focused crawling: a new approach to topic-specific Web resource discovery
    • Chakrabarti S, van den Berg M, Dom B (1999) Focused crawling: a new approach to topic-specific Web resource discovery. Comput Netw 31: 1623-1640.
    • (1999) Comput Netw , vol.31 , pp. 1623-1640
    • Chakrabarti, S.1    van den Berg, M.2    Dom, B.3
  • 8
    • 0034172483
    • Learning to construct knowledge bases from the World Wide Web
    • Craven M, DiPasquo D, Freitag D et al (2000) Learning to construct knowledge bases from the World Wide Web. Artif Intell 118: 69-113.
    • (2000) Artif Intell , vol.118 , pp. 69-113
    • Craven, M.1    Dipasquo, D.2    Freitag, D.3
  • 11
    • 29344454459
    • SemTag and seeker: bootstrapping the semantic Web via automated semantic annotation
    • G. Hencsey, B. White, and Y. Chen (Eds.), New York: ACM
    • Dill S, Eiron N, Gibson D et al (2003) SemTag and seeker: bootstrapping the semantic Web via automated semantic annotation. In: Hencsey G, White B, Chen Y et al (eds) Proceedings of the 12th international conference on world wide web. ACM, New York, pp 178-186.
    • (2003) Proceedings of the 12th International Conference on World Wide Web , pp. 178-186
    • Dill, S.1    Eiron, N.2    Gibson, D.3
  • 13
    • 33846227408
    • An integrated environment for the development of knowledge-based recommender applications
    • Felfernig A, Friedrich G, Jannach D et al (2007) An integrated environment for the development of knowledge-based recommender applications. Int J Electron Commer 11: 11-34.
    • (2007) Int J Electron Commer , vol.11 , pp. 11-34
    • Felfernig, A.1    Friedrich, G.2    Jannach, D.3
  • 14
    • 35348900845
    • Towards domain-independent information extraction from web tables
    • C. L. Williamson, M. E. Zurko, and P. F. Patel-Schneider (Eds.), New York: ACM
    • Gatterbauer W, Bohunsky P, Herzog M et al (2007) Towards domain-independent information extraction from web tables. In: Williamson CL, Zurko ME, Patel-Schneider PF et al (eds) Proceedings of the 16th international conference on world wide web. ACM, New York.
    • (2007) Proceedings of the 16th International Conference on World Wide Web
    • Gatterbauer, W.1    Bohunsky, P.2    Herzog, M.3
  • 15
    • 0041848443
    • Topic-Sensitive PageRank: a context-sensitive ranking algorithm for Web search
    • Haveliwala TH (2003) Topic-Sensitive PageRank: a context-sensitive ranking algorithm for Web search. IEEE Trans Knowl Data Eng 15: 784-796.
    • (2003) IEEE Trans Knowl Data Eng , vol.15 , pp. 784-796
    • Haveliwala, T.H.1
  • 18
    • 4243148480
    • Authoritative sources in a hyperlinked environment
    • Kleinberg J (1999) Authoritative sources in a hyperlinked environment. J ACM 46: 604-632.
    • (1999) J ACM , vol.46 , pp. 604-632
    • Kleinberg, J.1
  • 19
    • 84899841718
    • The Web as a graph: measurements, models, and methods
    • Asano T, Imai H, Lee DT et al (eds), Lecture notes in computer science, Springer, Berlin
    • Kleinberg J, Kumar R, Raghavan P et al (1999) The Web as a graph: measurements, models, and methods. In: Asano T, Imai H, Lee DT et al (eds) Proceedings of the 5th annual international conference on computing and combinatorics. Lecture notes in computer science, vol 1627. Springer, Berlin, pp 1-17.
    • (1999) Proceedings of the 5th annual international conference on computing and combinatorics , vol.1627 , pp. 1-17
    • Kleinberg, J.1    Kumar, R.2    Raghavan, P.3
  • 22
    • 51749092434
    • Crawling AJAX by inferring user interface state changes
    • D. Schwabe, F. Curbera, and P. Dantzig (Eds.), Los Alamitos: IEEE Computer Society
    • Mesbah A, Bozdag E, van Deursen A (2008) Crawling AJAX by inferring user interface state changes. In: Schwabe D, Curbera F, Dantzig P (eds) Proceedings of the 8th international conference on web engineering. IEEE Computer Society, Los Alamitos, pp 122-134.
    • (2008) Proceedings of the 8th International Conference on Web Engineering , pp. 122-134
    • Mesbah, A.1    Bozdag, E.2    van Deursen, A.3
  • 23
    • 51049094206
    • SVM based adaptive learning method for text classification from positive and unlabeled documents
    • Peng T, Zuo W, He F (2008) SVM based adaptive learning method for text classification from positive and unlabeled documents. Knowl Inf Syst 16: 281-301.
    • (2008) Knowl Inf Syst , vol.16 , pp. 281-301
    • Peng, T.1    Zuo, W.2    He, F.3
  • 24
    • 0000133751
    • Using reinforcement learning to spider the Web efficiently
    • I. Bratko and S. Dzeroski (Eds.), San Francisco: Morgan Kaufmann
    • Rennie J, McCallum A (1999) Using reinforcement learning to spider the Web efficiently. In: Bratko I, Dzeroski S (eds) Proceedings of the 16th international conference on machine learning. Morgan Kaufmann, San Francisco, pp 335-343.
    • (1999) Proceedings of the 16th International Conference on Machine Learning , pp. 335-343
    • Rennie, J.1    McCallum, A.2
  • 25
    • 0001737422
    • On term selection for query expansion
    • Robertson SE (1990) On term selection for query expansion. J Documentation 46: 359-364.
    • (1990) J Documentation , vol.46 , pp. 359-364
    • Robertson, S.E.1
  • 26
    • 58849118940
    • Do not crawl in the DUST: different URLs with similar text
    • Schonfeld U, Bar-Yossef Z, Keidar I (2009) Do not crawl in the DUST: different URLs with similar text. ACM Trans Web 3: 3-31.
    • (2009) ACM Trans Web , vol.3 , pp. 3-31
    • Schonfeld, U.1    Bar-Yossef, Z.2    Keidar, I.3
  • 29
    • 41149096059
    • Random walk with restart: fast solutions and applications
    • Tong H, Faloutsos C, Pan JY (2008) Random walk with restart: fast solutions and applications. Knowl Inf Syst 14: 327-346.
    • (2008) Knowl Inf Syst , vol.14 , pp. 327-346
    • Tong, H.1    Faloutsos, C.2    Pan, J.Y.3
  • 30
    • 67349109407
    • Using Wikipedia knowledge to improve text classification
    • Wang P, Hu J, Zeng HJ et al (2009) Using Wikipedia knowledge to improve text classification. Knowl Inf Syst 19: 265-281.
    • (2009) Knowl Inf Syst , vol.19 , pp. 265-281
    • Wang, P.1    Hu, J.2    Zeng, H.J.3
  • 32
    • 0742268826
    • PEBL: web page classification without negative examples
    • Yu H, Han J, Chang KCC (2004) PEBL: web page classification without negative examples. IEEE Trans Knowl Data Eng 16: 70-81.
    • (2004) IEEE Trans Knowl Data Eng , vol.16 , pp. 70-81
    • Yu, H.1    Han, J.2    Chang, K.C.C.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.