



Volume , Issue , 2012, Pages 486-493

Building large corpora from the web using a new efficient tool chain

Author keywords

Corpus evaluation; Web corpora; Web crawling

Indexed keywords

COMPUTER SOFTWARE;

EID: 84926384407     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 186

References (12)
  • 1
    • 34250665891
    • Random sampling from a search engine's index
    • Edinburgh
    • Ziv Bar-Yossef and Maxim Gurevich. 2006. Random sampling from a search engine's index. In Proceedings of WWW 2006, Edinburgh.
    • (2006) Proceedings of WWW 2006
    • Bar-Yossef, Z.1    Gurevich, M.2
  • 2
    • 84977940268
    • Bootcat: Bootstrapping corpora and terms from the web
    • Marco Baroni and Silvia Bernardini. 2004. Bootcat: Bootstrapping corpora and terms from the web. In Proceedings of LREC 2004, pages 1313-1316.
    • (2004) Proceedings of LREC 2004 , pp. 1313-1316
    • Baroni, M.1    Bernardini, S.2
  • 4
    • 70350686154
    • The wacky wide web: A collection of very large linguistically processed web-crawled corpora
    • Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The wacky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43 (3): 209-226.
    • (2009) Language Resources and Evaluation , vol.43 , Issue.3 , pp. 209-226
    • Baroni, M.1    Bernardini, S.2    Ferraresi, A.3    Zanchetta, E.4
  • 6
    • 84926125761
    • Measuring web-corpus randomness: A progress report
    • Marco Baroni and Silvia Bernardini, editors, Bologna
    • Massimiliano Ciaramita and Marco Baroni. 2006. Measuring web-corpus randomness: A progress report. In Marco Baroni and Silvia Bernardini, editors, Wacky! Working papers on the Web as Corpus. GEDIT, Bologna.
    • (2006) Wacky! Working Papers on the Web As Corpus. GEDIT
    • Ciaramita, M.1    Baroni, M.2
  • 7
    • 70350700772
    • Experience building a large corpus for Chinese lexicon construction
    • Marco Baroni and Silvia Bernardini, editors, Bologna
    • Thomas Emerson and John O'Neil. 2006. Experience building a large corpus for Chinese lexicon construction. In Marco Baroni and Silvia Bernardini, editors, Wacky! Working papers on the Web as Corpus. GEDIT, Bologna.
    • (2006) Wacky! Working Papers on the Web As Corpus. GEDIT
    • Emerson, T.1    O'Neil, J.2
  • 8
    • 0344154403
    • Introduction to the special issue on the web as corpus
    • Adam Kilgarriff and Gregory Grefenstette. 2003. Introduction to the special issue on the web as corpus. Computational Linguistics, 29: 333-347.
    • (2003) Computational Linguistics , vol.29 , pp. 333-347
    • Kilgarriff, A.1    Grefenstette, G.2
  • 10
    • 0003676885
    • Technical Report TR-CSE-03-01 Center for Research in Computing Technology, Harvard University, Harvard
    • Michael O. Rabin. 1981. Fingerprinting by random polynomials. Technical Report TR-CSE-03-01, Center for Research in Computing Technology, Harvard University, Harvard.
    • (1981) Fingerprinting by Random Polynomials
    • Rabin, M.O.1
  • 11
    • 42649127636
    • Creating general-purpose corpora using automated search engine queries
    • Marco Baroni and Silvia Bernardini, editors. GEDIT, Bologna
    • Serge Sharoff. 2006. Creating general-purpose corpora using automated search engine queries. In Marco Baroni and Silvia Bernardini, editors, Wacky! Working papers on the Web as Corpus. GEDIT, Bologna.
    • (2006) Wacky! Working Papers on the Web As Corpus
    • Sharoff, S.1
  • 12
    • 1542310280
    • Text classification and segmentation using minimum cross entropy
    • William J. Teahan. 2000. Text classification and segmentation using minimum cross entropy. In Proceedings of RIAO.
    • (2000) Proceedings of RIAO
    • Teahan, W.J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.