SCOPUS Information Retrieval Platform - View Paper

Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012

Volume , Issue , 2012, Pages 502-506

Building a 70 billion word corpus of English from ClueWeb

(3) Pomikalek, Jan a,b; Jakubicek, Milos a,b; Rychly, Pavel a,b

a MASARYK UNIVERSITY (Czech Republic)

b LEXICAL COMPUTING LTD (United Kingdom)

Author keywords

Clueweb; Corpus; Encoding; English; Word sketch

Indexed keywords

ENCODING (SYMBOLS); INDEXING (MATERIALS WORKING); QUERY LANGUAGES;

CLUEWEB; CORPUS; ENGLISH; LANGUAGE RESOURCES; MANAGEMENT SYSTEMS; NEAR-DUPLICATES; PRE-PROCESSING STEP; WORD SKETCH;

DATA HANDLING;

EID: 84907013032 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (17)

References (11)

1
- 84977940268
- BootCaT: Bootstrapping corpora and terms from the web
- Marco Baroni and Silvia Bernardini. 2004. BootCaT: Bootstrapping Corpora and Terms from the Web. In Proceedings of LREC, volume 4.
- (2004) Proceedings of LREC , vol.4
- Baroni, M.¹ Bernardini, S.²

2
- 79956075292
- Identifying and filtering near-duplicate documents
- Springer
- Andrei Broder. 2000. Identifying and Filtering Near-Duplicate Documents. In Combinatorial Pattern Matching, pages 1-10. Springer.
- (2000) Combinatorial Pattern Matching , pp. 1-10
- Broder, A.¹

3
- 0036040277
- Similarity estimation techniques from rounding algorithms
- ACM
- Moses S. Charikar. 2002. Similarity Estimation Techniques from Rounding Algorithms. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pages 380-388. ACM.
- (2002) Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing , pp. 380-388
- Charikar, M.S.¹

4
- 84863863426
- Fast syntactic searching in very large corpora for many languages
- Tokyo
- Miloš Jakubíček, Pavel Rychlý, Adam Kilgarriff, and Diana McCarthy. 2010. Fast Syntactic Searching in Very Large Corpora for Many Languages. In PACLIC 24 Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, pages 741-747, Tokyo.
- (2010) PACLIC 24 Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation , pp. 741-747
- Jakubíček, M.¹ Rychlý, P.² Kilgarriff, A.³ McCarthy, D.⁴

5
- 33750691522
- The sketch engine
- Lorient, France. Universite de Bretagne-Sud
- A. Kilgarriff, P. Rychlý, P. Smrž, and D. Tugwell. 2004. The Sketch Engine. In Proceedings of the Eleventh EURALEX International Congress, pages 105-116, Lorient, France. Universite de Bretagne-Sud.
- (2004) Proceedings of the Eleventh EURALEX International Congress , pp. 105-116
- Kilgarriff, A.¹ Rychlý, P.² Smrž, P.³ Tugwell, D.⁴

6
- 85037339072
- A corpus factory for many languages
- Valetta, Malta: ELRA
- Adam Kilgarriff, Siva Reddy, Jan Pomikálek, and Avinesh PVS. 2010. A corpus factory for many languages. In LREC Workshop on Web Services and Processing Pipelines. Valetta, Malta: ELRA.
- (2010) LREC Workshop on Web Services and Processing Pipelines
- Kilgarriff, A.¹ Reddy, S.² Pomikálek, J.³ Pvs, A.⁴

7
- 84889592306
- PhD thesis, Masaryk University
- Jan Pomikálek. 2011. Removing Boilerplate and Duplicate Content from Web Corpora. PhD thesis, Masaryk University.
- (2011) Removing Boilerplate and Duplicate Content from Web Corpora
- Pomikálek, J.¹

8
- 84859923220
- PhD Thesis, Masaryk University, Brno
- Pavel Rychlý. 2000. Korpusové manažery a jejich efektivní implementace. PhD Thesis, Masaryk University, Brno.
- (2000) Korpusové Manažery A Jejich Efektivní Implementace
- Rychlý, P.¹

9
- 79957279652
- Manatee/bonito - A modular corpus manager
- Brno
- Pavel Rychlý. 2007. Manatee/Bonito-A Modular Corpus Manager. In 1st Workshop on Recent Advances in Slavonic Natural Language Processing, pages 65-70, Brno.
- (2007) 1st Workshop on Recent Advances in Slavonic Natural Language Processing , pp. 65-70
- Rychlý, P.¹

10
- 0002363874
- Probabilistic part-of-speech tagging using decision trees
- Manchester, UK
- Helmut Schmid. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. In Proceedings of International Conference on New Methods in Language Processing, volume 12, pages 44-49. Manchester, UK.
- (1994) Proceedings of International Conference on New Methods in Language Processing , vol.12 , pp. 44-49
- Schmid, H.¹

11
- 0003586256
- Addison-Wesley Press
- George Kingsley Zipf. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press.
- (1949) Human Behavior and the Principle of Least Effort
- Kingsley Zipf, G.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.