SCOPUS Information Search Platform

Computational Linguistics in the Netherlands Journal

Volume 3, 2013, Pages 103-120

LeTs Preprocess: The multilingual LT3 linguistic preprocessing toolkit

(6 authors) Van De Kauter, Marjan; Coorman, Geert; Lefever, Els; Desmet, Bart; Macken, Lieve; Hoste, Véronique

Author keywords

[No Author keywords available]

Indexed keywords

CROSS VALIDATION; DIFFERENT DOMAINS; GOLD STANDARDS; LINGUISTIC PREPROCESSINGS; NAMED ENTITIES; PART-OF-SPEECH TAGGER; PRE-PROCESSING STEP; PREPROCESSING MODULES;

COMPUTATIONAL LINGUISTICS;

EID: 84907924798 PISSN: 22114009 EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 45

References (40)

1
- 49949109929
- Abeille, A., L. Clement, and F. Toussenel (2003), Building a treebank for French, Treebanks: Building and Using Parsed Corpora, Kluwer, Dordrecht, pp. 165-188.

2
- 0003443729
- Baayen, R. H., R. Piepenbrock, and L. Gulikers (1995), The CELEX lexical database (CD-ROM).

3
- 20344396661
- Brants, S., S. Dipper, S. Hansen, W. Lezius, and G. Smith (2002), The TIGER treebank, Proceedings of the Workshop on Treebanks and Linguistic Theories, Sozopol, Bulgaria, pp. 24-41.

4
- 77956109730
- Chrupala, G. (2006), Simple data-driven context-sensitive lemmatization, Proceedings of the Sociedad Española para el Procesamiento del Lenguaje Natural, volume 37, Zaragoza, Spain, pp. 121-130.

5
- 84876795299
- Crysmann, B., S. Hansen-Schirra, G. Smith, and D. Ziegler-Eisele (2005), TIGER Morphologie-Annotationsschema, Technical report, Universität des Saarlandes - Computerlinguistik, Universität Stuttgart - Institut für Maschinelle Sprachverarbeitung, Universität Potsdam - Institut für Germanistik.

6
- 0040639071
- Daelemans, W., J. Zavrel, P. Berck, and S. Gillis (1996), MBT: A memory-based part of speech tagger-generator, Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, pp. 14-27.

7
- 84864023184
- Desmet, B. and V. Hoste (2010a), Dutch named entity recognition using classifier ensembles, Computational Linguistics in the Netherlands 2010: selected papers from the twentieth CLIN meeting, Netherlands Graduate School of Linguistics, pp. 29-41.

8
- 84899964702
- Desmet, B. and V. Hoste (2010b), Towards a balanced named entity corpus for Dutch, Proceedings of the seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta, pp. 535-541.

9
- 85035364491
- Giménez, J. and L. Màrquez (2004), SVMTool: A general POS tagger generator based on Support Vector Machines, Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04), Lisbon, Portugal, pp. 43-46.

10
- 0002384527
- Hemphill, C. T., J. J. Godfrey, and G. R. Doddington (1990), The ATIS Spoken Language Systems pilot corpus, Proceedings of the DARPA Speech and Natural Language Workshop, pp. 96-101.

11
- 1642296635
- Isozaki, H. and H. Kazawa (2002), Efficient Support Vector Classifiers for named entity recognition, Proceedings of the 19th International Conference on Computational Linguistics (COLING'02), Taipei, Taiwan, pp. 1-7.

12
- 44949230930
- Koehn, P. (2005), Europarl: A parallel corpus for statistical machine translation, Proceedings of the tenth Machine Translation Summit, Phuket, Thailand, pp. 79-86.

13
- 0026890725
- Kupiec, J. (1992), Robust part-of-speech tagging using a hidden Markov model, Computer Speech and Language 6(3), pp. 225-242.

14
- 0142192295
- Lafferty, J., A. McCallum, and F. Pereira (2001), Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the eighteenth International Conference on Machine Learning (ICML'01), Williamstown, Massachusetts, USA, pp. 282-289.

15
- 78650445024
- Macken, L. and W. Daelemans (2010), A chunk-driven bootstrapping approach to extracting translation patterns, Proceedings of the 11th International Conference on Intelligent Text Processing and Computational Linguistics (Iaşi, Romania), Vol. 6008 of Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, pp. 394-405.

16
- 84877027694
- Macken, L., E. Lefever, and V. Hoste (2013), TExSIS: bilingual terminology extraction from parallel corpora using chunk-based alignment, Terminology 19(1), pp. 1-30, John Benjamins Publishing Company.

17
- 34249852033
- Marcus, M. P., B. Santorini, and M. A. Marcinkiewicz (1993), Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics 19(2), pp. 313-330.

18
- 85121365374
- McCallum, A. and W. Li (2003), Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons, Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Morristown, New Jersey, USA, pp. 188-191.

19
- 84921920805
- Paroubek, P. (2000), Language resources as by-product of evaluation: the MULTITAG example, Proceedings of the Second International Conference on Language Resources and Evaluation (LREC'00), Athens, Greece, pp. 151-154.

20
- 84897025037
- Paulussen, H., L. Macken, W. Vandeweghe, and P. Desmet (2013), Dutch Parallel Corpus: a balanced parallel corpus for Dutch-English and Dutch-French, Essential Speech and Language Technology for Dutch, Springer, pp. 185-199.

21
- 70349327885
- Romary, L., S. Salmon-Alt, and G. Francopoulo (2004), Standards going concrete: from LMF to Morphalou, Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries - COLING-2004, Geneva, Switzerland, pp. 22-28.

22
- 84967332741
- Sagot, B. (2010), The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French, Proceedings of the seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta.

23
- 0012941680
- Santorini, B. (1990), Part-of-speech tagging guidelines for the Penn Treebank Project, Technical report, University of Pennsylvania, Department of Computer and Information Science, Philadelphia, Pennsylvania, USA.

24
- 33646195659
- Schiller, A., S. Teufel, C. Stöckert, and C. Thielen (1999), Guidelines für das Tagging deutscher Textcorpora mit STTS (kleines und großes Tagset), Technical report, Universität Stuttgart - Institut für maschinelle Sprachverarbeitung, Universität Tübingen - Seminar für Sprachwissenschaft.

25
- 0002363874
- Schmid, H. (1994), Probabilistic part-of-speech tagging using decision trees, Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK, pp. 44-49.

26
- 0042096783
- Schmid, H. (1995), Improvements in part-of-speech tagging with an application to German, Proceedings of the ACL SIGDAT-Workshop, Dublin, Ireland, pp. 47-50.

27
- 84870559977
- Schuurman, I., V. Hoste, and P. Monachesi (2009), Cultivating trees: Adding several semantic layers to the Lassy treebank in SoNaR, Proceedings of the 7th International Workshop on Treebanks and Linguistic Theories, Groningen, The Netherlands, pp. 135-146.

28
- 0346713396
- Skut, W., T. Brants, B. Krenn, and H. Uszkoreit (1998), A linguistically interpreted corpus of German newspaper text, Proceedings of the Tenth European Summer School in Logic, Language and Information (ESSLLI'98). Workshop on Recent Advances in Corpus Annotation, Saarbrücken, Germany, pp. 705-711.

29
- 84907925004
- Smith, G. (2003), A brief introduction to the TIGER treebank, version 1, Technical report, Universität Potsdam.

30
- 84887456789
- Telljohann, H., E. Hinrichs, and S. Kübler (2004), The Tüba-D/Z treebank: annotating German with a context-free backbone, Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04), Lisbon, Portugal, pp. 2229-2235.

31
- 9444254243
- Tjong Kim Sang, E. F. (2002a), Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition, Proceedings of the 6th Conference on Natural Language Learning (COLING'02), Taipei, Taiwan, pp. 155-158.

32
- 9444254243
- Tjong Kim Sang, E. F. (2002b), Memory-based named entity recognition, Proceedings of CoNLL-2002, Taipei, Taiwan, pp. 203-206.

33
- 85099019865
- Tjong Kim Sang, E. F. and F. De Meulder (2003), Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, Proceedings of CoNLL-2003, Edmonton, Canada, pp. 142-147.

34
- 0043272902
- Van Den Bosch, A. and W. Daelemans (1999), Memory-based morphological analysis, Proceedings of the 37th annual meeting of the Association for Computational Linguistics (ACL'99), University of Maryland, USA, pp. 285-292.

35
- 78650468021
- Van Den Bosch, A., B. Busser, W. Daelemans, and S. Canisius (2007), An efficient memory-based morphosyntactic tagger and parser for Dutch, Computational Linguistics in the Netherlands 2006, Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting, Leuven, Belgium, pp. 99-114.

36
- 57349166605
- Van Eynde, F. (2005), Part of speech tagging en lemmatisering van het D-Coi corpus, Technical report, Centrum voor Computerlinguïstiek, KU Leuven.

37
- 84907946964
- Van Gompel, M. (2012), FoLiA: Format for Linguistic Annotation. Documentation, ILK Technical Report 12-03, Center for Language Studies, Radboud University Nijmegen. http://ilk.uvt.nl/downloads/pub/papers/ilk.1203.pdf.

38
- 84907943087
- Van Noord, G., G. Bouma, F. Van Eynde, D. De Kok, J. Van Der Linde, I. Schuurman, E. Tjong Kim Sang, and V. Vandeghinste (2009), Large scale syntactic annotation of written Dutch: Lassy, Essential Speech and Language Technology for Dutch, Springer Berlin Heidelberg, pp. 147-164.

39
- 0004217877
- Van Rijsbergen, C. J. (1975), Information Retrieval, Butterworths, London.

40
- 85147755528
- Zhou, G. D. and J. Su (2002), Named entity recognition using an HMM-based Chunk Tagger, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL'02), Philadelphia, USA, pp. 473-480.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.