Volume 50, Issue 4, 2016, Pages 729-766

TweetLID: a benchmark for tweet language identification

Author keywords

Language identification; Multilingualism; Short texts; Similar languages; Tweets

EID: 84944548873     PISSN: 1574020X     EISSN: 15728412     Source Type: Journal    
DOI: 10.1007/s10579-015-9317-4     Document Type: Article
Times cited: 53

References (78)
  • 1
    • 84994786175
    • Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of twitter data. In Proceedings of the workshop on languages in social media (pp. 30–38). Association for Computational Linguistics.
  • 2
    • 84948697983
    • Alegria, I., Aranberri, N., Comas, P. R., Fresno, V., Gamallo, P., Padró, L., San Vicente, I., Turmo, J., & Zubiaga, A. (2014). Tweetnorm_es corpus: An annotated corpus for spanish microtext normalization. In Proceedings of the language resources and evaluation conference.
  • 5
    • 84994795719
    • Beesley, K. R. (1988). Language identifier: A computer program for automatic natural-language identification of on-line text. In Proceedings of the 29th annual conference of the American Translators Association (Vol. 47, p. 54). Citeseer.
  • 6
    • 84994773547
    • Bergsma, S., McNamee, P., Bagdouri, M., Fink, C., & Wilson, T. (2012). Language identification for creating language-specific twitter collections. In Workshop on language in social media (pp. 65–74). ACL.
  • 7
    • 84864624320
    • Brown, R. D. (2012). Finding and identifying text in 900+ languages. Digital Investigation, 9, S34–S43.
  • 8
    • 84884958218
    • Brown, R. D. (2013). Selecting and weighting n-grams to identify 1100 languages. Text, Speech, and Dialogue, 8082, 475–483.
  • 9
    • 84921652147
    • Cárdenas-Claros, M., & Isharyanti, N. (2009). Code-switching and code-mixing in internet chatting: Between 'yes', 'ya', and 'si' - A case study. The Jalt Call Journal, 5(3), 67–78.
  • 10
    • 84874727608
    • Carter, S., Weerkamp, W., & Tsagkias, M. (2013). Microblog language identification: Overcoming the limitations of short, unedited and idiomatic text. Language Resources and Evaluation, 47(1), 195–215.
  • 12
    • 0002636321
    • Cavnar, W. B., Trenkle, J. M., et al. (1994). N-gram-based text categorization. Ann Arbor MI, 48113(2), 161–175.
  • 15
    • 83055188731
    • Druck, G. (2011). Generalized expectation criteria for lightly supervised learning. Ph.D. thesis, University of Massachusetts, Amherst.
  • 16
    • 0003984557
    • Dunning, T. (1994). Statistical identification of language. Computing Research Laboratory, New Mexico State University, Las Cruces.
  • 18
    • 84994791223
    • Gella, S., Bali, K., & Choudhury, M. (2014). “ye word kis lang ka hai bhai?” Testing the limits of word level language identification. In Proceedings of ICON-2014, the 11th International Conference on Natural Language Processing.
  • 22
    • 84889074989
    • Guo, S., Chang, M. W., & Kiciman, E. (2013). To link or not to link? A study on end-to-end tweet entity linking. In HLT-NAACL (pp. 1020–1030).
  • 24
    • 85039912916
    • Hughes, B., Baldwin, T., Bird, S. G., Nicholson, J., & MacKinlay, A. (2006). Reconsidering language identification for written language resources. In Proceedings of the 5th International Conference on Language Resources and Evaluation. European Language Resources Association.
  • 26
    • 67650558749
    • Ingle, N. (1980). Language identification table. London: Technical Translation International Ltd.
  • 27
    • 84994783997
    • Jehl, L., Hieber, F., & Riezler, S. (2012). Twitter translation using translation-based cross-lingual retrieval. In Proceedings of the seventh workshop on statistical machine translation (pp. 410–421). Association for Computational Linguistics.
  • 29
    • 84994728764
    • Kaufmann, M., & Kalita, J. (2010). Syntactic normalization of twitter messages. In International conference on natural language processing. Kharagpur, India.
  • 31
    • 84994783990
    • Kikui, G. i. (1996). Identifying, the coding system and language, of on-line documents on the internet. In Proceedings of the 16th conference on Computational linguistics (Vol. 2, pp. 652–657). Association for Computational Linguistics.
  • 33
    • 44949230930
    • Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In MT summit (Vol. 5, pp. 79–86).
  • 35
    • 84877951682
    • Laboreiro, G., Bošnjak, M., Sarmento, L., Rodrigues, E. M., & Oliveira, E. (2013). Determining language variant in microblog messages. In Proceedings of the 28th ACM/SIGAPP symposium on applied computing (pp. 902–907). ACM.
  • 38
    • 48349136970
    • Ljubešić, N., Mikelić, N., & Boras, D. (2007). Language indentification: How to distinguish similar languages? In Proceedings of the 29th international conference on information technology interfaces (pp. 541–546). IEEE.
  • 41
    • 84994692723
    • Lui, M., & Baldwin, T. (2012). Langid.py: An off-the-shelf language identification tool. In Proceedings of ACL (pp. 25–30). ACL.
  • 42
    • 84994764392
    • Lui, M., & Baldwin, T. (2014). Accurate language identification of twitter messages. In Proceedings of the 5th workshop on language analysis for social media (LASM) (pp. 17–25). Association for Computational Linguistics, Gothenburg, Sweden.
  • 44
    • 85015742322
    • Majliš, M. (2012). Yet another language identifier. In Student Research Workshop at EACL’12 (pp. 46–54). ACL.
  • 45
    • 33644526889
    • Martins, B., & Silva, M. J. (2005). Language identification in web pages. In Proceedings of SAC (pp. 764–768). ACM.
  • 46
    • 33749648183
    • McNamee, P. (2005). Language identification: A solved problem suitable for undergraduate instruction. Journal of Computing Sciences in Colleges, 20(3), 94–101.
  • 47
    • 84994686252
    • Mendizabal, I., Carandell, J., & Horowitz, D. (2014). Tweetsafa: Tweet language identification. In TweetLID@SEPLN.
  • 49
    • 43249161841
    • Murthy, K. N., & Kumar, G. B. (2006). Language identification from small text samples. Journal of Quantitative Linguistics, 13(1), 57–80.
  • 51
    • 84994760969
    • Newman, P. (1987). Foreign language identification: First step in the translation process. Technical report, Sandia National Labs., Albuquerque, NM, USA.
  • 53
    • 77952369139
    • Nowak, S., Lukashevich, H., Dunker, P., & Rüger, S. (2010). Performance measures for multilabel evaluation: A case study in the area of image classification. In Proceedings of the international conference on multimedia information retrieval (pp. 35–44). ACM.
  • 56
    • 79951803307
    • Padró, M., & Padró, L. (2004). Comparing methods for language identification. Procesamiento del lenguaje natural, 33, 155–162.
  • 57
    • 84866782819
    • Paolillo, J. C. (2011). Conversational codeswitching on usenet and internet relay chat. Language@Internet, 8(3), 1–2.
  • 59
    • 0032715577
    • Prager, J. M. (1999). Linguini: Language identification for multilingual documents. In Proceedings of the 32nd annual Hawaii international conference on systems sciences, 1999 (HICSS-32) (pp. 11–pp). IEEE.
  • 60
    • 67650535508
    • Řehůřek, R., & Kolkus, M. (2009). Language identification on the web: Extending the dictionary method. In Computational linguistics and intelligent text processing (pp. 357–368). Springer.
  • 62
    • 0002442796
    • Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1–47.
  • 65
    • 84994687194
    • Sibun, P., & Spitz, A. L. (1994). Language determination: Natural language processing from scanned document images. In Proceedings of the fourth conference on applied natural language processing (pp. 15–21). Association for Computational Linguistics.
  • 66
    • 84994693294
    • Singh, A. K. (2006). Study of some distance measures for language and encoding identification. In Workshop on linguistic distances (pp. 63–72). ACL.
  • 68
  • 71
  • 73
  • 74
    • 84858393694
    • Xia, F., Lewis, W. D., & Poon, H. (2009). Language id in the context of harvesting language data off the web. In Proceedings of the 12th conference of the European Chapter of the Association for Computational Linguistics (pp. 870–878). Association for Computational Linguistics.
  • 75
    • 84994686272
    • Zamora, J. D., Bruzón, A. F., & Bueno, R. O. (2014). Tweets language identification using feature weighting. In TweetLID@SEPLN.
  • 77
    • 84994780247
    • Zubiaga, A., San Vicente, I., Gamallo, P., Pichel, J. R., Alegria, I., Aranberri, N., Ezeiza, A., & Fresno, V. (2014). Overview of tweetlid: Tweet language identification at sepln 2014. In TweetLID@SEPLN.
  • 78
    • 84864046290
    • Zubiaga, A., Spina, D., Amigó, E., & Gonzalo, J. (2012). Towards real-time summarization of scheduled events from twitter streams. In Proceedings of the 23rd ACM conference on hypertext and social media (pp. 319–320). ACM.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.