2010, Pages 3423-3430

Language identification of short text segments with n-gram models

Author keywords

[No Author keywords available]

Indexed keywords

CLASSIFIERS; NATURAL LANGUAGE PROCESSING SYSTEMS;

EID: 84910031762     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 68

References (27)
  • 1
    • 0242477096
    • Language identifier: A computer program for automatic natural-language identification of on-line text
    • Kenneth Beesley. 1988. Language identifier: A computer program for automatic natural-language identification of on-line text. In Proceedings of the 29th ATA Annual Conference, pages 47-54.
    • (1988) Proceedings of the 29th ATA Annual Conference , pp. 47-54
    • Beesley, K.
  • 2
    • 84861638618
    • Factors that affect the accuracy of text-based language identification
    • Gerrit Reinier Botha and Etienne Barnard. 2007. Factors that affect the accuracy of text-based language identification. In Proceedings of PRASA 2007, pages 7-10.
    • (2007) Proceedings of PRASA 2007 , pp. 7-10
    • Botha, G.R.; Barnard, E.
  • 4
    • 0033329799
    • An empirical study of smoothing techniques for language modeling
    • Stanley F. Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech & Language, 13(4):359-393.
    • (1999) Computer Speech & Language , vol.13 , Issue.4 , pp. 359-393
    • Chen, S.F.; Goodman, J.
  • 6
    • 38849173179
    • Identification of document language is not yet a completely solved problem
    • Joaquim Ferreira da Silva and Gabriel Pereira Lopes. 2006. Identification of document language is not yet a completely solved problem. In Proceedings of CIMCA'06, pages 212-219.
    • (2006) Proceedings of CIMCA'06 , pp. 212-219
    • Da Silva, J.F.; Lopes, G.P.
  • 7
    • 0028911698
    • Gauging similarity with n-grams: Language-independent categorization of text
    • Marc Damashek. 1995. Gauging similarity with n-grams: Language-independent categorization of text. Science, 267(5199):843-849.
    • (1995) Science , vol.267 , Issue.5199 , pp. 843-849
    • Damashek, M.
  • 8
    • 17444437850
    • Confidence scoring based on backward language models
    • Jacques Duchateau, Kris Demuynck, and Patrick Wambacq. 2002. Confidence scoring based on backward language models. In Proceedings of ICASSP 2002, Volume 1, pages 221-224.
    • (2002) Proceedings of ICASSP 2002 , vol.1 , pp. 221-224
    • Duchateau, J.; Demuynck, K.; Wambacq, P.
  • 10
    • 0000803388
    • The population frequencies of species and the estimation of population parameters
    • I. J. Good. 1953. The population frequencies of species and the estimation of population parameters. Biometrika, 40:237-264.
    • (1953) Biometrika , vol.40 , pp. 237-264
    • Good, I.J.
  • 11
    • 84945903856
    • Language model size reduction by pruning and clustering
    • Joshua Goodman and Jianfeng Gao. 2000. Language model size reduction by pruning and clustering. In Proceedings of ICSLP, pages 16-20.
    • (2000) Proceedings of ICSLP , pp. 16-20
    • Goodman, J.; Gao, J.
  • 14
    • 0001152481
    • Toward automatic identification of the language of an utterance. I. Preliminary methodological considerations
    • Arthur S. House and Edward P. Neuburg. 1977. Toward automatic identification of the language of an utterance. I. Preliminary methodological considerations. Journal of the Acoustical Society of America, 62(3):708-713.
    • (1977) Journal of the Acoustical Society of America , vol.62 , Issue.3 , pp. 708-713
    • House, A.S.; Neuburg, E.P.
  • 15
    • 0023312404
    • Estimation of probabilities from sparse data for the language model component of a speech recognizer
    • Slava M. Katz. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35:400-401.
    • (1987) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.35 , pp. 400-401
    • Katz, S.M.
  • 16
    • 0028996876
    • Improved backing-off for m-gram language modeling
    • Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of ICASSP-95, pages 181-184.
    • (1995) Proceedings of ICASSP-95 , pp. 181-184
    • Kneser, R.; Ney, H.
  • 18
    • 33749648183
    • Language identification: A solved problem suitable for undergraduate instruction
    • Paul McNamee. 2005. Language identification: a solved problem suitable for undergraduate instruction. Journal of Computing Sciences in Colleges, 20(3):94-101.
    • (2005) Journal of Computing Sciences in Colleges , vol.20 , Issue.3 , pp. 94-101
    • McNamee, P.
  • 19
    • 0027929445
    • On structuring probabilistic dependence in stochastic language modelling
    • Hermann Ney, Ute Essen, and Reinhard Kneser. 1994. On structuring probabilistic dependence in stochastic language modelling. Computer Speech and Language, 8(1):1-38.
    • (1994) Computer Speech and Language , vol.8 , Issue.1 , pp. 1-38
    • Ney, H.; Essen, U.; Kneser, R.
  • 20
    • 67650535508
    • Language identification on the web: Extending the dictionary method
    • Radim Řehůřek and Milan Kolkus. 2009. Language identification on the web: Extending the dictionary method. In Proceedings of CICLing 2009, pages 357-368.
    • (2009) Proceedings of CICLing 2009 , pp. 357-368
    • Řehůřek, R.; Kolkus, M.
  • 21
    • 0004137163
    • Language identification: Examining the issues
    • Penelope Sibun and Jeffrey C. Reynar. 1996. Language identification: Examining the issues. In Proceedings of SDAIR'96, pages 125-135.
    • (1996) Proceedings of SDAIR'96 , pp. 125-135
    • Sibun, P.; Reynar, J.C.
  • 24
    • 84891308106
    • SRILM - An extensible language modeling toolkit
    • Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proceedings of ICSLP, pages 901-904. http://www.speech.sri.com/projects/srilm/.
    • (2002) Proceedings of ICSLP , pp. 901-904
    • Stolcke, A.
  • 25
    • 1542310280
    • Text classification and segmentation using minimum cross-entropy
    • William John Teahan. 2000. Text classification and segmentation using minimum cross-entropy. In Proceedings of RIAO'00, pages 943-961.
    • (2000) Proceedings of RIAO'00 , pp. 943-961
    • Teahan, W.J.
  • 27
    • 0034788435
    • A study of smoothing methods for language models applied to ad hoc information retrieval
    • Chengxiang Zhai and John Lafferty. 2001. A study of smoothing methods for language models applied to Ad Hoc information retrieval. In Proceedings of SIGIR'01, pages 334-342.
    • (2001) Proceedings of SIGIR'01 , pp. 334-342
    • Zhai, C.; Lafferty, J.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.