2007, Pages 541-546

Language identification: How to distinguish similar languages?

Author keywords

Croatian language; Forbidden words method; Most frequent words method; Second order Markov model; Web corpus; Written language identification

Indexed keywords

INFORMATION RETRIEVAL SYSTEMS; INFORMATION TECHNOLOGY; LINGUISTICS; MARKOV PROCESSES; TECHNOLOGY

EID: 48349136970     PISSN: 13301012     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ITI.2007.4283829     Document Type: Conference Paper
Times cited: 53

References (26)
  • 3
    • 24744439287
    • A learning experience: Training an artificial neural network to discriminate languages
    • Technical Report
    • Batchelder EO. A learning experience: Training an artificial neural network to discriminate languages. Technical Report, 1992.
    • (1992)
    • Batchelder, E.O.1
  • 6
    • 0028911698
    • Gauging similarity with n-grams: Language independent categorization of text
    • Damashek M. Gauging similarity with n-grams: language independent categorization of text. Science 1995; 267(5199):843-848.
    • (1995) Science , vol.267 , Issue.5199 , pp. 843-848
    • Damashek, M.1
  • 7
    • 0003984557
    • Statistical identification of language
    • New Mexico State University
    • Dunning T. Statistical identification of language. Technical Report MCCS-94-273. New Mexico State University, 1994.
    • (1994) Technical Report MCCS-94-273
    • Dunning, T.1
  • 9
    • 33644534588
    • Language identification for the automatic grapheme-to-phoneme conversion of foreign words in a German text-to-speech system
    • European Speech Communication and Technology
    • Henrich P. Language identification for the automatic grapheme-to-phoneme conversion of foreign words in a German text-to-speech system. In Proceedings of Eurospeech 1989, European Speech Communication and Technology, 1989. p. 220-223.
    • (1989) Proceedings of Eurospeech , pp. 220-223
    • Henrich, P.1
  • 10
  • 11
    • 24344493806
    • Solving the problem of language recognition
    • Technical report. School of Computer Studies, University of Leeds
    • Johnson, S. Solving the problem of language recognition. Technical report. School of Computer Studies, University of Leeds, 1993.
    • (1993)
    • Johnson, S.1
  • 13
    • 24744457990
    • Using short words: A language identification algorithm
    • Kulikowski S. Using short words: a language identification algorithm. Unpublished technical report, 1991.
    • (1991) Unpublished technical report
    • Kulikowski, S.1
  • 15
    • 3843127500
    • Character N-gram Tokenization for European Language Text Retrieval
    • McNamee P, Mayfield J. Character N-gram Tokenization for European Language Text Retrieval. Information Retrieval 2004; 7:73-97.
    • (2004) Information Retrieval , vol.7 , pp. 73-97
    • McNamee, P.1    Mayfield, J.2
  • 18
    • 4243269926
    • Trigram-based method of language identification
    • October, U.S. Patent number: 5062143
    • Schmitt JC. Trigram-based method of language identification, October 1991. U.S. Patent number: 5062143.
    • (1991)
    • Schmitt, J.C.1
  • 20
    • 48349133292
    • Souter et al. Natural Language Identification Using Corpus-Based Models. Hermes J. Linguistics 1994; 13:183-203.
  • 23
  • 24
    • 48349121457
    • News portals: http://www.net.hr, http://www.b92.net, http://novice.siol.net [4/26/2007]
  • 25
    • 48349138925
    • 07/01/2006
    • Xerox Language Identifier. http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser-ISO-8859-1.en.html [07/01/2006]
    • Xerox Language Identifier


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.