Volume 52, Issue 6, 2013, Pages 1-17

The textcat package for n-gram based text categorization in R

Author keywords

Language identification; N grams; R; Text categorization; Text mining; Textcat

EID: 84873900391     PISSN: 15487660     EISSN: None     Source Type: Journal    
DOI: 10.18637/jss.v052.i06     Document Type: Article
Times cited: 67

References (33)
  • 1
    • 85015762649
    • Language Identification from Text Using n-Gram Based Cumulative Frequency Addition
    • CSIS, Pace University, May 7th, 2004
    • Ahmed B, Cha SH, Tappert C (2004). "Language Identification from Text Using n-Gram Based Cumulative Frequency Addition." In Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004. URL http://www.csis.pace.edu/~ctappert/srd2004/paper12.pdf.
    • (2004) In Proceedings of Student/Faculty Research Day
    • Ahmed, B.1    Cha, S.H.2    Tappert, C.3
  • 9
    • 0003984557
    • Technical Report MCCS 94-273, Computing Research Lab (CRL), New Mexico State University. URL
    • Dunning T (1994). "Statistical Identification of Language." Technical Report MCCS 94-273, Computing Research Lab (CRL), New Mexico State University. URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1958.
    • (1994) Statistical Identification of Language
    • Dunning, T.1
  • 10
    • 0038217041
    • The Distribution of N-Grams
    • Egghe L (2000). "The Distribution of N-Grams." Scientometrics, 47(2), 237-252.
    • (2000) Scientometrics , vol.47 , Issue.2 , pp. 237-252
    • Egghe, L.1
  • 14
    • 85134891245
    • Language Identification for the Automatic Grapheme-to-Phoneme Conversion of Foreign Words in a German Text-to-Speech System
    • (First European Conference on Speech Communication and Technology)
    • Henrich P (1989). "Language Identification for the Automatic Grapheme-to-Phoneme Conversion of Foreign Words in a German Text-to-Speech System." In EUROSPEECH-1989 (First European Conference on Speech Communication and Technology), pp. 2220-2223. URL http://www.isca-speech.org/archive/eurospeech_1989/e89_2220.html.
    • (1989) EUROSPEECH-1989 , pp. 2220-2223
    • Henrich, P.1
  • 16
    • 0345555473
    • A Language Identification Table
    • Ingle NC (1976). "A Language Identification Table." The Incorporated Linguist, 15(4), 98-101.
    • (1976) The Incorporated Linguist , vol.15 , Issue.4 , pp. 98-101
    • Ingle, N.C.1
  • 18
    • 58149462758
    • A Machine Learning Approach for Arabic Text Classification Using n-Gram Frequency Statistics
    • doi:10.1016/j.joi.2008.11.005
    • Khreisat L (2009). "A Machine Learning Approach for Arabic Text Classification Using n-Gram Frequency Statistics." Journal of Informetrics, 3(1), 72-77. doi:10.1016/j.joi.2008.11.005.
    • (2009) Journal of Informetrics , vol.3 , Issue.1 , pp. 72-77
    • Khreisat, L.1
  • 19
    • 48349136970
    • Language Identification: How to Distinguish Similar Languages
    • In V Lužar-Stifter, VH Dobrić (eds.), SRCE University Computing Centre, Zagreb. URL
    • Ljubešić N, Mikelić N, Boras D (2007). "Language Identification: How to Distinguish Similar Languages." In V Lužar-Stifter, VH Dobrić (eds.), Proceedings of the 29th International Conference on Information Technology Interfaces, pp. 541-546. SRCE University Computing Centre, Zagreb. URL http://www.nljubesic.net/main/publications_files/ljubesic07-language.pdf.
    • (2007) Proceedings of the 29th International Conference on Information Technology Interfaces , pp. 541-546
    • Ljubešić, N.1    Mikelić, N.2    Boras, D.3
  • 20
    • 67650083674
    • Technical report, Cavendish Laboratory, Cambridge, The Inference Group. URL
    • Murray IA (2002). "Probabilistic Language Modelling." Technical report, Cavendish Laboratory, Cambridge, The Inference Group. URL http://www.inference.phy.cam.ac.uk/is/papers/langreport.pdf.
    • (2002) Probabilistic Language Modelling
    • Murray, I.A.1
  • 21
    • 24744447069
    • Multiple Discriminant Analysis in Linguistic Problems
    • Stockholm
    • Mustonen S (1965). "Multiple Discriminant Analysis in Linguistic Problems." Statistical Methods in Linguistics, 4, 37-44. Stockholm.
    • (1965) Statistical Methods in Linguistics , vol.4 , pp. 37-44
    • Mustonen, S.1
  • 23
    • 84863304598
    • R Core Team, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0
    • R Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
    • (2012) R: A Language and Environment for Statistical Computing
  • 24
    • 41149154578
    • The Crúbadán Project: Corpus building for under-resourced languages
    • In C Fairon, H Naets, A Kilgarriff, GM de Schryver (eds.), Presses universitaires de Louvain, Louvain-la-Neuve, Belgium. URL
    • Scannell KP (2007). "The Crúbadán Project: Corpus building for under-resourced languages." In C Fairon, H Naets, A Kilgarriff, GM de Schryver (eds.), Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop, volume 4 of Cahiers du Cental, pp. 5-15. Presses universitaires de Louvain, Louvain-la-Neuve, Belgium. URL http://borel.slu.edu/pub/wac3.pdf.
    • (2007) Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop, Volume 4 of Cahiers Du Cental , pp. 5-15
    • Scannell, K.P.1
  • 25
    • 0004137163
    • Language Identification: Examining the Issues
    • Las Vegas, Nevada, U.S.A. URL
    • Sibun P, Reynar JC (1996). "Language Identification: Examining the Issues." In 5th Symposium on Document Analysis and Information Retrieval, pp. 125-135. Las Vegas, Nevada, U.S.A. URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.4524.
    • (1996) 5th Symposium on Document Analysis and Information Retrieval , pp. 125-135
    • Sibun, P.1    Reynar, J.C.2
  • 26
    • 78651342601
    • Study of Some Distance Measures for Language and Encoding Identification
    • Sydney, July 2006
    • Singh AK (2006). "Study of Some Distance Measures for Language and Encoding Identification." In Proceedings in the Workshop on Linguistic Distances, Sydney, July 2006, pp. 63-72. URL http://acl.ldc.upenn.edu/W/W06/W06-1109.pdf.
    • (2006) Proceedings in the Workshop on Linguistic Distances , pp. 63-72
    • Singh, A.K.1
  • 27
    • 85119187881
    • Natural Language Identification Using Corpus-Based Models
    • Souter C, Churcher G, Hayes J, Hughes J, Johnson S (1994). "Natural Language Identification Using Corpus-Based Models." Hermes Journal of Linguistics, 13, 183-203. URL http://download2.hermes.asb.dk/archive/FreeH/H13_15.pdf.
    • (1994) Hermes Journal of Linguistics , vol.13 , pp. 183-203
    • Souter, C.1    Churcher, G.2    Hayes, J.3    Hughes, J.4    Johnson, S.5
  • 28
    • 20344398381
    • van Noord G (1997). "TextCat." URL http://odur.let.rug.nl/~vannoord/TextCat.
    • (1997) TextCat
    • van Noord, G.1
  • 29
    • 84855721130
    • Wikipedia, accessed 2013-01-15
    • Wikipedia (2013a). "n-Gram - Wikipedia, The Free Encyclopedia." URL http://en.wikipedia.org/wiki/N-gram, accessed 2013-01-15.
    • (2013) N-Gram - Wikipedia, the Free Encyclopedia
  • 31
    • 84855721130
    • Wikipedia, accessed 2013-01-15
    • Wikipedia (2013c). "XPath - Wikipedia, The Free Encyclopedia." URL http://en.wikipedia.org/wiki/XPath, accessed 2013-01-15.
    • (2013) XPath - Wikipedia, the Free Encyclopedia


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.