Volume 7, Issue 3-4, 2004, Pages 317-345

Augmenting naive Bayes classifiers with statistical language models

Author keywords

N-gram language models; Naive Bayes; Smoothing; Text classification

EID: 3843083955     PISSN: 13864564     EISSN: None     Source Type: Journal    
DOI: 10.1023/b:inrt.0000011209.19643.e2     Document Type: Conference Paper
Times cited: 199

References (56)
  • 5
    • 0003396042
    • An empirical study of smoothing techniques for language modeling
    • Computer Science Group, Harvard University
    • Chen S and Goodman J (1998) An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University.
    • (1998) Technical Report TR-10-98
    • Chen, S.1    Goodman, J.2
  • 6
    • 0025750735
    • A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams
    • Church K and Gale W (1991) A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language, 5(1).
    • (1991) Computer Speech and Language , vol.5 , Issue.1
    • Church, K.1    Gale, W.2
  • 7
    • 0001345686
    • Context-sensitive learning methods for text categorization
    • Cohen W and Singer Y (1999) Context-sensitive learning methods for text categorization. ACM Transactions on Information Systems, 17:141-173.
    • (1999) ACM Transactions on Information Systems , vol.17 , pp. 141-173
    • Cohen, W.1    Singer, Y.2
  • 8
    • 0028911698
    • Gauging similarity with N-grams: Language-independent categorization of text?
    • Damashek M (1995) Gauging similarity with N-grams: Language-independent categorization of text? Science, 267:843-848.
    • (1995) Science , vol.267 , pp. 843-848
    • Damashek, M.1
  • 9
    • 0031269184
    • Beyond independence: Conditions for the optimality of the simple Bayesian classifier
    • Domingos P and Pazzani M (1997) Beyond independence: Conditions for the optimality of the simple Bayesian classifier. Machine Learning, 29:103-130.
    • (1997) Machine Learning , vol.29 , pp. 103-130
    • Domingos, P.1    Pazzani, M.2
  • 13
    • 3843094229
    • Language modelling
    • De Mori R, Ed., Academic Press, London, UK, chapter 7
    • Federico M and De Mori R (1998) Language modelling. In: De Mori R, Ed. Spoken Dialogues with Computers, Academic Press, London, UK, chapter 7.
    • (1998) Spoken Dialogues with Computers
    • Federico, M.1    De Mori, R.2
  • 18
    • 0003754573
    • PhD thesis, Centre for Telematics and Information Technology, University of Twente
    • Hiemstra D (2001) Using language models for information retrieval. PhD thesis, Centre for Telematics and Information Technology, University of Twente.
    • (2001) Using Language Models for Information Retrieval
    • Hiemstra, D.1
  • 19
    • 0011358177
    • The Federalist revisited: New directions in authorship attribution
    • Holmes D and Forsyth R (1995) The Federalist revisited: New directions in authorship attribution. Literary and Linguistic Computing, 10:111-127.
    • (1995) Literary and Linguistic Computing , vol.10 , pp. 111-127
    • Holmes, D.1    Forsyth, R.2
  • 21
    • 0346346046
    • Acquaintance: Language-independent document categorization by N-grams
    • Harman DK and Voorhees EM, Eds.
    • Huffman S (1995) Acquaintance: Language-independent document categorization by N-grams. In: Harman DK and Voorhees EM, Eds. Proceedings of TREC-4, 4th Text Retrieval Conference, pp. 359-371.
    • (1995) Proceedings of TREC-4, 4th Text Retrieval Conference , pp. 359-371
    • Huffman, S.1
  • 22
    • 0001882615
    • Self-organized language modeling for speech recognition
    • Waibel A and Lee K-F, Eds., Morgan Kaufmann, Los Altos, CA
    • Jelinek F (1990) Self-organized language modeling for speech recognition. In: Waibel A and Lee K-F, Eds., Readings in Speech Recognition, Morgan Kaufmann, Los Altos, CA, pp. 450-505.
    • (1990) Readings in Speech Recognition , pp. 450-505
    • Jelinek, F.1
  • 24
    • 0023312404
    • Estimation of probabilities from sparse data for the language model component of a speech recognizer
    • Katz S (1987) Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-35(3):400-401.
    • (1987) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.ASSP-35 , Issue.3 , pp. 400-401
    • Katz, S.1
  • 36
    • 0025516650
    • Implementing the PPM data compression scheme
    • Moffat A (1990) Implementing the PPM data compression scheme. IEEE Transactions on Communications, 38(11):1917-1921.
    • (1990) IEEE Transactions on Communications , vol.38 , Issue.11 , pp. 1917-1921
    • Moffat, A.1
  • 37
    • 0027929445
    • On structuring probabilistic dependencies in stochastic language modeling
    • Ney H, Essen U and Kneser R (1994) On structuring probabilistic dependencies in stochastic language modeling. Computer Speech and Language, 8(1):1-28.
    • (1994) Computer Speech and Language , vol.8 , Issue.1 , pp. 1-28
    • Ney, H.1    Essen, U.2    Kneser, R.3
  • 43
    • 3843124574
    • (1991) Trigram-based method of language identification. U.S. Patent No. 5,062,143
    • Schmitt JC (1991) Trigram-based method of language identification. U.S. Patent No. 5,062,143.
    • Schmitt, J.C.1
  • 45
    • 0002442796
    • Machine learning in automated text categorization
    • Sebastiani F (2002) Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47.
    • (2002) ACM Computing Surveys , vol.34 , Issue.1 , pp. 1-47
    • Sebastiani, F.1
  • 46
  • 50
    • 3042705201
    • Using compression-based language models for text categorization
    • Teahan W and Harper D (2001) Using compression-based language models for text categorization. In: Proceedings of Workshop on Language Modeling and Information Retrieval (LMIR). (Also appears in Language Modeling and Information Retrieval, Kluwer, 2003.)
    • (2003) Language Modeling and Information Retrieval
  • 53
    • 0026187945
    • The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression
    • Witten I and Bell T (1991) The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4).
    • (1991) IEEE Transactions on Information Theory , vol.37 , Issue.4
    • Witten, I.1    Bell, T.2
  • 55
    • 27144441097
    • An evaluation of statistical approaches to text categorization
    • Yang Y (1999) An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1/2):67-88.
    • (1999) Information Retrieval , vol.1 , Issue.1-2 , pp. 67-88
    • Yang, Y.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.