Volume , Issue , 2012, Pages 1482-1486

Feature selection based on term frequency and t-test for text categorization

Author keywords

feature selection; t-test; term frequency; text classification

Indexed keywords

CHI-SQUARE STATISTICS; COMPARATIVE EXPERIMENTS; DOCUMENT FREQUENCY; FEATURE SELECTION METHODS; HIGH FREQUENCY HF; INFORMATION GAIN; SELECTION FUNCTION; T-TEST; TERM FREQUENCY; TEXT CATEGORIZATION; TEXT CLASSIFICATION; TEXT CORPORA

EID: 84871054166     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2396761.2398457     Document Type: Conference Paper
Times cited: 28

References (21)
  • 3
    • 34249753618
    • Support-vector networks
    • C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 1995, 20(3), 273-297.
    • (1995) Machine Learning , vol.20 , Issue.3 , pp. 273-297
    • Cortes, C.1    Vapnik, V.2
  • 4
    • 85055298348
    • Accurate methods for the statistics of surprise and coincidence
    • T. Dunning. Accurate methods for the statistics of surprise and coincidence. Comput. Linguist., 1993, 19(1), 61-74.
    • (1993) Comput. Linguist. , vol.19 , Issue.1 , pp. 61-74
    • Dunning, T.1
  • 5
    • 2942731012
    • An extensive empirical study of feature selection metrics for text classification
    • G. Forman. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 2003, 3, 1289-1305.
    • (2003) Journal of Machine Learning Research , vol.3 , pp. 1289-1305
    • Forman, G.1
  • 7
    • 14344263547
    • Centroid-based document classification: Analysis & experimental results
    • E.-H. Han and G. Karypis. Centroid-based document classification: Analysis & experimental results. In: Proceedings of PKDD, 2000.
    • (2000) Proceedings of PKDD
    • Han, E.-H.1    Karypis, G.2
  • 8
    • 0002346866
    • Hierarchically classifying documents using very few words
    • D. Koller and M. Sahami. Hierarchically classifying documents using very few words. In: Proceedings of ICML, 1997, 170-178.
    • (1997) Proceedings of ICML , pp. 170-178
    • Koller, D.1    Sahami, M.2
  • 11
    • 0002551285
    • Feature selection for unbalanced class distribution and Naive Bayes
    • D. Mladenic and M. Grobelnik. Feature selection for unbalanced class distribution and Naive Bayes. In: Proceedings of ICML, 1999.
    • (1999) Proceedings of ICML
    • Mladenic, D.1    Grobelnik, M.2
  • 12
    • 45549117987
    • Term-weighting approaches in automatic text retrieval
    • G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 1988, 24(5), 513-523.
    • (1988) Information Processing & Management , vol.24 , Issue.5 , pp. 513-523
    • Salton, G.1    Buckley, C.2
  • 13
    • 0002442796
    • Machine learning in automated text categorization
    • F. Sebastiani. Machine learning in automated text categorization. ACM Comput Surv, 2002, 34(1), 1-47.
    • (2002) ACM Comput Surv , vol.34 , Issue.1 , pp. 1-47
    • Sebastiani, F.1
  • 14
    • 0037076272
    • Diagnosis of multiple cancer types by shrunken centroids of gene expression
    • R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci., 2002, 99, 6567-6572.
    • (2002) Proc. Natl. Acad. Sci. , vol.99 , pp. 6567-6572
    • Tibshirani, R.1    Hastie, T.2    Narasimhan, B.3    Chu, G.4
  • 15
    • 80955181170
    • A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm
    • H. Uguz. A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowl.-Based Syst., 2011, 24(7), 1024-1032.
    • (2011) Knowl.-Based Syst. , vol.24 , Issue.7 , pp. 1024-1032
    • Uguz, H.1
  • 16
    • 77956611003
    • mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification
    • A. Unler, A. Murat, and R. B. Chinnam. mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf. Sci., 2011, 181(20), 4625-4641.
    • (2011) Inf. Sci. , vol.181 , Issue.20 , pp. 4625-4641
    • Unler, A.1    Murat, A.2    Chinnam, R.B.3
  • 17
  • 18
    • 0345399126
    • The probable error of a mean
    • Student. The probable error of a mean. Biometrika, 1908, 6(1), 1-25.
    • (1908) Biometrika , vol.6 , Issue.1 , pp. 1-25
    • Student1
  • 19
    • 0036930581
    • Relative term-frequency based feature selection for text categorization
    • S.-M. Yang, X. Wu, and Z. Deng. Relative term-frequency based feature selection for text categorization. In: Proceedings of ICMLC, 2002.
    • (2002) Proceedings of ICMLC
    • Yang, S.-M.1    Wu, X.2    Deng, Z.3
  • 20
    • 0003141935
    • A comparative study on feature selection in text categorization
    • Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In: Proceedings of ICML, 1997, 412-420.
    • (1997) Proceedings of ICML , pp. 412-420
    • Yang, Y.1    Pedersen, J.O.2
  • 21
    • 38949156621
    • A modified t-test feature selection method and its application on the HapMap genotype data
    • N.-N. Zhou and L.-P. Wang. A modified t-test feature selection method and its application on the HapMap genotype data. Geno. Prot. Bioinfo., 2007, 5(3-4), 242-249.
    • (2007) Geno. Prot. Bioinfo. , vol.5 , Issue.3-4 , pp. 242-249
    • Zhou, N.-N.1    Wang, L.-P.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.