Volume 50, Issue 1, 2014, Pages 104-112

The impact of preprocessing on text classification

Author keywords

Pattern recognition; Text categorization; Text classification; Text preprocessing

Indexed keywords

BENCHMARK DATASETS; CLASSIFICATION ACCURACY; DIMENSION REDUCTION; EXPERIMENTAL ANALYSIS; FEATURE DIMENSIONS; TEXT CATEGORIZATION; TEXT CLASSIFICATION; TEXT PREPROCESSING

EID: 84884546254     PISSN: 03064573     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.ipm.2013.08.006     Document Type: Article
Times cited: 520

References (34)
  • 1
    • 36948999941
    • Irvine, CA: University of California, Department of Information and Computer Science
    • Asuncion, A.; Newman, D. J. (2007). UCI machine learning repository. Irvine, CA: University of California, Department of Information and Computer Science.
    • (2007) UCI Machine Learning Repository
    • Asuncion, A.1    Newman, D.J.2
  • 3
    • 78650694104
    • Using chi-square statistics to measure similarities for text categorization
    • Y.-T. Chen, and M.C. Chen Using chi-square statistics to measure similarities for text categorization Expert Systems with Applications 38 2011 3085 3090
    • (2011) Expert Systems with Applications , vol.38 , pp. 3085-3090
    • Chen, Y.-T.1    Chen, M.C.2
  • 8
    • 2942731012
    • An extensive empirical study of feature selection metrics for text classification
    • G. Forman An extensive empirical study of feature selection metrics for text classification Journal of Machine Learning Research 3 2003 1289 1305
    • (2003) Journal of Machine Learning Research , vol.3 , pp. 1289-1305
    • Forman, G.1
  • 9
    • 84861198947
    • Automated text classification using a dynamic artificial neural network model
    • M. Ghiassi, M. Olschimke, B. Moon, and P. Arnaudo Automated text classification using a dynamic artificial neural network model Expert Systems with Applications 39 2012 10967 10976
    • (2012) Expert Systems with Applications , vol.39 , pp. 10967-10976
    • Ghiassi, M.1    Olschimke, M.2    Moon, B.3    Arnaudo, P.4
  • 11
    • 48049093694
    • Subspace based feature selection for pattern recognition
    • S. Gunal, and R. Edizkan Subspace based feature selection for pattern recognition Information Sciences 178 2008 3716 3726
    • (2008) Information Sciences , vol.178 , pp. 3716-3726
    • Gunal, S.1    Edizkan, R.2
  • 13
    • 0002409860
    • A probabilistic analysis of the Rocchio algorithm with tfidf for text categorization
    • Morgan Kaufmann Publishers Inc
    • T. Joachims A probabilistic analysis of the Rocchio algorithm with tfidf for text categorization 14th international conference on machine learning 1997 Morgan Kaufmann Publishers Inc 143 151
    • (1997) 14th International Conference on Machine Learning , pp. 143-151
    • Joachims, T.1
  • 14
    • 77953129520
    • A comparison study on multiple binary-class SVM methods for unilabel text categorization
    • M.A. Kumar, and M. Gopal A comparison study on multiple binary-class SVM methods for unilabel text categorization Pattern Recognition Letters 31 2010 1437 1444
    • (2010) Pattern Recognition Letters , vol.31 , pp. 1437-1444
    • Kumar, M.A.1    Gopal, M.2
  • 15
    • 23744432473
    • Information gain and divergence-based feature selection for machine learning-based text categorization
    • DOI 10.1016/j.ipm.2004.08.006, PII S0306457304000962
    • C. Lee, and G.G. Lee Information gain and divergence-based feature selection for machine learning-based text categorization Information Processing & Management 42 2006 155 165 (Pubitemid 41119082)
    • (2006) Information Processing and Management , vol.42 , Issue.1 SPEC. ISS , pp. 155-165
    • Lee, C.1    Lee, G.G.2
  • 16
    • 62349118015
    • Feature selection with dynamic mutual information
    • H. Liu, J. Sun, L. Liu, and H. Zhang Feature selection with dynamic mutual information Pattern Recognition 42 2009 1330 1339
    • (2009) Pattern Recognition , vol.42 , pp. 1330-1339
    • Liu, H.1    Sun, J.2    Liu, L.3    Zhang, H.4
  • 17
    • 84865577263
    • A lexicon model for deep sentiment analysis and opinion mining applications
    • I. Maks, and P. Vossen A lexicon model for deep sentiment analysis and opinion mining applications Decision Support Systems 53 2012 680 688
    • (2012) Decision Support Systems , vol.53 , pp. 680-688
    • Maks, I.1    Vossen, P.2
  • 20
    • 78650716116
    • A web page classification system based on a genetic algorithm using tagged-terms as features
    • S.A. Ozel A web page classification system based on a genetic algorithm using tagged-terms as features Expert Systems with Applications 38 2011 3407 3415
    • (2011) Expert Systems with Applications , vol.38 , pp. 3407-3415
    • Ozel, S.A.1
  • 22
    • 84948481845
    • An algorithm for suffix stripping
    • M.F. Porter An algorithm for suffix stripping Program 14 1980 130 137
    • (1980) Program , vol.14 , pp. 130-137
    • Porter, M.F.1
  • 24
    • 33845622338
    • A novel feature selection algorithm for text categorization
    • DOI 10.1016/j.eswa.2006.04.001, PII S095741740600114X
    • W. Shang, H. Huang, H. Zhu, Y. Lin, Y. Qu, and Z. Wang A novel feature selection algorithm for text categorization Expert Systems with Applications 33 2007 1 5 (Pubitemid 44959912)
    • (2007) Expert Systems with Applications , vol.33 , Issue.1 , pp. 1-5
    • Shang, W.1    Huang, H.2    Zhu, H.3    Lin, Y.4    Qu, Y.5    Wang, Z.6
  • 25
    • 27844480603
    • A comparative study on text representation schemes in text categorization
    • F.X. Song, S.H. Liu, and J.Y. Yang A comparative study on text representation schemes in text categorization Pattern Analysis and Applications 8 2005 199 209
    • (2005) Pattern Analysis and Applications , vol.8 , pp. 199-209
    • Song, F.X.1    Liu, S.H.2    Yang, J.Y.3
  • 26
    • 79953696504
    • Adapting centroid classifier for document categorization
    • S. Tan, Y. Wang, and G. Wu Adapting centroid classifier for document categorization Expert Systems with Applications 38 2011 10264 10273
    • (2011) Expert Systems with Applications , vol.38 , pp. 10264-10273
    • Tan, S.1    Wang, Y.2    Wu, G.3
  • 32
    • 84867846144
    • A novel probabilistic feature selection method for text classification
    • A.K. Uysal, and S. Gunal A novel probabilistic feature selection method for text classification Knowledge-Based Systems 36 2012 226 235
    • (2012) Knowledge-Based Systems , vol.36 , pp. 226-235
    • Uysal, A.K.1    Gunal, S.2
  • 33
    • 0003141935
    • A comparative study on feature selection in text categorization
    • Morgan Kaufmann Publishers Inc
    • Y. Yang, and J.O. Pedersen A comparative study on feature selection in text categorization 14th international conference on machine learning 1997 Morgan Kaufmann Publishers Inc 412 420
    • (1997) 14th International Conference on Machine Learning , pp. 412-420
    • Yang, Y.1    Pedersen, J.O.2
  • 34
    • 84884545102
    • Zemberek. (Accessed January 2013)
    • Zemberek. <http://code.google.com/p/zemberek/> (Accessed January 2013).


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.