Volume 22, Issue 10, 2010, Pages 1388-1400

Combating the small sample class imbalance problem using feature selection

Author keywords

bioinformatics; class imbalance problem; feature evaluation and selection; machine learning; pattern recognition; text mining

Indexed keywords

CLASS IMBALANCE PROBLEMS; FEATURE EVALUATION AND SELECTION; FEATURE SELECTION; IMBALANCED DATA; IMBALANCED DATA SETS; MACHINE-LEARNING; PROBLEM DOMAIN; REAL-WORLD APPLICATION; RECEIVER OPERATING CHARACTERISTICS; RESAMPLING; SIGNAL TO NOISE; SMALL SAMPLES; SPECIFIC PROBLEMS; SUB-OPTIMAL PERFORMANCE; SYSTEMATIC STUDY; TEXT CLASSIFICATION; TEXT MINING;

EID: 77956023732     PISSN: 10414347     EISSN: None     Source Type: Journal    
DOI: 10.1109/TKDE.2009.187     Document Type: Article
Times cited: 326

References (59)
  • 2
    • 0036161259
    • Gene selection for cancer classification using support vector machines
    • I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene Selection for Cancer Classification Using Support Vector Machines," Machine Learning, vol.46, nos. 1-3, pp. 389-422, 2002.
    • (2002) Machine Learning , vol.46 , Issue.1-3 , pp. 389-422
    • Guyon, I.1    Weston, J.2    Barnhill, S.3    Vapnik, V.4
  • 3
  • 4
    • 2942731012
    • An extensive empirical study of feature selection metrics for text classification
    • G. Forman, "An Extensive Empirical Study of Feature Selection Metrics for Text Classification," J. Machine Learning Research, vol.3, pp. 1289-1305, 2003.
    • (2003) J. Machine Learning Research , vol.3 , pp. 1289-1305
    • Forman, G.1
  • 5
    • 16644402628
    • Feature selection for text categorization on imbalanced data
    • Z. Zheng, X. Wu, and R. Srihari, "Feature Selection for Text Categorization on Imbalanced Data," ACM SIGKDD Explorations Newsletter, vol.6, pp. 80-89, 2004.
    • (2004) ACM SIGKDD Explorations Newsletter , vol.6 , pp. 80-89
    • Zheng, Z.1    Wu, X.2    Srihari, R.3
  • 6
    • 0742268532
    • Feature reduction and morphological processing for hyperspectral image data
    • D. Casasent and X. Chen, "Feature Reduction and Morphological Processing for Hyperspectral Image Data," Applied Optics, vol.43, no.2, pp. 1-10, 2004.
    • (2004) Applied Optics , vol.43 , Issue.2 , pp. 1-10
    • Casasent, D.1    Chen, X.2
  • 11
    • 20844458491
    • Mining with rarity: A unifying framework
    • G. Weiss, "Mining with Rarity: A Unifying Framework," ACM SIGKDD Explorations Newsletter, vol.6, no.1, pp. 7-19, 2004.
    • (2004) ACM SIGKDD Explorations Newsletter , vol.6 , Issue.1 , pp. 7-19
    • Weiss, G.1
  • 12
    • 1442275185
    • Learning when training data are costly: The effect of class distribution on tree induction
    • G. Weiss and F. Provost, "Learning when Training Data Are Costly: The Effect of Class Distribution on Tree Induction," J. Artificial Intelligence Research, vol.19, pp. 315-354, 2003.
    • (2003) J. Artificial Intelligence Research , vol.19 , pp. 315-354
    • Weiss, G.1    Provost, F.2
  • 13
    • 27144549260
    • Editorial: Special issue on learning from imbalanced data sets
    • N. Chawla, N. Japkowicz, and A. Kotcz, "Editorial: Special Issue on Learning from Imbalanced Data Sets," ACM SIGKDD Explorations Newsletter, vol.6, no.1, pp. 1-6, 2004.
    • (2004) ACM SIGKDD Explorations Newsletter , vol.6 , Issue.1 , pp. 1-6
    • Chawla, N.1    Japkowicz, N.2    Kotcz, A.3
  • 14
    • 0001972236
    • Addressing the curse of imbalanced data sets: One sided sampling
    • M. Kubat and S. Matwin, "Addressing the Curse of Imbalanced Data Sets: One Sided Sampling," Proc. 14th Int'l Conf. Machine Learning, pp. 179-186, 1997.
    • (1997) Proc. 14th Int'l Conf. Machine Learning , pp. 179-186
    • Kubat, M.1    Matwin, S.2
  • 18
    • 1442356040
    • A multiple resampling method for learning from imbalanced data sets
    • A. Estabrooks, T. Jo, and N. Japkowicz, "A Multiple Resampling Method for Learning from Imbalanced Data Sets," Computational Intelligence, vol.20, no.1, pp. 18-36, 2004.
    • (2004) Computational Intelligence , vol.20 , Issue.1 , pp. 18-36
    • Estabrooks, A.1    Jo, T.2    Japkowicz, N.3
  • 19
    • 0004708854
    • Exploiting the cost (in)sensitivity of decision tree splitting criteria
    • C. Drummond and R.C. Holte, "Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria," Proc. 17th Int'l Conf. Machine Learning, pp. 239-246, 2000.
    • (2000) Proc. 17th Int'l Conf. Machine Learning , pp. 239-246
    • Drummond, C.1    Holte, R.C.2
  • 20
    • 0034825091
    • Supervised versus unsupervised binary learning by feedforward neural networks
    • N. Japkowicz, "Supervised versus Unsupervised Binary Learning by Feedforward Neural Networks," Machine Learning, vol.42, nos. 1/2, pp. 97-122, 2001.
    • (2001) Machine Learning , vol.42 , Issue.1-2 , pp. 97-122
    • Japkowicz, N.1
  • 22
    • 85096855936
    • One-Class SVMs for document classification
    • L.M. Manevitz and M. Yousef, "One-Class SVMs for Document Classification," J. Machine Learning Research, vol.2, pp. 139-154, 2001.
    • (2001) J. Machine Learning Research , vol.2 , pp. 139-154
    • Manevitz, L.M.1    Yousef, M.2
  • 23
    • 58149180961
    • Learning classifiers from only positive and unlabeled data
    • C. Elkan and K. Noto, "Learning Classifiers from Only Positive and Unlabeled Data," Proc. ACM SIGKDD '08, pp. 213-220, 2008.
    • (2008) Proc. ACM SIGKDD '08 , pp. 213-220
    • Elkan, C.1    Noto, K.2
  • 26
    • 67049152595
    • Boosting for learning multiple classes with imbalanced class distribution
    • Y. Sun, M. Kamel, and Y. Wang, "Boosting for Learning Multiple Classes with Imbalanced Class Distribution," Proc. Sixth Int'l Conf. Data Mining, pp. 592-602, 2006.
    • (2006) Proc. Sixth Int'l Conf. Data Mining , pp. 592-602
    • Sun, Y.1    Kamel, M.2    Wang, Y.3
  • 27
    • 0002106691
    • MetaCost: A general method for making classifiers cost-sensitive
    • P. Domingos, "MetaCost: A General Method for Making Classifiers Cost-Sensitive," Proc. ACM SIGKDD '99, pp. 155-164, 1999.
    • (1999) Proc. ACM SIGKDD '99 , pp. 155-164
    • Domingos, P.1
  • 29
    • 12244287068
    • An iterative method for multi-class cost-sensitive learning
    • N. Abe, B. Zadrozny, and J. Langford, "An Iterative Method for Multi-Class Cost-Sensitive Learning," Proc. ACM SIGKDD '04, pp. 3-11, 2004.
    • (2004) Proc. ACM SIGKDD '04 , pp. 3-11
    • Abe, N.1    Zadrozny, B.2    Langford, J.3
  • 32
  • 34
    • 33749563073
    • Training linear SVMs in linear time
    • T. Joachims, "Training Linear SVMs in Linear Time," Proc. ACM SIGKDD '06, pp. 217-226, 2006.
    • (2006) Proc. ACM SIGKDD '06 , pp. 217-226
    • Joachims, T.1
  • 35
    • 33746131974
    • Kernel-based distance metric learning for microarray data classification
    • H. Xiong and X. Chen, "Kernel-Based Distance Metric Learning for Microarray Data Classification," BMC Bioinformatics, vol.7, no.299, pp. 1-11, 2006.
    • (2006) BMC Bioinformatics , vol.7 , Issue.299 , pp. 1-11
    • Xiong, H.1    Chen, X.2
  • 36
    • 3242765279
    • A bias-variance analysis of a real world learning problem: The CoIL Challenge 2000
    • P. van der Putten and M. van Someren, "A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000," Machine Learning, vol.57, nos. 1/2, pp. 177-195, 2004.
    • (2004) Machine Learning , vol.57 , Issue.1-2 , pp. 177-195
    • Van der Putten, P.1    Van Someren, M.2
  • 37
    • 58049141286
    • FAST: A ROC-based feature selection metric for small samples and imbalanced data classification problems
    • X. Chen and M. Wasikowski, "FAST: A ROC-Based Feature Selection Metric for Small Samples and Imbalanced Data Classification Problems," Proc. ACM SIGKDD '08, pp. 124-133, 2008.
    • (2008) Proc. ACM SIGKDD '08 , pp. 124-133
    • Chen, X.1    Wasikowski, M.2
  • 38
    • 0035789256
    • Magical thinking in data mining: Lessons from CoIL Challenge 2000
    • C. Elkan, "Magical Thinking in Data Mining: Lessons from CoIL Challenge 2000," Proc. ACM SIGKDD '01, pp. 426-431, 2001.
    • (2001) Proc. ACM SIGKDD '01 , pp. 426-431
    • Elkan, C.1
  • 39
    • 33745561205
    • An introduction to variable and feature selection
    • I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," J. Machine Learning Research, vol.3, pp. 1157-1182, 2003.
    • (2003) J. Machine Learning Research , vol.3 , pp. 1157-1182
    • Guyon, I.1    Elisseeff, A.2
  • 42
    • 0038329332
    • An improved branch and bound algorithm for feature selection
    • X. Chen, "An Improved Branch and Bound Algorithm for Feature Selection," Pattern Recognition Letters, vol.24, no.12, pp. 1925-1933, 2003.
    • (2003) Pattern Recognition Letters , vol.24 , Issue.12 , pp. 1925-1933
    • Chen, X.1
  • 43
    • 34547984193
    • Minimum reference set based feature selection for small sample classifications
    • X. Chen and J.C. Jeong, "Minimum Reference Set Based Feature Selection for Small Sample Classifications," Proc. 24th Int'l Conf. Machine Learning, pp. 153-160, 2007.
    • (2007) Proc. 24th Int'l Conf. Machine Learning , pp. 153-160
    • Chen, X.1    Jeong, J.C.2
  • 44
    • 25144492516
    • Efficient feature selection via analysis of relevance and redundancy
    • L. Yu and H. Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy," J. Machine Learning Research, vol.5, pp. 1205-1224, 2004.
    • (2004) J. Machine Learning Research , vol.5 , pp. 1205-1224
    • Yu, L.1    Liu, H.2
  • 50
    • 27344448597
    • Feature selection and the class imbalance problem in predicting protein function from sequence
    • A. Al Shahib, R. Breitling, and D. Gilbert, "Feature Selection and the Class Imbalance Problem in Predicting Protein Function from Sequence," Applied Bioinformatics, vol.4, pp. 195-203, 2005.
    • (2005) Applied Bioinformatics , vol.4 , pp. 195-203
    • Al Shahib, A.1    Breitling, R.2    Gilbert, D.3
  • 51
    • 85146422424
    • The feature selection problem: Traditional methods and new algorithm
    • K. Kira and L. Rendell, "The Feature Selection Problem: Traditional Methods and New Algorithm," Proc. Ninth Int'l Conf. Machine Learning, pp. 249-256, 1992.
    • (1992) Proc. Ninth Int'l Conf. Machine Learning , pp. 249-256
    • Kira, K.1    Rendell, L.2
  • 58
    • 33645690579
    • Fast binary feature selection with conditional mutual information
    • F. Fleuret, "Fast Binary Feature Selection with Conditional Mutual Information," J. Machine Learning Research, vol.5, pp. 1531-1555, 2004.
    • (2004) J. Machine Learning Research , vol.5 , pp. 1531-1555
    • Fleuret, F.1
  • 59
    • 24344458137
    • Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy
    • H. Peng, F. Long, and C. Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Trans. Pattern Analysis and Machine Intelligence, vol.27, no.8, pp. 1226-1238, Aug. 2005.
    • (2005) IEEE Trans. Pattern Analysis and Machine Intelligence , vol.27 , Issue.8 , pp. 1226-1238
    • Peng, H.1    Long, F.2    Ding, C.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.