



Volume 25, Issue 1, 2012, Pages 13-21

On the effectiveness of preprocessing methods when dealing with different levels of class imbalance

Author keywords

Classification; Imbalance; Multi dimensional scaling; Performance measures; Resampling

Indexed keywords

ARTIFICIAL INTELLIGENCE; KNOWLEDGE BASED SYSTEMS; SOFTWARE ENGINEERING;

EID: 80052394779     PISSN: 09507051     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.knosys.2011.06.013     Document Type: Article
Times cited : (311)

References (56)
  • 2
    • 33845536164
    • The class imbalance problem: A systematic study
    • N. Japkowicz, and S. Stephen The class imbalance problem: a systematic study Intelligent Data Analysis 6 5 2002 429 449
    • (2002) Intelligent Data Analysis , vol.6 , Issue.5 , pp. 429-449
    • Japkowicz, N.1    Stephen, S.2
  • 3
    • 54949132937
    • A comparative study on rough set based class imbalance learning
    • J. Liu, Q. Hu, and D. Yu A comparative study on rough set based class imbalance learning Knowledge-Based Systems 21 8 2008 753 763
    • (2008) Knowledge-Based Systems , vol.21 , Issue.8 , pp. 753-763
    • Liu, J.1    Hu, Q.2    Yu, D.3
  • 6
    • 51149102669
    • An application of supervised and unsupervised learning approaches to telecommunications fraud detection
    • C.S. Hilas, and P.A. Mastorocostas An application of supervised and unsupervised learning approaches to telecommunications fraud detection Knowledge-Based Systems 21 7 2008 721 726
    • (2008) Knowledge-Based Systems , vol.21 , Issue.7 , pp. 721-726
    • Hilas, C.S.1    Mastorocostas, P.A.2
  • 8
    • 0031998121
    • Machine learning for the detection of oil spills in satellite radar images
    • M. Kubat, R.C. Holte, and S. Matwin Machine learning for the detection of oil spills in satellite radar images Machine Learning 30 2-3 1998 195 215
    • (1998) Machine Learning , vol.30 , Issue.2-3 , pp. 195-215
    • Kubat, M.1    Holte, R.C.2    Matwin, S.3
  • 9
    • 17844387127
    • Neighbor-weighted K-nearest neighbor for unbalanced text corpus
    • DOI 10.1016/j.eswa.2004.12.023, PII S0957417404001708
    • S. Tan Neighbor-weighted K-nearest neighbor for unbalanced text corpus Expert Systems with Applications 28 4 2005 667 671 (Pubitemid 40583844)
    • (2005) Expert Systems with Applications , vol.28 , Issue.4 , pp. 667-671
    • Tan, S.1
  • 10
    • 16644402628
    • Feature selection for text categorization on imbalanced data
    • Z. Zheng, X. Wu, and R. Srihari Feature selection for text categorization on imbalanced data SIGKDD Explorations Newsletter 6 1 2004 80 89
    • (2004) SIGKDD Explorations Newsletter , vol.6 , Issue.1 , pp. 80-89
    • Zheng, Z.1    Wu, X.2    Srihari, R.3
  • 11
    • 33646142788
    • Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem
    • Y.-M. Huang, C.-M. Hung, and H.C. Jiau Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem Nonlinear Analysis: Real World Applications 7 4 2006 720 757
    • (2006) Nonlinear Analysis: Real World Applications , vol.7 , Issue.4 , pp. 720-757
    • Huang, Y.-M.1    Hung, C.-M.2    Jiau, H.C.3
  • 12
    • 33646421421
    • Evaluation of classifiers for an uneven class distribution problem
    • S. Daskalaki, I. Kopanas, and N. Avouris Evaluation of classifiers for an uneven class distribution problem Applied Artificial Intelligence 20 5 2006 381 417
    • (2006) Applied Artificial Intelligence , vol.20 , Issue.5 , pp. 381-417
    • Daskalaki, S.1    Kopanas, I.2    Avouris, N.3
  • 16
    • 80052414830
    • Evolutionary-based selection of generalized instances for imbalanced classification
    • S. García, J. Derrac, I. Triguero, C.J. Carmona, and F. Herrera Evolutionary-based selection of generalized instances for imbalanced classification Knowledge-Based Systems 25 1 2012 3 12
    • (2012) Knowledge-Based Systems , vol.25 , Issue.1 , pp. 3-12
    • García, S.1    Derrac, J.2    Triguero, I.3    Carmona, C.J.4    Herrera, F.5
  • 17
    • 14644390912
    • Using AUC and accuracy in evaluating learning algorithms
    • DOI 10.1109/TKDE.2005.50
    • J. Huang, and C.X. Ling Using AUC and accuracy in evaluating learning algorithms IEEE Transactions on Knowledge and Data Engineering 17 3 2005 299 310 (Pubitemid 40320164)
    • (2005) IEEE Transactions on Knowledge and Data Engineering , vol.17 , Issue.3 , pp. 299-310
    • Huang, J.1    Ling, C.X.2
  • 18
    • 34547372256
    • Optimized precision - A new measure for classifier performance evaluation
    • 1688586, 2006 IEEE Congress on Evolutionary Computation, CEC 2006
    • R. Ranawana, V. Palade, Optimized precision - a new measure for classifier performance evaluation, in: Proceedings of the IEEE Congress on Computational Intelligence, Vancouver, Canada, 2006, pp. 2254-2261. (Pubitemid 47130775)
    • (2006) 2006 IEEE Congress on Evolutionary Computation, CEC 2006 , pp. 2254-2261
    • Ranawana, R.1    Palade, V.2
  • 23
    • 27144531570
    • A study of the behavior of several methods for balancing machine learning training data
    • G.E.A.P.A. Batista, R.C. Prati, and M.C. Monard A study of the behavior of several methods for balancing machine learning training data ACM SIGKDD Explorations Newsletter 6 1 2004 20 29
    • (2004) ACM SIGKDD Explorations Newsletter , vol.6 , Issue.1 , pp. 20-29
    • Batista, G.E.A.P.A.1    Prati, R.C.2    Monard, M.C.3
  • 25
    • 1442356040
    • A multiple resampling method for learning from imbalanced data sets
    • A. Estabrooks, T. Jo, and N. Japkowicz A multiple resampling method for learning from imbalanced data sets Computational Intelligence 20 1 2004 18 36
    • (2004) Computational Intelligence , vol.20 , Issue.1 , pp. 18-36
    • Estabrooks, A.1    Jo, T.2    Japkowicz, N.3
  • 26
    • 79957446849
    • A novel virtual sample generation method based on Gaussian distribution
    • J. Yang, X. Yu, Z.-Q. Xie, and J.-P. Zhang A novel virtual sample generation method based on Gaussian distribution Knowledge-Based Systems 24 6 2011 740 748
    • (2011) Knowledge-Based Systems , vol.24 , Issue.6 , pp. 740-748
    • Yang, J.1    Yu, X.2    Xie, Z.-Q.3    Zhang, J.-P.4
  • 27
    • 1442275185
    • Learning when training data are costly: The effect of class distribution on tree induction
    • G.M. Weiss, and F.J. Provost Learning when training data are costly: the effect of class distribution on tree induction Journal of Artificial Intelligence Research 19 2003 315 354 (Pubitemid 41525924)
    • (2003) Journal of Artificial Intelligence Research , vol.19 , pp. 315-354
    • Weiss, G.M.1    Provost, F.2
  • 31
    • 27144501672
    • Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning
    • Advances in Intelligent Computing: International Conference on Intelligent Computing, ICIC 2005. Proceedings
    • H. Han, W.-Y. Wang, B.-H. Mao, Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning, in: Proceedings of the 1st International Conference on Intelligent Computing, Hefei, China, 2005, pp. 878-887. (Pubitemid 41491129)
    • (2005) Lecture Notes in Computer Science , vol.3644 , Issue.PART I , pp. 878-887
    • Han, H.1    Wang, W.-Y.2    Mao, B.-H.3
  • 33
    • 27144479454
    • Learning from imbalanced data sets with boosting and data generation: The DataBoost-IM approach
    • H. Guo, and H.L. Viktor Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach SIGKDD Explorations Newsletter 6 1 2004 30 39
    • (2004) SIGKDD Explorations Newsletter , vol.6 , Issue.1 , pp. 30-39
    • Guo, H.1    Viktor, H.L.2
  • 34
    • 34547659409
    • KNN approach to unbalanced data distributions: A case study involving information extraction
    • Washington DC, USA
    • J. Zhang, I. Mani, kNN approach to unbalanced data distributions: a case study involving information extraction, in: Proceedings of the Workshop on Learning from Imbalanced Datasets, Washington DC, USA, 2003.
    • (2003) Proceedings of the Workshop on Learning from Imbalanced Datasets
    • Zhang, J.1    Mani, I.2
  • 38
    • 84947425690
    • Improving Identification of Difficult Small Classes by Balancing Class Distribution
    • Artificial Intelligence in Medicine
    • J. Laurikkala, Improving identification of difficult small classes by balancing class distribution, in: Proceedings of the 8th Conference on Artificial Intelligence in Medicine, Cascais, Portugal, 2001, pp. 63-66. (Pubitemid 33301585)
    • (2001) Lecture Notes in Computer Science , vol.2101 , pp. 63-66
    • Laurikkala, J.1
  • 39
    • 0015361129
    • Asymptotic properties of nearest neighbour rules using edited data
    • D.L. Wilson Asymptotic properties of nearest neighbour rules using edited data IEEE Transactions on Systems, Man and Cybernetics 2 1972 408 421
    • (1972) IEEE Transactions on Systems, Man and Cybernetics , vol.2 , pp. 408-421
    • Wilson, D.L.1
  • 42
    • 70349617264
    • Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy
    • S. García, and F. Herrera Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy Evolutionary Computation 17 3 2009 275 306
    • (2009) Evolutionary Computation , vol.17 , Issue.3 , pp. 275-306
    • García, S.1    Herrera, F.2
  • 43
    • 33750117549
    • Pruning support vectors for imbalanced data classification
    • DOI 10.1109/IJCNN.2005.1556167, 1556167, Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005
    • X. Chen, B. Gerlach, D. Casasent, Pruning support vectors for imbalanced data classification, in: Proceedings of the International Joint Conference on Neural Networks, Montreal, Canada, 2005, pp. 1883-1888. (Pubitemid 44591487)
    • (2005) Proceedings of the International Joint Conference on Neural Networks , vol.3 , pp. 1883-1888
    • Chen, X.-W.1    Gerlach, B.2    Casasent, D.3
  • 44
    • 0035283313
    • Robust classification for imprecise environments
    • DOI 10.1023/A:1007601015854
    • F. Provost, and T. Fawcett Robust classification for imprecise environments Machine Learning 42 3 2001 203 231 (Pubitemid 32188799)
    • (2001) Machine Learning , vol.42 , Issue.3 , pp. 203-231
    • Provost, F.1    Fawcett, T.2
  • 45
    • 0031191630
    • The use of the area under the ROC curve in the evaluation of machine learning algorithms
    • PII S0031320396001422
    • A.P. Bradley The use of the area under the ROC curve in the evaluation of machine learning algorithms Pattern Recognition 30 7 1997 1145 1159 (Pubitemid 127406521)
    • (1997) Pattern Recognition , vol.30 , Issue.7 , pp. 1145-1159
    • Bradley, A.P.1
  • 48
    • 78149483936
    • Theoretical analysis of a performance measure for imbalanced data
    • Istanbul, Turkey
    • V. García, R.A. Mollineda, J.S. Sánchez, Theoretical analysis of a performance measure for imbalanced data, in: Proceedings of the 20th International Conference on Pattern Recognition, Istanbul, Turkey, 2010, pp. 617-620.
    • (2010) Proceedings of the 20th International Conference on Pattern Recognition , pp. 617-620
    • García, V.1
  • 50
    • 56049098839
    • A visualization-based exploratory technique for classifier comparison with respect to multiple metrics and multiple domains
    • Antwerp, Belgium
    • R. Alaiz-Rodríguez, N. Japkowicz, P. Tischer, A visualization-based exploratory technique for classifier comparison with respect to multiple metrics and multiple domains, in: Proceedings of the 15th European Conference on Machine Learning, Antwerp, Belgium, 2008, pp. 660-665.
    • (2008) Proceedings of the 15th European Conference on Machine Learning , pp. 660-665
    • Alaiz-Rodríguez, R.1
  • 52
    • 50549093573
    • On the k-NN performance in a challenging scenario of imbalance and overlapping
    • V. García, R.A. Mollineda, and J.S. Sánchez On the k-NN performance in a challenging scenario of imbalance and overlapping Pattern Analysis and Applications 11 3 2008 269 280
    • (2008) Pattern Analysis and Applications , vol.11 , Issue.3 , pp. 269-280
    • García, V.1    Mollineda, R.A.2    Sánchez, J.S.3
  • 53
    • 71749101234
    • Knowledge discovery from imbalanced and noisy data
    • J.V. Hulse, and T. Khoshgoftaar Knowledge discovery from imbalanced and noisy data Data & Knowledge Engineering 68 12 2009 1513 1542
    • (2009) Data & Knowledge Engineering , vol.68 , Issue.12 , pp. 1513-1542
    • Hulse, J.V.1    Khoshgoftaar, T.2
  • 54
  • 55
    • 77956198600
    • The impact of small disjuncts on classifier learning
    • R. Stahlbock, S.F. Crone, S. Lessmann, Annals of Information Systems Springer US (chapter 9)
    • G.M. Weiss The impact of small disjuncts on classifier learning R. Stahlbock, S.F. Crone, S. Lessmann, Data Mining Annals of Information Systems vol. 8 2010 Springer US 193 226 (chapter 9)
    • (2010) Data Mining , vol.8 , pp. 193-226
    • Weiss, G.M.1
  • 56
    • 79957915328
    • Addressing data complexity for imbalanced data sets: Analysis of SMOTE-based oversampling and evolutionary undersampling
    • doi:10.1007/s00500-010-0625-8 in press
    • J. Luengo, A. Fernández, S. García, F. Herrera, Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling, Soft Computing - A Fusion of Foundations, Methodologies and Applications, in press, doi:10.1007/s00500-010-0625-8.
    • Soft Computing - A Fusion of Foundations, Methodologies and Applications
    • Luengo, J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.