



Volume 39, Issue 7, 2012, Pages 6585-6608

Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics

Author keywords

Class overlap; Classification; Cost sensitive learning; Dataset shift; Imbalanced datasets; Preprocessing

Indexed keywords

CLASS OVERLAP; COST-SENSITIVE LEARNING; DATASET SHIFT; IMBALANCED DATA-SETS; PREPROCESSING;

EID: 84856964446     PISSN: 09574174     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.eswa.2011.12.043     Document Type: Article
Times cited: 270

References (86)
  • 7
    • 27144531570
    • A study of the behaviour of several methods for balancing machine learning training data
    • G.E.A.P.A. Batista, R.C. Prati, and M.C. Monard A study of the behaviour of several methods for balancing machine learning training data SIGKDD Explorations 6 2004 20 29
    • (2004) SIGKDD Explorations , vol.6 , pp. 20-29
    • Batista, G.E.A.P.A.1    Prati, R.C.2    Monard, M.C.3
  • 8
    • 78049529700
    • Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets
    • P. Bermejo, J. Gámez, and J. Puerta Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets Expert Systems with Applications 38 2011 2072 2080
    • (2011) Expert Systems with Applications , vol.38 , pp. 2072-2080
    • Bermejo, P.1    Gámez, J.2    Puerta, J.3
  • 11
    • 0031191630
    • The use of the area under the ROC curve in the evaluation of machine learning algorithms
    • PII S0031320396001422
    • A.P. Bradley The use of the area under the roc curve in the evaluation of machine learning algorithms Pattern Recognition 30 1997 1145 1159 (Pubitemid 127406521)
    • (1997) Pattern Recognition , vol.30 , Issue.7 , pp. 1145-1159
    • Bradley, A.P.1
  • 14
    • 0026408256
    • Fuzzy art: Fast stable learning and categorization of analog patterns by an adaptive resonance system
    • G.A. Carpenter, S. Grossberg, and D.B. Rosen Fuzzy art: Fast stable learning and categorization of analog patterns by an adaptive resonance system Neural Networks 4 1991 759 771
    • (1991) Neural Networks , vol.4 , pp. 759-771
    • Carpenter, G.A.1    Grossberg, S.2    Rosen, D.B.3
  • 17
    • 27144549260
    • Editorial: Special issue on learning from imbalanced data sets
    • N.V. Chawla, N. Japkowicz, and A. Kotcz Editorial: Special issue on learning from imbalanced data sets SIGKDD Explorations 6 2004 1 6
    • (2004) SIGKDD Explorations , vol.6 , pp. 1-6
    • Chawla, N.V.1    Japkowicz, N.2    Kotcz, A.3
  • 18
    • 45849098303
    • An information granulation based data mining approach for classifying imbalanced data
    • M.-C. Chen, L.-S. Chen, C.-C. Hsu, and W.-R. Zeng An information granulation based data mining approach for classifying imbalanced data Information Sciences 178 2008 3214 3227
    • (2008) Information Sciences , vol.178 , pp. 3214-3227
    • Chen, M.-C.1    Chen, L.-S.2    Hsu, C.-C.3    Zeng, W.-R.4
  • 19
    • 78650905163
    • Graph-based feature selection for object-oriented classification in vhr airborne imagery
    • X. Chen, T. Fang, H. Huo, and D. Li Graph-based feature selection for object-oriented classification in vhr airborne imagery IEEE Transactions on Geoscience and Remote Sensing 49 2011 353 365
    • (2011) IEEE Transactions on Geoscience and Remote Sensing , vol.49 , pp. 353-365
    • Chen, X.1    Fang, T.2    Huo, H.3    Li, D.4
  • 20
    • 58549104329
    • A framework for monitoring classifiers performance: When and why failure occurs?
    • D.A. Cieslak, and N.V. Chawla A framework for monitoring classifiers performance: When and why failure occurs? Knowledge and Information Systems 18 2009 83 108
    • (2009) Knowledge and Information Systems , vol.18 , pp. 83-108
    • Cieslak, D.A.1    Chawla, N.V.2
  • 21
    • 34249753618
    • Support vector networks
    • C. Cortes, and V. Vapnik Support vector networks Machine Learning 20 1995 273 297
    • (1995) Machine Learning , vol.20 , pp. 273-297
    • Cortes, C.1    Vapnik, V.2
  • 23
    • 29644438050
    • Statistical comparisons of classifiers over multiple data sets
    • J. Demšar Statistical comparisons of classifiers over multiple data sets Journal of Machine Learning Research 7 2006 1 30 (Pubitemid 43022939)
    • (2006) Journal of Machine Learning Research , vol.7 , pp. 1-30
    • Demsar, J.1
  • 24
  • 26
    • 77952875468
    • Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets
    • P. Ducange, B. Lazzerini, and F. Marcelloni Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets Soft Computing 14 2010 713 728
    • (2010) Soft Computing , vol.14 , pp. 713-728
    • Ducange, P.1    Lazzerini, B.2    Marcelloni, F.3
  • 28
    • 1442356040
    • A multiple resampling method for learning from imbalanced data sets
    • A. Estabrooks, T. Jo, and N. Japkowicz A multiple resampling method for learning from imbalanced data sets Computational Intelligence 20 2004 18 36
    • (2004) Computational Intelligence , vol.20 , pp. 18-36
    • Estabrooks, A.1    Jo, T.2    Japkowicz, N.3
  • 29
    • 46849096083
    • A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets
    • A. Fernández, S. García, M.J. del Jesus, and F. Herrera A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets Fuzzy Sets and Systems 159 2008 2378 2398
    • (2008) Fuzzy Sets and Systems , vol.159 , pp. 2378-2398
    • Fernández, A.1    García, S.2    Del Jesus, M.J.3    Herrera, F.4
  • 30
    • 60849127572
    • Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets
    • A. Fernández, M.J. del Jesus, and F. Herrera Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets International Journal of Approximate Reasoning 50 2009 561 577
    • (2009) International Journal of Approximate Reasoning , vol.50 , pp. 561-577
    • Fernández, A.1    Del Jesus, M.J.2    Herrera, F.3
  • 31
    • 75149159107
    • On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets
    • A. Fernández, M.J. del Jesús, and F. Herrera On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets Information Sciences 180 2010 1268 1291
    • (2010) Information Sciences , vol.180 , pp. 1268-1291
    • Fernández, A.1    Del Jesús, M.J.2    Herrera, F.3
  • 32
    • 84944811700
    • The use of ranks to avoid the assumption of normality implicit in the analysis of variance
    • M. Friedman The use of ranks to avoid the assumption of normality implicit in the analysis of variance Journal of the American Statistical Association 32 1937 675 701
    • (1937) Journal of the American Statistical Association , vol.32 , pp. 675-701
    • Friedman, M.1
  • 33
    • 3543051838
    • Functional trees
    • J. Gama Functional trees Machine Learning 55 2004 219 250
    • (2004) Machine Learning , vol.55 , pp. 219-250
    • Gama, J.1
  • 34
    • 64549120231
    • A study of statistical techniques and performance measures for genetics-based machine learning: Accuracy and interpretability
    • S. García, A. Fernández, J. Luengo, and F. Herrera A study of statistical techniques and performance measures for genetics-based machine learning: Accuracy and interpretability Soft Computing 13 2009 959 977
    • (2009) Soft Computing , vol.13 , pp. 959-977
    • García, S.1    Fernández, A.2    Luengo, J.3    Herrera, F.4
  • 35
    • 77549084648
    • Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power
    • S. García, A. Fernández, J. Luengo, and F. Herrera Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power Information Sciences 180 2010 2044 2064
    • (2010) Information Sciences , vol.180 , pp. 2044-2064
    • García, S.1    Fernández, A.2    Luengo, J.3    Herrera, F.4
  • 36
    • 58149287952
    • An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons
    • S. García, and F. Herrera An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons Journal of Machine Learning Research 9 2008 2607 2624
    • (2008) Journal of Machine Learning Research , vol.9 , pp. 2607-2624
    • García, S.1    Herrera, F.2
  • 37
    • 50549093573
    • On the k-NN performance in a challenging scenario of imbalance and overlapping
    • V. García, R. Mollineda, and J.S. Sánchez On the k-NN performance in a challenging scenario of imbalance and overlapping Pattern Analysis and Applications 11 2008 269 280
    • (2008) Pattern Analysis and Applications , vol.11 , pp. 269-280
    • García, V.1    Mollineda, R.2    Sánchez, J.S.3
  • 38
    • 80052660213
    • An adversarial view of covariate shift and a minimax approach
    • J. Quiñonero Candela, M. Sugiyama, A. Schwaighofer, N.D. Lawrence, The MIT Press
    • A. Globerson, C.H. Teo, A. Smola, and S. Roweis An adversarial view of covariate shift and a minimax approach J. Quiñonero Candela, M. Sugiyama, A. Schwaighofer, N.D. Lawrence, Dataset shift in machine learning 2009 The MIT Press 179 198
    • (2009) Dataset Shift in Machine Learning , pp. 179-198
    • Globerson, A.1    Teo, C.H.2    Smola, A.3    Roweis, S.4
  • 40
    • 27144501672
    • Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning
    • Advances in Intelligent Computing: International Conference on Intelligent Computing, ICIC 2005. Proceedings
    • Han, H.; Wang, W.; & Mao, B. (2005). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the international conference on intelligent computing (pp. 878-887). (Pubitemid 41491129)
    • (2005) Lecture Notes in Computer Science , vol.3644 , Issue.PART I , pp. 878-887
    • Han, H.1    Wang, W.-Y.2    Mao, B.-H.3
  • 41
    • 0037410687
    • Choosing k for two-class nearest neighbour classifiers with unbalanced classes
    • DOI 10.1016/S0167-8655(02)00394-X
    • D.J. Hand, and V. Vinciotti Choosing k for two-class nearest neighbour classifiers with unbalanced classes Pattern Recognition Letters 24 2003 1555 1562 (Pubitemid 36225860)
    • (2003) Pattern Recognition Letters , vol.24 , Issue.9-10 , pp. 1555-1562
    • Hand, D.J.1    Vinciotti, V.2
  • 46
    • 0035415473
    • Effect of rule weights in fuzzy rule-based classification systems
    • DOI 10.1109/91.940964, PII S1063670601065353, Fuzzy Logic at the Turn of the Millennium
    • H. Ishibuchi, and T. Nakashima Effect of rule weights in fuzzy rule-based classification systems IEEE Transactions on Fuzzy Systems 9 2001 506 515 (Pubitemid 32935694)
    • (2001) IEEE Transactions on Fuzzy Systems , vol.9 , Issue.4 , pp. 506-515
    • Ishibuchi, H.1    Nakashima, T.2
  • 47
    • 26844469668
    • Rule weight specification in fuzzy rule-based classification systems
    • DOI 10.1109/TFUZZ.2004.841738
    • H. Ishibuchi, and T. Yamamoto Rule weight specification in fuzzy rule-based classification systems IEEE Transactions on Fuzzy Systems 13 2005 428 435 (Pubitemid 41461439)
    • (2005) IEEE Transactions on Fuzzy Systems , vol.13 , Issue.4 , pp. 428-435
    • Ishibuchi, H.1    Yamamoto, T.2
  • 50
    • 34347391248
    • Granular enhancement of fuzzy art/som neural classifiers based on lattice theory
    • Kaburlasos, V. G. (2007). Granular enhancement of fuzzy art/som neural classifiers based on lattice theory. In Computational intelligence based on lattice theory (pp. 3-23).
    • (2007) Computational Intelligence Based on Lattice Theory , pp. 3-23
    • Kaburlasos, V.G.1
  • 51
    • 0001972236
    • Addressing the curse of imbalanced training sets: One-sided selection
    • Kubat, M.; & Matwin, S. (1997). Addressing the curse of imbalanced training sets: One-sided selection. In International conference on machine learning (pp. 179-186).
    • (1997) International Conference on Machine Learning , pp. 179-186
    • Kubat, M.1    Matwin, S.2
  • 52
    • 84947425690
    • Improving Identification of Difficult Small Classes by Balancing Class Distribution
    • Artificial Intelligence in Medicine
    • Laurikkala, J. (2001). Improving identification of difficult small classes by balancing class distribution. In Proceedings of the conference on AI in medicine in Europe: Artificial intelligence medicine (pp. 63-66). (Pubitemid 33301585)
    • (2001) Lecture Notes in Computer Science , Issue.2101 , pp. 63-66
    • Laurikkala, J.1
  • 55
    • 79957915328
    • Addressing data complexity for imbalanced data sets: Analysis of SMOTE-based oversampling and evolutionary undersampling
    • J. Luengo, A. Fernández, S. García, and F. Herrera Addressing data complexity for imbalanced data sets: Analysis of SMOTE-based oversampling and evolutionary undersampling Soft Computing 15 2011 1909 1936
    • (2011) Soft Computing , vol.15 , pp. 1909-1936
    • Luengo, J.1    Fernández, A.2    García, S.3    Herrera, F.4
  • 56
    • 77955046700
    • On the suitability of combining feature selection and resampling to manage data complexity
    • Martín-Félez, R.; & Mollineda, R. (2010). On the suitability of combining feature selection and resampling to manage data complexity. In CAEPIA 2009, LNAI (Vol. 5988, pp. 141-150).
    • (2010) CAEPIA 2009, LNAI , vol.5988 , pp. 141-150
    • Martín-Félez, R.1    Mollineda, R.2
  • 58
    • 84870054779
    • Repairing fractures between data using genetic programming-based feature extraction: A case study in cancer diagnosis
    • doi:10.1016/j.ins.2010.09.018 in press
    • Moreno-Torres, J. G.; Llorà, X.; Goldberg, D. E.; & Bhargava, R. (in press). Repairing fractures between data using genetic programming-based feature extraction: A case study in cancer diagnosis. Information Sciences, doi:10.1016/j.ins.2010.09.018.
    • Information Sciences
    • Moreno-Torres, J.G.1    Llorà, X.2    Goldberg, D.E.3    Bhargava, R.4
  • 59
    • 55549116330
    • Evolutionary rule-based systems for imbalanced datasets
    • A. Orriols-Puig, and E. Bernadó-Mansilla Evolutionary rule-based systems for imbalanced datasets Soft Computing 13 2009 213 225
    • (2009) Soft Computing , vol.13 , pp. 213-225
    • Orriols-Puig, A.1    Bernadó-Mansilla, E.2
  • 60
    • 9444270977
    • Class Imbalances versus Class Overlapping: An Analysis of a Learning System Behavior
    • MICAI 2004: Advances in Artificial Intelligence Third Mexican International Conference on Artificial Intelligence Mexico City, Mexico, April 26-30, 2004 Proceedings
    • Prati, R. C.; & Batista, G. E. A. P. A. (2004). Class imbalances versus class overlapping: An analysis of a learning system behavior. In Proceedings of Mexican international conference on artificial intelligence (MICAI) (pp. 312-321). (Pubitemid 38716795)
    • (2004) Lecture Notes in Computer Science , Issue.2972 , pp. 312-321
    • Prati, R.C.1    Batista, G.E.A.P.A.2    Monard, M.C.3
  • 62
    • 0028202408
    • Representation design and brute-force induction in a boeing manufacturing domain
    • P. Riddle, R. Segal, and O. Etzioni Representation design and brute-force induction in a boeing manufacturing domain Applied Artificial Intelligence 8 1994 125 147
    • (1994) Applied Artificial Intelligence , vol.8 , pp. 125-147
    • Riddle, P.1    Segal, R.2    Etzioni, O.3
  • 65
    • 0037527188
    • Improving predictive inference under covariate shift by weighting the log-likelihood function
    • H. Shimodaira Improving predictive inference under covariate shift by weighting the log-likelihood function Journal of Statistical Planning and Inference 90 2000 227 244
    • (2000) Journal of Statistical Planning and Inference , vol.90 , pp. 227-244
    • Shimodaira, H.1
  • 66
    • 79951576546
    • Induction and pruning of classification rules for prediction of microseismic hazards in coal mines
    • M. Sikora Induction and pruning of classification rules for prediction of microseismic hazards in coal mines Expert Systems with Applications 38 2011 6748 6758
    • (2011) Expert Systems with Applications , vol.38 , pp. 6748-6758
    • Sikora, M.1
  • 67
  • 68
    • 34547673383
    • Cost-sensitive boosting for classification of imbalanced data
    • DOI 10.1016/j.patcog.2007.04.009, PII S0031320307001835
    • Y. Sun, M.S. Kamel, A.K.C. Wong, and Y. Wang Cost-sensitive boosting for classification of imbalanced data Pattern Recognition 40 2007 3358 3378 (Pubitemid 47223287)
    • (2007) Pattern Recognition , vol.40 , Issue.12 , pp. 3358-3378
    • Sun, Y.1    Kamel, M.S.2    Wong, A.K.C.3    Wang, Y.4
  • 70
    • 0036565589
    • An instance-weighting method to induce cost-sensitive trees
    • DOI 10.1109/TKDE.2002.1000348
    • K.M. Ting An instance-weighting method to induce cost-sensitive trees IEEE Transactions on Knowledge and Data Engineering 14 2002 659 665 (Pubitemid 34669622)
    • (2002) IEEE Transactions on Knowledge and Data Engineering , vol.14 , Issue.3 , pp. 659-665
    • Ting, K.M.1
  • 72
  • 78
    • 1442275185
    • Learning when training data are costly: The effect of class distribution on tree induction
    • G.M. Weiss, and F.J. Provost Learning when training data are costly: The effect of class distribution on tree induction Journal of Artificial Intelligence Research 19 2003 315 354 (Pubitemid 41525924)
    • (2003) Journal of Artificial Intelligence Research , vol.19 , pp. 315-354
    • Weiss, G.M.1    Provost, F.2
  • 79
  • 80
    • 20844441675
    • KBA: Kernel boundary alignment considering imbalanced data distribution
    • DOI 10.1109/TKDE.2005.95
    • G. Wu, and E.Y. Chang Kba: Kernel boundary alignment considering imbalanced data distribution IEEE Transactions on Knowledge and Data Engineering 17 2005 786 795 (Pubitemid 40860458)
    • (2005) IEEE Transactions on Knowledge and Data Engineering , vol.17 , Issue.6 , pp. 786-795
    • Wu, G.1    Chang, E.Y.2
  • 81
    • 33947326736
    • Power distribution fault cause identification with imbalanced data using the data mining-based fuzzy classification E-Algorithm
    • DOI 10.1109/TPWRS.2006.888990
    • L. Xu, M.-Y. Chow, and L.S. Taylor Power distribution fault cause identification with imbalanced data using the data mining-based fuzzy classification e-algorithm IEEE Transactions on Power Systems 22 2007 164 171 (Pubitemid 46437725)
    • (2007) IEEE Transactions on Power Systems , vol.22 , Issue.1 , pp. 164-171
    • Xu, L.1    Chow, M.-Y.2    Taylor, L.S.3
  • 83
    • 54349101200
    • Conceptual equivalence for contrast mining in classification learning
    • Y. Yang, X. Wu, and X. Zhu Conceptual equivalence for contrast mining in classification learning Data and Knowledge Engineering 67 2008 413 429
    • (2008) Data and Knowledge Engineering , vol.67 , pp. 413-429
    • Yang, Y.1    Wu, X.2    Zhu, X.3
  • 86
    • 31344442851
    • Training cost-sensitive neural networks with methods addressing the class imbalance problem
    • DOI 10.1109/TKDE.2006.17
    • Z.-H. Zhou, and X.-Y. Liu Training cost-sensitive neural networks with methods addressing the class imbalance problem IEEE Transactions on Knowledge and Data Engineering 18 2006 63 77 (Pubitemid 43145089)
    • (2006) IEEE Transactions on Knowledge and Data Engineering , vol.18 , Issue.1 , pp. 63-77
    • Zhou, Z.-H.1    Liu, X.-Y.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.