Volume 13, Issue 4, 2018, Pages 59-76

Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches [Research Frontier]

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE;

EID: 85055271044     PISSN: 1556603X     EISSN: 15566048     Source Type: Journal    
DOI: 10.1109/MCI.2018.2866730     Document Type: Article
Times cited: 295

References (47)
  • 1
    • 68549133155
    • Learning from imbalanced data
    • June
    • H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263-1284, June 2009.
    • (2009) IEEE Trans. Knowl. Data Eng. , vol.21 , Issue.9 , pp. 1263-1284
    • He, H.1    Garcia, E.A.2
  • 2
    • 84856964446
    • Analysis of preprocessing vs. Cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics
    • June
    • V. López, A. Fernández, J. G. Moreno-Torres, and F. Herrera, "Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics," Expert Syst. Appl., vol. 39, no. 7, pp. 6585-6608, June 2012.
    • (2012) Expert Syst. Appl. , vol.39 , Issue.7 , pp. 6585-6608
    • López, V.1    Fernández, A.2    Moreno-Torres, J.G.3    Herrera, F.4
  • 3
    • 27144549260
    • Special issue on learning from imbalanced data sets
    • June
    • N. V. Chawla, N. Japkowicz, and A. Kotcz, "Special issue on learning from imbalanced data sets," ACM SIGKDD Explorations Newslett., vol. 6, no. 1, pp. 1-6, June 2004.
    • (2004) ACM SIGKDD Explorations Newslett. , vol.6 , Issue.1 , pp. 1-6
    • Chawla, N.V.1    Japkowicz, N.2    Kotcz, A.3
  • 5
    • 84906788607
    • An overview of classification algorithms for imbalanced datasets
    • Apr.
    • V. Ganganwar, "An overview of classification algorithms for imbalanced datasets," Int. J. Emerging Technol. Adv. Eng., vol. 2, no. 4, pp. 42-47, Apr. 2012.
    • (2012) Int. J. Emerging Technol. Adv. Eng. , vol.2 , Issue.4 , pp. 42-47
    • Ganganwar, V.1
  • 6
    • 84878394942
    • Evolving diverse ensembles using genetic programming for classification with unbalanced data
    • May
    • U. Bhowan, M. Johnston, M. Zhang, and X. Yao, "Evolving diverse ensembles using genetic programming for classification with unbalanced data," IEEE Trans. Evol. Comput., vol. 17, no. 3, pp. 368-386, May 2013.
    • (2013) IEEE Trans. Evol. Comput. , vol.17 , Issue.3 , pp. 368-386
    • Bhowan, U.1    Johnston, M.2    Zhang, M.3    Yao, X.4
  • 8
    • 85009165593
    • Learning from class-imbalanced data: Review of methods and applications
    • May
    • G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, "Learning from class-imbalanced data: Review of methods and applications," Expert Syst. Appl., vol. 73, pp. 220-239, May 2017.
    • (2017) Expert Syst. Appl. , vol.73 , pp. 220-239
    • Haixiang, G.1    Yijing, L.2    Shang, J.3    Mingyun, G.4    Yuanyue, H.5    Bing, G.6
  • 9
    • 84886615058
    • Prediction of preterm deliveries from EHG signals using machine learning
    • Oct
    • P. Fergus, P. Cheung, A. Hussain, D. Al-Jumeily, C. Dobbins, and S. Iram, "Prediction of preterm deliveries from EHG signals using machine learning," PloS One, vol. 8, no. 10, p. e77154, Oct. 2013.
    • (2013) PloS One , vol.8 , Issue.10 , p. e77154
    • Fergus, P.1    Cheung, P.2    Hussain, A.3    Al-Jumeily, D.4    Dobbins, C.5    Iram, S.6
  • 11
    • 85018509302
    • Automated detection of premature delivery using empirical mode and wavelet packet decomposition techniques with uterine electromyogram signals
    • May
    • U. R. Acharya, V. K. Sudarshan, S. Q. Rong, Z. Tan, C. M. Lim, J. E. Koh, S. Nayak, and S. V. Bhandary, "Automated detection of premature delivery using empirical mode and wavelet packet decomposition techniques with uterine electromyogram signals," Comput. Biol. Med., vol. 85, pp. 33-42, May 2017.
    • (2017) Comput. Biol. Med. , vol.85 , pp. 33-42
    • Acharya, U.R.1    Sudarshan, V.K.2    Rong, S.Q.3    Tan, Z.4    Lim, C.M.5    Koh, J.E.6    Nayak, S.7    Bhandary, S.V.8
  • 12
    • 84997541772
    • Classifying Alzheimer's disease, Lewy body dementia, and normal controls using 3D texture analysis in magnetic resonance images
    • Mar
    • K. Oppedal, K. Engan, T. Eftestol, M. Beyer, and D. Aarsland, "Classifying Alzheimer's disease, Lewy body dementia, and normal controls using 3D texture analysis in magnetic resonance images," Biomed. Signal Process. Control, vol. 33, pp. 19-29, Mar. 2017.
    • (2017) Biomed. Signal Process. Control , vol.33 , pp. 19-29
    • Oppedal, K.1    Engan, K.2    Eftestol, T.3    Beyer, M.4    Aarsland, D.5
  • 13
    • 84946407304
    • Joint use of over-and undersampling techniques and cross-validation for the development and assessment of prediction models
    • Nov.
    • R. Blagus and L. Lusa, "Joint use of over-and undersampling techniques and cross-validation for the development and assessment of prediction models," BMC Bioinf., vol. 16, no. 1, pp. 1-10, Nov. 2015.
    • (2015) BMC Bioinf. , vol.16 , Issue.1 , pp. 1-10
    • Blagus, R.1    Lusa, L.2
  • 14
    • 27144531570
    • A study of the behavior of several methods for balancing machine learning training data
    • June
    • G. E. Batista, R. C. Prati, and M. C. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newslett., vol. 6, no. 1, pp. 20-29, June 2004.
    • (2004) ACM SIGKDD Explorations Newslett. , vol.6 , Issue.1 , pp. 20-29
    • Batista, G.E.1    Prati, R.C.2    Monard, M.C.3
  • 16
    • 56349089205
    • Adasyn: Adaptive synthetic sampling approach for imbalanced learning
    • June
    • H. He, Y. Bai, E. A. Garcia, and S. Li, "Adasyn: Adaptive synthetic sampling approach for imbalanced learning," in Proc. IEEE Int. Joint Conf. Neural Networks, June 2008, pp. 1322-1328.
    • (2008) Proc. IEEE Int. Joint Conf. Neural Networks , pp. 1322-1328
    • He, H.1    Bai, Y.2    Garcia, E.A.3    Li, S.4
  • 17
    • 27144501672
    • Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning
    • Aug
    • H. Han, W. Wang, and B. Mao, "Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning," in Advances in Intelligent Computing, Aug. 2005, pp. 878-887.
    • (2005) Advances in Intelligent Computing , pp. 878-887
    • Han, H.1    Wang, W.2    Mao, B.3
  • 18
    • 67650694660
    • Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem
    • Apr
    • C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, "Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem," in Advances in Knowledge Discovery and Data Mining, Apr. 2009, pp. 475-482.
    • (2009) Advances in Knowledge Discovery and Data Mining , pp. 475-482
    • Bunkhumpornpat, C.1    Sinapiromsaran, K.2    Lursinsap, C.3
  • 20
    • 0015361129
    • Asymptotic properties of nearest neighbour rules using edited data
    • July
    • D. L. Wilson, "Asymptotic properties of nearest neighbour rules using edited data," IEEE Trans. Syst., Man, Cybern.(1971-1995), vol. 2, no. 3, pp. 408-421, July 1972.
    • (1972) IEEE Trans. Syst., Man, Cybern.(1971-1995) , vol.2 , Issue.3 , pp. 408-421
    • Wilson, D.L.1
  • 22
    • 27144540575
    • Class imbalances versus small disjuncts
    • June
    • T. Jo and N. Japkowicz, "Class imbalances versus small disjuncts," ACM SIGKDD Explorations Newslett., vol. 6, no. 1, pp. 40-49, June 2004.
    • (2004) ACM SIGKDD Explorations Newslett. , vol.6 , Issue.1 , pp. 40-49
    • Jo, T.1    Japkowicz, N.2
  • 23
    • 33646107181
    • Learning from imbalanced data in surveillance of nosocomial infection
    • May
    • G. Cohen, M. Hilario, H. Sax, S. Hugonnet, and A. Geissbuhler, "Learning from imbalanced data in surveillance of nosocomial infection," Artif. Intell. Med., vol. 37, no. 1, pp. 7-18, May 2006.
    • (2006) Artif. Intell. Med. , vol.37 , Issue.1 , pp. 7-18
    • Cohen, G.1    Hilario, M.2    Sax, H.3    Hugonnet, S.4    Geissbuhler, A.5
  • 24
    • 84891807032
    • MWMOTE: Majority weighted minority oversampling technique for imbalanced data set learning
    • Nov.
    • S. Barua, M. M. Islam, X. Yao, and K. Murase, "MWMOTE: Majority weighted minority oversampling technique for imbalanced data set learning," IEEE Trans. Knowl. Data Eng., vol. 26, no. 2, pp. 405-425, Nov. 2014.
    • (2014) IEEE Trans. Knowl. Data Eng. , vol.26 , Issue.2 , pp. 405-425
    • Barua, S.1    Islam, M.M.2    Yao, X.3    Murase, K.4
  • 25
    • 52949093937
    • Selective pre-processing of imbalanced data for improving classification performance
    • Sept
    • J. Stefanowski and S. Wilk, "Selective pre-processing of imbalanced data for improving classification performance," Lecture Notes Comput. Sci., vol. 5182, pp. 283-292, Sept. 2008.
    • (2008) Lecture Notes Comput. Sci. , vol.5182 , pp. 283-292
    • Stefanowski, J.1    Wilk, S.2
  • 26
    • 79956276876
    • Learning from imbalanced data in presence of noisy and borderline examples
    • June
    • K. Napierała, J. Stefanowski, and S. Wilk, "Learning from imbalanced data in presence of noisy and borderline examples," in Rough Sets and Current Trends in Computing. New York, NY, USA: Springer, June 2010, pp. 158-167.
    • (2010) Rough Sets and Current Trends in Computing , pp. 158-167
    • Napierała, K.1    Stefanowski, J.2    Wilk, S.3
  • 27
    • 0036522441
    • Complexity measures of supervised classification problems
    • Mar
    • T. K. Ho and M. Basu, "Complexity measures of supervised classification problems," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 289-300, Mar. 2002.
    • (2002) IEEE Trans. Pattern Anal. Mach. Intell. , vol.24 , Issue.3 , pp. 289-300
    • Ho, T.K.1    Basu, M.2
  • 28
    • 84993982815
    • Predicting breast cancer recurrence using machine learning techniques: A systematic review
    • Dec.
    • P. H. Abreu, M. S. Santos, M. H. Abreu, B. Andrade, and D. C. Silva, "Predicting breast cancer recurrence using machine learning techniques: A systematic review," ACM Comput. Surv., vol. 49, no. 3, pp. 1-40, Dec. 2016.
    • (2016) ACM Comput. Surv. , vol.49 , Issue.3 , pp. 1-40
    • Abreu, P.H.1    Santos, M.S.2    Abreu, M.H.3    Andrade, B.4    Silva, D.C.5
  • 31
    • 0020083498
    • The meaning and use of the area under a receiver operating characteristic (ROC) curve
    • Apr
    • J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29-36, Apr. 1982.
    • (1982) Radiology , vol.143 , Issue.1 , pp. 29-36
    • Hanley, J.A.1    McNeil, B.J.2
  • 32
    • 84962631241
    • Study of the impact of resampling methods for contrast pattern based classifiers in imbalanced databases
    • Jan
    • O. Loyola-González, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, and M. García-Borroto, "Study of the impact of resampling methods for contrast pattern based classifiers in imbalanced databases," Neurocomputing, vol. 175, pp. 935-947, Jan. 2016.
    • (2016) Neurocomputing , vol.175 , pp. 935-947
    • Loyola-González, O.1    Martínez-Trinidad, J.F.2    Carrasco-Ochoa, J.A.3    García-Borroto, M.4
  • 34
    • 84987968822
    • A priori synthetic over-sampling methods for increasing classification sensitivity in imbalanced data sets
    • Dec
    • W. A. Rivera and P. Xanthopoulos, "A priori synthetic over-sampling methods for increasing classification sensitivity in imbalanced data sets," Expert Syst. Appl., vol. 66, pp. 124-135, Dec. 2016.
    • (2016) Expert Syst. Appl. , vol.66 , pp. 124-135
    • Rivera, W.A.1    Xanthopoulos, P.2
  • 35
    • 84979464666
    • Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets
    • Sept
    • J. A. Sáez, B. Krawczyk, and M. Woźniak, "Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets," Pattern Recog., vol. 57, pp. 164-178, Sept. 2016.
    • (2016) Pattern Recog. , vol.57 , pp. 164-178
    • Sáez, J.A.1    Krawczyk, B.2    Woźniak, M.3
  • 36
    • 85017142343
    • Self-organizing map oversampling (SOMO) for imbalanced data set learning
    • Oct
    • G. Douzas and F. Bacao, "Self-organizing map oversampling (SOMO) for imbalanced data set learning," Expert Syst. Appl., vol. 82, pp. 40-52, Oct. 2017.
    • (2017) Expert Syst. Appl. , vol.82 , pp. 40-52
    • Douzas, G.1    Bacao, F.2
  • 37
    • 85008323075
    • Medical decision support system for extremely imbalanced datasets
    • Apr
    • S. Shilaskar, A. Ghatol, and P. Chatur, "Medical decision support system for extremely imbalanced datasets," Inf. Sci., vol. 384, pp. 205-219, Apr. 2017.
    • (2017) Inf. Sci. , vol.384 , pp. 205-219
    • Shilaskar, S.1    Ghatol, A.2    Chatur, P.3
  • 38
    • 84996799416
    • A SVM framework for fault detection of the braking system in a high speed train
    • Mar
    • J. Liu, Y. Li, and E. Zio, "A SVM framework for fault detection of the braking system in a high speed train," Mech. Syst. Signal Process., vol. 87, pp. 401-409, Mar. 2017.
    • (2017) Mech. Syst. Signal Process. , vol.87 , pp. 401-409
    • Liu, J.1    Li, Y.2    Zio, E.3
  • 39
    • 79957915328
    • Addressing data complexity for imbalanced data sets: Analysis of SMOTE-based oversampling and evolutionary undersampling
    • Oct.
    • J. Luengo, A. Fernández, S. García, and F. Herrera, "Addressing data complexity for imbalanced data sets: Analysis of SMOTE-based oversampling and evolutionary undersampling," Soft Comput., vol. 15, no. 10, pp. 1909-1936, Oct. 2011.
    • (2011) Soft Comput. , vol.15 , Issue.10 , pp. 1909-1936
    • Luengo, J.1    Fernández, A.2    García, S.3    Herrera, F.4
  • 40
    • 84883447718
    • An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics
    • Nov
    • V. López, A. Fernández, S. García, V. Palade, and F. Herrera, "An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics," Inf. Sci., vol. 250, pp. 113-141, Nov. 2013.
    • (2013) Inf. Sci. , vol.250 , pp. 113-141
    • López, V.1    Fernández, A.2    García, S.3    Palade, V.4    Herrera, F.5
  • 41
    • 84947934366
    • A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients
    • Dec
    • M. S. Santos, P. H. Abreu, P. J. García-Laencina, A. Simao, and A. Carvalho, "A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients," J. Biomed. Inform., vol. 58, pp. 49-59, Dec. 2015.
    • (2015) J. Biomed. Inform. , vol.58 , pp. 49-59
    • Santos, M.S.1    Abreu, P.H.2    García-Laencina, P.J.3    Simao, A.4    Carvalho, A.5
  • 42
    • 84972893020
    • A dendrite method for cluster analysis
    • June
    • T. Caliński and J. Harabasz, "A dendrite method for cluster analysis," Commun. Stat.-Theory Methods, vol. 3, no. 1, pp. 1-27, June 1974.
    • (1974) Commun. Stat.-Theory Methods , vol.3 , Issue.1 , pp. 1-27
    • Caliński, T.1    Harabasz, J.2
  • 46
    • 84876917722
    • Study on the impact of partition-induced dataset shift on k-fold cross-validation
    • June
    • J. G. Moreno-Torres, J. A. Sáez, and F. Herrera, "Study on the impact of partition-induced dataset shift on k-fold cross-validation," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 8, pp. 1304-1312, June 2012.
    • (2012) IEEE Trans. Neural Netw. Learn. Syst. , vol.23 , Issue.8 , pp. 1304-1312
    • Moreno-Torres, J.G.1    Sáez, J.A.2    Herrera, F.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.