Volume 465, 2018, Pages 1-20

Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE

Author keywords

Class imbalanced learning; Classification; Clustering; Oversampling; Supervised learning; Within class imbalance

Indexed keywords

CLASSIFICATION (OF INFORMATION); PROBLEM ORIENTED LANGUAGES; SUPERVISED LEARNING

EID: 85049450664     PISSN: 00200255     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.ins.2018.06.056     Document Type: Article
Times cited: 767

References (49)
  • 2
    • 27144531570
    • A study of the behavior of several methods for balancing machine learning training data
    • Batista, G.E.A.P.A., Prati, R.C., Monard, M.C., A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newslett. 6:1 (2004), 20–29, 10.1145/1007730.1007735.
    • (2004) ACM SIGKDD Explor. Newslett. , vol.6 , Issue.1 , pp. 20-29
    • Batista, G.E.A.P.A.1    Prati, R.C.2    Monard, M.C.3
  • 5
    • 50549101751
    • Automatically countering imbalance and its empirical relationship to cost
    • Chawla, N.V., Cieslak, D.A., Hall, L.O., Joshi, A., Automatically countering imbalance and its empirical relationship to cost. Data Min. Knowl. Discov. 17:2 (2008), 225–252, 10.1007/s10618-008-0087-0.
    • (2008) Data Min. Knowl. Discov. , vol.17 , Issue.2 , pp. 225-252
    • Chawla, N.V.1    Cieslak, D.A.2    Hall, L.O.3    Joshi, A.4
  • 6
    • 27144549260
    • Editorial: special issue on learning from imbalanced data sets
    • Chawla, N.V., Japkowicz, N., Drive, P., Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explor. Newslett. 6:1 (2004), 1–6, 10.1145/1007730.1007733.
    • (2004) ACM SIGKDD Explor. Newslett. , vol.6 , Issue.1 , pp. 1-6
    • Chawla, N.V.1    Japkowicz, N.2    Drive, P.3
  • 8
    • 84856621489
    • Hellinger distance decision trees are robust and skew-insensitive
    • Cieslak, D.A., Hoens, T.R., Chawla, N.V., Kegelmeyer, W.P., Hellinger distance decision trees are robust and skew-insensitive. Data Min. Knowl. Discov. 24:1 (2012), 136–158, 10.1007/s10618-011-0222-1.
    • (2012) Data Min. Knowl. Discov. , vol.24 , Issue.1 , pp. 136-158
    • Cieslak, D.A.1    Hoens, T.R.2    Chawla, N.V.3    Kegelmeyer, W.P.4
  • 9
    • 29644438050
    • Statistical comparisons of classifiers over multiple data sets
    • Demšar, J., Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7 (2006), 1–30.
    • (2006) J. Mach. Learn. Res. , vol.7 , pp. 1-30
    • Demšar, J.1
  • 11
    • 85017142343
    • Self-organizing map oversampling (SOMO) for imbalanced data set learning
    • Douzas, G., Bacao, F., Self-organizing map oversampling (SOMO) for imbalanced data set learning. Expert Syst. Appl. 82 (2017), 40–52, 10.1016/j.eswa.2017.03.073.
    • (2017) Expert Syst. Appl. , vol.82 , pp. 40-52
    • Douzas, G.1    Bacao, F.2
  • 12
    • 84941559528
    • Diversity techniques improve the performance of the best imbalance learning ensembles
    • Díez-Pastor, J.F., Rodríguez, J.J., García-Osorio, C.I., Kuncheva, L.I., Diversity techniques improve the performance of the best imbalance learning ensembles. Inf. Sci. (Ny) 325 (2015), 98–117, 10.1016/j.ins.2015.07.025.
    • (2015) Inf. Sci. (Ny) , vol.325 , pp. 98-117
    • Díez-Pastor, J.F.1    Rodríguez, J.J.2    García-Osorio, C.I.3    Kuncheva, L.I.4
  • 13
    • 1442356040
    • A multiple resampling method for learning from imbalanced data sets
    • Estabrooks, A., Jo, T., Japkowicz, N., A multiple resampling method for learning from imbalanced data sets. Comput. Intell. 20:1 (2004), 18–36, 10.1111/j.0824-7935.2004.t01-1-00228.x.
    • (2004) Comput. Intell. , vol.20 , Issue.1 , pp. 18-36
    • Estabrooks, A.1    Jo, T.2    Japkowicz, N.3
  • 14
    • 84874667219
    • Analysing the classification of imbalanced data-sets with multiple classes: binarization techniques and ad-hoc approaches
    • Fernández, A., López, V., Galar, M., Del Jesus, M.J., Herrera, F., Analysing the classification of imbalanced data-sets with multiple classes: binarization techniques and ad-hoc approaches. Knowl. Based Syst. 42 (2013), 97–110, 10.1016/j.knosys.2013.01.018.
    • (2013) Knowl. Based Syst. , vol.42 , pp. 97-110
    • Fernández, A.1    López, V.2    Galar, M.3    Del Jesus, M.J.4    Herrera, F.5
  • 15
    • 85049479610
    • Discriminatory analysis - nonparametric discrimination: Consistency properties. California Univ Berkeley.
    • E. Fix, J.L. Hodges Jr., Discriminatory analysis - nonparametric discrimination: Consistency properties. California Univ Berkeley, 1951.
    • (1951)
    • Fix, E.1    Hodges, J.L.2
  • 16
    • 0035470889
    • Greedy function approximation: a gradient boosting machine
    • Friedman, J.H., Greedy function approximation: a gradient boosting machine. Ann. Stat. 29:5 (2001), 1189–1232, 10.1214/aos/1013203451.
    • (2001) Ann. Stat. , vol.29 , Issue.5 , pp. 1189-1232
    • Friedman, J.H.1
  • 17
    • 84944811700
    • The use of ranks to avoid the assumption of normality implicit in the analysis of variance
    • Friedman, M., The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc., 32(200), 1937, 675, 10.2307/2279372.
    • (1937) J. Am. Stat. Assoc. , vol.32 , Issue.200 , pp. 675
    • Friedman, M.1
  • 18
    • 84862515469
    • A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches
    • M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, F. Herrera, A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches, 2012. doi: 10.1109/TSMCC.2011.2161285.
    • (2012)
    • Galar, M.1    Fernandez, A.2    Barrenechea, E.3    Bustince, H.4    Herrera, F.5
  • 19
    • 84962359556
    • Ordering-based pruning for improving the performance of ensembles of classifiers in the framework of imbalanced datasets
    • Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F., Ordering-based pruning for improving the performance of ensembles of classifiers in the framework of imbalanced datasets. Inf. Sci. (Ny) 354 (2016), 178–196, 10.1016/j.ins.2016.02.056.
    • (2016) Inf. Sci. (Ny) , vol.354 , pp. 178-196
    • Galar, M.1    Fernandez, A.2    Barrenechea, E.3    Bustince, H.4    Herrera, F.5
  • 20
    • 85049478221
    • Design of experiments of the NIPS 2003 variable selection benchmark.
    • I. Guyon, Design of experiments of the NIPS 2003 variable selection benchmark, 2003.
    • (2003)
    • Guyon, I.1
  • 21
    • 27144501672
    • Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning
    • Han, H., Wang, W.-Y., Mao, B.-H., Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. Adv. Intell. Comput. 17:12 (2005), 878–887, 10.1007/11538059_91.
    • (2005) Adv. Intell. Comput. , vol.17 , Issue.12 , pp. 878-887
    • Han, H.1    Wang, W.-Y.2    Mao, B.-H.3
  • 22
    • 1642380461
    • The problem of overfitting
    • Hawkins, D.M., The problem of overfitting. J. Chem. Inf. Comput. Sci. 44:1 (2004), 1–12, 10.1002/chin.200419274.
    • (2004) J. Chem. Inf. Comput. Sci. , vol.44 , Issue.1 , pp. 1-12
    • Hawkins, D.M.1
  • 23
    • 68549133155
    • Learning from imbalanced data
    • He, H., Garcia, E.A., Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21:9 (2009), 1263–1284, 10.1109/TKDE.2008.239.
    • (2009) IEEE Trans. Knowl. Data Eng. , vol.21 , Issue.9 , pp. 1263-1284
    • He, H.1    Garcia, E.A.2
  • 24
    • 0002294347
    • A simple sequentially rejective multiple test procedure
    • Holm, S., A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6:2 (1979), 65–70.
    • (1979) Scand. J. Stat. , vol.6 , Issue.2 , pp. 65-70
    • Holm, S.1
  • 25
    • 0002448383
    • Concept learning and the problem of small disjuncts
    • Holte, R.C., Acker, L., Porter, B.W., et al. Concept learning and the problem of small disjuncts. Proceedings of the IJCAI, vol. 89, 1989, 813–818.
    • (1989) Proceedings of the IJCAI , vol.89 , pp. 813-818
    • Holte, R.C.1    Acker, L.2    Porter, B.W.3
  • 26
    • 85076269272
    • Assessment metrics for imbalanced learning
    • He, H., Ma, Y. (eds.), John Wiley & Sons
    • Japkowicz, N., Assessment metrics for imbalanced learning. He, H., Ma, Y., (eds.) Imbalanced Learning, 2013, John Wiley & Sons, 187–206, 10.1002/9781118646106.ch8.
    • (2013) Imbalanced Learning , pp. 187-206
    • Japkowicz, N.1
  • 27
    • 27144540575
    • Class imbalances versus small disjuncts
    • Jo, T., Japkowicz, N., Class imbalances versus small disjuncts. ACM SIGKDD Explor. Newslett. 6:1 (2004), 40–49.
    • (2004) ACM SIGKDD Explor. Newslett. , vol.6 , Issue.1 , pp. 40-49
    • Jo, T.1    Japkowicz, N.2
  • 28
    • 35348935140
    • Handling imbalanced datasets: a review
    • Kotsiantis, S., Kanellopoulos, D., Pintelas, P., Handling imbalanced datasets: a review. Science 30:1 (2006), 25–36, 10.1007/978-0-387-09823-4_45.
    • (2006) Science , vol.30 , Issue.1 , pp. 25-36
    • Kotsiantis, S.1    Kanellopoulos, D.2    Pintelas, P.3
  • 29
    • 85049458647
    • Robustness of learning techniques in handling class noise in imbalanced datasets
    • S. Kotsiantis, P. Pintelas, D. Anyfantis, M. Karagiannopoulos, Robustness of learning techniques in handling class noise in imbalanced datasets, 2007, doi: 10.1007/978-0-387-74161-1_3.
    • (2007)
    • Kotsiantis, S.1    Pintelas, P.2    Anyfantis, D.3    Karagiannopoulos, M.4
  • 30
    • 85016274615
    • Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning
    • Lemaître, G., Nogueira, F., Aridas, C.K., Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18:17 (2017), 1–5.
    • (2017) J. Mach. Learn. Res. , vol.18 , Issue.17 , pp. 1-5
    • Lemaître, G.1    Nogueira, F.2    Aridas, C.K.3
  • 31
    • 85049446968
    • UCI machine learning repository
    • M. Lichman, UCI machine learning repository, 2013.
    • (2013)
    • Lichman, M.1
  • 32
    • 85019061365
    • Clustering-based undersampling in class-imbalanced data
    • Lin, W.-C., Tsai, C.-F., Hu, Y.-H., Jhang, J.-S., Clustering-based undersampling in class-imbalanced data. Inf. Sci. (Ny) 409–410 (2017), 17–26, 10.1016/j.ins.2017.05.008.
    • (2017) Inf. Sci. (Ny) , vol.409-410 , pp. 17-26
    • Lin, W.-C.1    Tsai, C.-F.2    Hu, Y.-H.3    Jhang, J.-S.4
  • 33
    • 85015659687
    • CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests
    • Ma, L., Fan, S., CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests. BMC Bioinf., 18(1), 2017, 169, 10.1186/s12859-017-1578-z.
    • (2017) BMC Bioinf. , vol.18 , Issue.1 , pp. 169
    • Ma, L.1    Fan, S.2
  • 35
    • 0008680770
    • Generalized linear models
    • McCullagh, P., Generalized linear models. Eur. J. Oper. Res. 16:3 (1984), 285–292, 10.1016/0377-2217(84)90282-0.
    • (1984) Eur. J. Oper. Res. , vol.16 , Issue.3 , pp. 285-292
    • McCullagh, P.1
  • 36
    • 84947569019
    • Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets
    • Nekooeimehr, I., Lai-Yuen, S.K., Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets. Expert Syst. Appl. 46 (2016), 405–416, 10.1016/j.eswa.2015.10.031.
    • (2016) Expert Syst. Appl. , vol.46 , pp. 405-416
    • Nekooeimehr, I.1    Lai-Yuen, S.K.2
  • 37
    • 8344227981
    • Using unsupervised learning to guide resampling in imbalanced data sets
    • Nickerson, A., Japkowicz, N., Milios, E.E., Using unsupervised learning to guide resampling in imbalanced data sets. Proceedings of the AISTATS, 2001, 261–265.
    • (2001) Proceedings of the AISTATS , pp. 261-265
    • Nickerson, A.1    Japkowicz, N.2    Milios, E.E.3
  • 39
    • 35048878309
    • Learning with class skews and small disjuncts
    • Prati, R.C., Batista, G., Monard, M.C., Learning with class skews and small disjuncts. Proceedings of the SBIA, 2004, 296–306, 10.1007/978-3-540-28645-5_30.
    • (2004) Proceedings of the SBIA , pp. 296-306
    • Prati, R.C.1    Batista, G.2    Monard, M.C.3
  • 41
    • 85018316823
    • Noise reduction a priori synthetic over-sampling for class imbalanced data sets
    • Rivera, W.A., Noise reduction a priori synthetic over-sampling for class imbalanced data sets. Inf. Sci. (Ny) 408 (2017), 146–161, 10.1016/j.ins.2017.04.046.
    • (2017) Inf. Sci. (Ny) , vol.408 , pp. 146-161
    • Rivera, W.A.1
  • 42
    • 84947934366
    • A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients
    • Santos, M.S., Abreu, P.H., García-Laencina, P.J., Simão, A., Carvalho, A., A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients. J. Biomed. Inf. 58 (2015), 49–59, 10.1016/j.jbi.2015.09.012.
    • (2015) J. Biomed. Inf. , vol.58 , pp. 49-59
    • Santos, M.S.1    Abreu, P.H.2    García-Laencina, P.J.3    Simão, A.4    Carvalho, A.5
  • 45
    • 84923328437
    • SMOTE-IPF: addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering
    • Sáez, J.A., Luengo, J., Stefanowski, J., Herrera, F., SMOTE-IPF: addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inf. Sci. (Ny) 291 (2015), 184–203, 10.1016/j.ins.2014.08.051.
    • (2015) Inf. Sci. (Ny) , vol.291 , pp. 184-203
    • Sáez, J.A.1    Luengo, J.2    Stefanowski, J.3    Herrera, F.4
  • 46
    • 85021059612
    • Imbalanced classification in sparse and large behaviour datasets
    • Vanhoeyveld, J., Martens, D., Imbalanced classification in sparse and large behaviour datasets. Data Min. Knowl. Discov., 2017, 1–58, 10.1007/s10618-017-0517-y.
    • (2017) Data Min. Knowl. Discov. , pp. 1-58
    • Vanhoeyveld, J.1    Martens, D.2
  • 47
    • 50549087624
    • Cost-sensitive learning vs. sampling: which is best for handling unbalanced classes with unequal error costs?
    • Weiss, G.M., McCarthy, K., Zabar, B., Cost-sensitive learning vs. sampling: which is best for handling unbalanced classes with unequal error costs?. DMIN 7 (2007), 35–41.
    • (2007) DMIN , vol.7 , pp. 35-41
    • Weiss, G.M.1    McCarthy, K.2    Zabar, B.3
  • 48
    • 77649273505
    • COG: local decomposition for rare class analysis
    • Wu, J., Xiong, H., Chen, J., COG: local decomposition for rare class analysis. Data Min. Knowl. Discov. 20:2 (2010), 191–220, 10.1007/s10618-009-0146-1.
    • (2010) Data Min. Knowl. Discov. , vol.20 , Issue.2 , pp. 191-220
    • Wu, J.1    Xiong, H.2    Chen, J.3
  • 49
    • 85018384906
    • An empirical comparison of techniques for the class imbalance problem in churn prediction
    • Zhu, B., Baesens, B., vanden Broucke, S.K.L.M., An empirical comparison of techniques for the class imbalance problem in churn prediction. Inf. Sci. (Ny) 408 (2017), 84–99, 10.1016/j.ins.2017.04.015.
    • (2017) Inf. Sci. (Ny) , vol.408 , pp. 84-99
    • Zhu, B.1    Baesens, B.2    vanden Broucke, S.K.L.M.3


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.