Volume 291, Issue C, 2015, Pages 184-203

SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering

Author keywords

Borderline examples; Imbalanced classification; Noise filters; Noisy data; SMOTE

Indexed keywords

DATA HANDLING; ITERATIVE METHODS;

EID: 84923328437     PISSN: 00200255     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.ins.2014.08.051     Document Type: Article
Times cited: 526

References (57)
  • 4
    • 27144531570
    • A study of the behavior of several methods for balancing machine learning training data
    • G. Batista, R. Prati, M. Monard, A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explor. Newslett. 6 (2004) 20-29.
    • (2004) ACM SIGKDD Explor. Newslett. , vol.6 , pp. 20-29
    • Batista, G.1    Prati, R.2    Monard, M.3
  • 5
    • 84859001991
    • Developing new fitness functions in genetic programming for classification with unbalanced data
    • U. Bhowan, M. Johnston, M. Zhang, Developing new fitness functions in genetic programming for classification with unbalanced data, IEEE Trans. Syst. Man Cybern., Part B: Cybern. 42 (2012) 406-421.
    • (2012) IEEE Trans. Syst. Man Cybern., Part B: Cybern. , vol.42 , pp. 406-421
    • Bhowan, U.1    Johnston, M.2    Zhang, M.3
  • 6
    • 0031191630
    • The use of the area under the ROC curve in the evaluation of machine learning algorithms
    • A.P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recogn. 30 (1997) 1145-1159.
    • (1997) Pattern Recogn. , vol.30 , pp. 1145-1159
    • Bradley, A.P.1
  • 10
    • 50549101751
    • Automatically countering imbalance and its empirical relationship to cost
    • N.V. Chawla, D.A. Cieslak, L.O. Hall, A. Joshi, Automatically countering imbalance and its empirical relationship to cost, Data Min. Knowl. Discov. 17 (2008) 225-252.
    • (2008) Data Min. Knowl. Discov. , vol.17 , pp. 225-252
    • Chawla, N.V.1    Cieslak, D.A.2    Hall, L.O.3    Joshi, A.4
  • 11
    • 27144549260
    • Editorial: Special issue on learning from imbalanced data sets
    • N.V. Chawla, N. Japkowicz, A. Kotcz, Editorial: special issue on learning from imbalanced data sets, SIGKDD Explor. 6 (2004) 1-6.
    • (2004) SIGKDD Explor. , vol.6 , pp. 1-6
    • Chawla, N.V.1    Japkowicz, N.2    Kotcz, A.3
  • 13
    • 34249753618
    • Support vector networks
    • C. Cortes, V. Vapnik, Support vector networks, Mach. Learn. 20 (1995) 273-297.
    • (1995) Mach. Learn. , vol.20 , pp. 273-297
    • Cortes, C.1    Vapnik, V.2
  • 14
    • 29644438050
    • Statistical comparisons of classifiers over multiple data sets
    • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1-30.
    • (2006) J. Mach. Learn. Res. , vol.7 , pp. 1-30
    • Demšar, J.1
  • 15
    • 75149159107
    • On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets
    • A. Fernández, M.J. del Jesus, F. Herrera, On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets, Inf. Sci. 180 (2010) 1268-1291.
    • (2010) Inf. Sci. , vol.180 , pp. 1268-1291
    • Fernández, A.1    Del Jesus, M.J.2    Herrera, F.3
  • 18
    • 0034143132
    • Noise detection and elimination in data preprocessing: Experiments in medical domains
    • D. Gamberger, N. Lavrac, S. Dzeroski, Noise detection and elimination in data preprocessing: experiments in medical domains, Appl. Artif. Intell. 14 (2000) 205-223.
    • (2000) Appl. Artif. Intell. , vol.14 , pp. 205-223
    • Gamberger, D.1    Lavrac, N.2    Dzeroski, S.3
  • 19
    • 77549084648
    • Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power
    • S. García, A. Fernández, J. Luengo, F. Herrera, Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci. 180 (2010) 2044-2064.
    • (2010) Inf. Sci. , vol.180 , pp. 2044-2064
    • García, S.1    Fernández, A.2    Luengo, J.3    Herrera, F.4
  • 21
    • 50549093573
    • On the k-NN performance in a challenging scenario of imbalance and overlapping
    • V. García, R. Mollineda, J. Sánchez, On the k-NN performance in a challenging scenario of imbalance and overlapping, Pattern Anal. Appl. 11 (2008) 269-280.
    • (2008) Pattern Anal. Appl. , vol.11 , pp. 269-280
    • García, V.1    Mollineda, R.2    Sánchez, J.3
  • 22
    • 38449101377
    • An empirical study of the behavior of classifiers on imbalanced and overlapped data sets
    • L. Rueda, D. Mery, J. Kittler (Eds.) , Springer, Heidelberg
    • V. García, J. Sánchez, R. Mollineda, An empirical study of the behavior of classifiers on imbalanced and overlapped data sets, in: L. Rueda, D. Mery, J. Kittler (Eds.), CIARP 2007, LNCS, vol. 4756, Springer, Heidelberg, 2007, pp. 397-406.
    • (2007) CIARP 2007, LNCS , vol.4756 , pp. 397-406
    • García, V.1    Sánchez, J.2    Mollineda, R.3
  • 25
    • 33645762226
    • A sharper Bonferroni procedure for multiple tests of significance
    • Y. Hochberg, A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75 (1988) 800-803.
    • (1988) Biometrika , vol.75 , pp. 800-803
    • Hochberg, Y.1
  • 26
    • 0001589146
    • Ranks methods for combination of independent experiments in analysis of variance
    • J. Hodges, E. Lehmann, Ranks methods for combination of independent experiments in analysis of variance, Ann. Math. Stat. 33 (1962) 482-497.
    • (1962) Ann. Math. Stat. , vol.33 , pp. 482-497
    • Hodges, J.1    Lehmann, E.2
  • 28
    • 9444231883
    • Class imbalance: Are we focusing on the right issue
    • Morgan Kaufmann Publishers Inc
    • N. Japkowicz, Class imbalance: are we focusing on the right issue, in: II Workshop on Learning from Imbalanced Data Sets, ICML, Morgan Kaufmann Publishers Inc, 2003, pp. 17-23.
    • (2003) II Workshop on Learning from Imbalanced Data Sets, ICML , pp. 17-23
    • Japkowicz, N.1
  • 29
    • 27144540575
    • Class Imbalances versus small disjuncts
    • T. Jo, N. Japkowicz, Class Imbalances versus small disjuncts, SIGKDD Explor. 6 (2004) 40-49.
    • (2004) SIGKDD Explor. , vol.6 , pp. 40-49
    • Jo, T.1    Japkowicz, N.2
  • 30
    • 61449189918
    • The effect of borderline examples on language learning
    • K.L. Kermanidis, The effect of borderline examples on language learning, J. Exp. Theor. Artif. Intell. 21 (2009) 19-42.
    • (2009) J. Exp. Theor. Artif. Intell. , vol.21 , pp. 19-42
    • Kermanidis, K.L.1
  • 31
    • 84862131568
    • A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection
    • K.C. Khor, C.Y. Ting, S. Phon-Amnuaisuk, A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection, Appl. Intell. 36 (2012) 320-329.
    • (2012) Appl. Intell. , vol.36 , pp. 320-329
    • Khor, K.C.1    Ting, C.Y.2    Phon-Amnuaisuk, S.3
  • 32
    • 34249859248
    • Improving software quality prediction by noise filtering techniques
    • T.M. Khoshgoftaar, P. Rebours, Improving software quality prediction by noise filtering techniques, J. Comput. Sci. Technol. 22 (2007) 387-396.
    • (2007) J. Comput. Sci. Technol. , vol.22 , pp. 387-396
    • Khoshgoftaar, T.M.1    Rebours, P.2
  • 34
    • 24144490154
    • Diversity in multiple classifier systems
    • L.I. Kuncheva, Diversity in multiple classifier systems, Inf. Fus. 6 (2005) 3-4.
    • (2005) Inf. Fus. , vol.6 , pp. 3-4
    • Kuncheva, L.I.1
  • 35
    • 84888645340
    • On the importance of the validation technique for classification with imbalanced datasets: Addressing covariate shift when data is skewed
    • V. López, A. Fernández, F. Herrera, On the importance of the validation technique for classification with imbalanced datasets: addressing covariate shift when data is skewed, Inf. Sci. 257 (2014) 1-13.
    • (2014) Inf. Sci. , vol.257 , pp. 1-13
    • López, V.1    Fernández, A.2    Herrera, F.3
  • 36
    • 84887616457
    • Addressing imbalanced classification with instance generation techniques: IPADE-ID
    • V. López, I. Triguero, C.J. Carmona, S. García, F. Herrera, Addressing imbalanced classification with instance generation techniques: IPADE-ID, Neurocomputing 126 (2014) 15-28.
    • (2014) Neurocomputing , vol.126 , pp. 15-28
    • López, V.1    Triguero, I.2    Carmona, C.J.3    García, S.4    Herrera, F.5
  • 38
    • 58849098571
    • A semi-deterministic ensemble strategy for imbalanced datasets (SDEID) applied to bankruptcy prediction
    • R. Mathiasi Horta, B. Pires De Lima, C. Borges, A semi-deterministic ensemble strategy for imbalanced datasets (SDEID) applied to bankruptcy prediction, WIT Trans. Inf. Commun. Technol. 40 (2008) 205-213.
    • (2008) WIT Trans. Inf. Commun. Technol. , vol.40 , pp. 205-213
    • Mathiasi Horta, R.1    Pires De Lima, B.2    Borges, C.3
  • 42
    • 84880920627
    • Tackling the problem of classification with noisy data using multiple classifier systems: Analysis of the performance and robustness
    • J.A. Sáez, M. Galar, J. Luengo, F. Herrera, Tackling the problem of classification with noisy data using multiple classifier systems: analysis of the performance and robustness, Inf. Sci. 247 (2013) 1-20.
    • (2013) Inf. Sci. , vol.247 , pp. 1-20
    • Sáez, J.A.1    Galar, M.2    Luengo, J.3    Herrera, F.4
  • 43
    • 84884965072
    • Analyzing the presence of noise in multi-class problems: Alleviating its influence with the one-vs-one decomposition
    • J.A. Sáez, M. Galar, J. Luengo, F. Herrera, Analyzing the presence of noise in multi-class problems: alleviating its influence with the one-vs-one decomposition, Knowl. Inf. Syst. 38 (2014) 179-206.
    • (2014) Knowl. Inf. Syst. , vol.38 , pp. 179-206
    • Sáez, J.A.1    Galar, M.2    Luengo, J.3    Herrera, F.4
  • 44
    • 84866043469
    • Predicting noise filtering efficacy with data complexity measures for nearest neighbor classification
    • J.A. Sáez, J. Luengo, F. Herrera, Predicting noise filtering efficacy with data complexity measures for nearest neighbor classification, Pattern Recogn. 46 (2013) 355-364.
    • (2013) Pattern Recogn. , vol.46 , pp. 355-364
    • Sáez, J.A.1    Luengo, J.2    Herrera, F.3
  • 45
    • 0036948440
    • Application of rule induction and rough sets to verification of magnetic resonance diagnosis
    • K. Słowiński, J. Stefanowski, D. Siwiński, Application of rule induction and rough sets to verification of magnetic resonance diagnosis, Fund. Inform. 53 (2002) 345-363.
    • (2002) Fund. Inform. , vol.53 , pp. 345-363
    • Słowiński, K.1    Stefanowski, J.2    Siwiński, D.3
  • 46
    • 84879318547
    • Overlapping, rare examples and class decomposition in learning classifiers from imbalanced data
    • S. Ramanna, L.C. Jain, R.J. Howlett (Eds.), Springer, Berlin, Heidelberg
    • J. Stefanowski, Overlapping, rare examples and class decomposition in learning classifiers from imbalanced data, in: S. Ramanna, L.C. Jain, R.J. Howlett (Eds.), Emerging Paradigms in Machine Learning, Smart Innovation, Systems and Technologies, vol. 13, Springer, Berlin, Heidelberg, 2013, pp. 277-306.
    • (2013) Emerging Paradigms in Machine Learning, Smart Innovation, Systems and Technologies , vol.13 , pp. 277-306
    • Stefanowski, J.1
  • 47
    • 52949093937
    • Selective pre-processing of imbalanced data for improving classification performance
    • I.Y. Song, J. Eder, T. Nguyen (Eds.), Springer, Berlin/Heidelberg
    • J. Stefanowski, S. Wilk, Selective pre-processing of imbalanced data for improving classification performance, in: I.Y. Song, J. Eder, T. Nguyen (Eds.), Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science, vol. 5182, Springer, Berlin/Heidelberg, 2008, pp. 283-292.
    • (2008) Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science , vol.5182 , pp. 283-292
    • Stefanowski, J.1    Wilk, S.2
  • 48
    • 34648841070
    • An evaluation of the robustness of MTS for imbalanced data
    • C.T. Su, Y.H. Hsiao, An evaluation of the robustness of MTS for imbalanced data, IEEE Trans. Knowl. Data Eng. 19 (2007) 1321-1332.
    • (2007) IEEE Trans. Knowl. Data Eng. , vol.19 , pp. 1321-1332
    • Su, C.T.1    Hsiao, Y.H.2
  • 49
    • 70350565063
    • On strategies for imbalanced text classification using SVM: A comparative study
    • A. Sun, E.P. Lim, Y. Liu, On strategies for imbalanced text classification using SVM: a comparative study, Decis. Support Syst. 48 (2009) 191-201.
    • (2009) Decis. Support Syst. , vol.48 , pp. 191-201
    • Sun, A.1    Lim, E.P.2    Liu, Y.3
  • 50
    • 34547673383
    • Cost-sensitive boosting for classification of imbalanced data
    • Y. Sun, M.S. Kamel, A.K. Wong, Y. Wang, Cost-sensitive boosting for classification of imbalanced data, Pattern Recogn. 40 (2007) 3358-3378.
    • (2007) Pattern Recogn. , vol.40 , pp. 3358-3378
    • Sun, Y.1    Kamel, M.S.2    Wong, A.K.3    Wang, Y.4
  • 52
    • 71549121041
    • Parasite detection and identification for automated thin blood film malaria diagnosis
    • F.B. Tek, A.G. Dempster, I. Kale, Parasite detection and identification for automated thin blood film malaria diagnosis, Comput. Vis. Image Understand. 114 (2010) 21-32.
    • (2010) Comput. Vis. Image Understand. , vol.114 , pp. 21-32
    • Tek, F.B.1    Dempster, A.G.2    Kale, I.3
  • 55
    • 0015361129
    • Asymptotic properties of nearest neighbor rules using edited data
    • D.L. Wilson, Asymptotic properties of nearest neighbor rules using edited data, IEEE Trans. Syst. Man Cybern. 2 (1972) 408-421.
    • (1972) IEEE Trans. Syst. Man Cybern. , vol.2 , pp. 408-421
    • Wilson, D.L.1
  • 57
    • 19544372918
    • Class noise vs. Attribute noise: A quantitative study
    • X. Zhu, X. Wu, Class noise vs. attribute noise: a quantitative study, Artif. Intell. Rev. 22 (2004) 177-210.
    • (2004) Artif. Intell. Rev. , vol.22 , pp. 177-210
    • Zhu, X.1    Wu, X.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.