



Volume 28, Issue 1, 2014, Pages 92-122

Training and assessing classification rules with imbalanced data

Author keywords

Accuracy; Binary classification; Bootstrap; Imbalanced learning; Kernel density estimation

Indexed keywords

Data mining

EID: 84891860723; PISSN: 1384-5810; EISSN: None; Source type: Journal
DOI: 10.1007/s10618-012-0295-5; Document type: Article
Times cited: 541

References (61)
  • 2
    • Scopus EID: 36948999941
    • Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html. University of California, School of Information and Computer Science, Irvine
  • 3
    • Scopus EID: 0036522693
    • Barandela R, Sánchez JS, García V, Rangel E (2003) Strategies for learning in class imbalance problems. Pattern Recognit 36: 849-851
  • 4
    • Scopus EID: 27144531570
    • Batista G, Prati R, Monard M (2004) A study of the behaviour of several methods for balancing machine learning training data. SIGKDD Explor 6(1): 20-29
    • DOI: 10.1145/1007730.1007735
  • 5
    • Scopus EID: 77953089698
    • Batuwita R, Palade V (2010) FSVM-CIL: fuzzy support vector machines for class imbalance learning. IEEE Trans Fuzzy Syst 18(3): 558-571
    • DOI: 10.1109/TFUZZ.2010.2042721
  • 7
    • Scopus EID: 0030211964
    • Breiman L (1996) Bagging predictors. Mach Learn 24(2): 123-140
  • 9
    • Scopus EID: 58349098976
    • Burez J, Van den Poel D (2009) Handling class imbalance in customer churn prediction. Expert Syst Appl 36: 4626-4636
    • DOI: 10.1016/j.eswa.2008.05.027
  • 10
    • Scopus EID: 33748943134
    • Chawla NV (2003) C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure. In: Proceedings of the ICML'03 Workshop on Class Imbalances
  • 12
    • Scopus EID: 0022251130
    • Chernick M, Murthy V, Nealy C (1985) Application of bootstrap and other resampling methods: evaluation of classifier performance. Pattern Recognit Lett 3: 167-178
    • DOI: 10.1016/0167-8655(85)90049-2
  • 13
    • Scopus EID: 56049126929
    • Cieslak D, Chawla N (2008) Learning decision trees for unbalanced data. Lect Notes Comput Sci 5211: 241-256
    • DOI: 10.1007/978-3-540-87479-9_34
  • 14
    • Scopus EID: 0002430993
    • Cramer JS (1999) Predictive performance of binary logit models in unbalanced samples. The Statistician 48: 85-94
  • 15
  • 16
  • 16
    • Scopus EID: 29644438050
    • Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7: 1-30
  • 17
    • Scopus EID: 33748991193
    • Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier performance. Mach Learn 65(1): 95-130
    • DOI: 10.1007/s10994-006-8199-5
  • 19
    • Scopus EID: 33846864233
    • Eitrich T, Kless A, Druska C, Meyer W, Grotendorst J (2007) Classification of highly unbalanced CYP450 data of drugs using cost sensitive machine learning techniques. J Chem Inf Model 47(1): 92-103
    • DOI: 10.1021/ci6002619
  • 20
    • Scopus EID: 1442356040
    • Estabrooks A, Jo T, Japkowicz N (2004) A multiple resampling method for learning from imbalanced data sets. Comput Intell 20: 18-36
    • DOI: 10.1111/j.0824-7935.2004.t01-1-00228.x
  • 21
    • Scopus EID: 84862515469
    • Fernandez A, Barrenechea E, Bustince H, Herrera F (2012) A review on ensembles for the class imbalance problem: bagging, boosting, and hybrid-based approaches. IEEE Trans Syst Man Cybern C 42: 463-484
    • DOI: 10.1109/TSMCC.2011.2179028
  • 22
    • Scopus EID: 80052414830
    • García S, Derrac J, Triguero I, Carmona CJ, Herrera F (2012) Evolutionary-based selection of generalized instances for imbalanced classification. Knowl Based Syst 25: 3-12
    • DOI: 10.1016/j.knosys.2011.01.012
  • 23
    • Scopus EID: 27144479454
    • Guo H, Viktor HL (2004) Boosting with data generation: improving the classification of hard to learn examples. SIGKDD Explor 6(1): 30-39
    • DOI: 10.1145/1007730.1007736
  • 24
    • Scopus EID: 33745886270
    • Hand DJ (2006) Classifier technology and the illusion of progress. Stat Sci 21(1): 1-14
    • DOI: 10.1214/088342306000000060
  • 25
    • Scopus EID: 0037410687
    • Hand DJ, Vinciotti V (2003) Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recognit Lett 24(9-10): 1555-1562
    • DOI: 10.1016/S0167-8655(02)00394-X
  • 27
    • Scopus EID: 33845536164
    • Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intell Data Anal 6
  • 28
    • Scopus EID: 27144540575
    • Jo T, Japkowicz N (2004) Class imbalances versus small disjuncts. SIGKDD Explor 6(1): 40-49
    • DOI: 10.1145/1007730.1007737
  • 31
    • Scopus EID: 0036678729
    • King EN, Ryan TP (2002) A preliminary investigation of maximum likelihood logistic regression versus exact logistic regression. Am Stat 56: 163-170
    • DOI: 10.1198/00031300283
  • 32
    • Scopus EID: 4544259831
    • King G, Zeng L (2001) Logistic regression in rare events data. Polit Anal 9: 137-163
    • DOI: 10.1093/oxfordjournals.pan.a004868
  • 36
    • Scopus EID: 0034726260
    • Lee S (2000) Noisy replication in skewed binary classification. Comput Stat Data Anal 34: 165-191
    • DOI: 10.1016/S0167-9473(99)00095-X
  • 37
    • Scopus EID: 0033424579
    • Lee S (1999) Regularization in skewed binary classification. Comput Stat 14: 277-292
    • DOI: 10.1007/s001800050018
  • 38
    • Scopus EID: 0036161029
    • Lin Y, Lee Y, Wahba G (2002) Support vector machines for classification in nonstandard situations. Mach Learn 46(1-3): 191-202
    • DOI: 10.1023/A:1012406528296
  • 39
    • Scopus EID: 33746529930
    • Liu Y, Chawla NV, Harper MP, Shriberg E, Stolcke A (2006) A study in machine learning from imbalanced data for sentence boundary detection in speech. Comput Speech Lang 20(4): 468-494
    • DOI: 10.1016/j.csl.2005.06.002
  • 40
    • Scopus EID: 40649126091
    • Mazurowski MA (2008) Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance. Neural Netw 21: 427-436
    • DOI: 10.1016/j.neunet.2007.12.031
  • 42
    • Scopus EID: 33947284406
    • Mease D, Wyner AJ, Buja A (2007) Boosted classification trees and class probability/quantile estimation. J Mach Learn Res 8: 409-439
  • 43
    • Scopus EID: 78650748373
    • Oommen T, Baise L, Vogel R (2011) Sampling bias and class imbalance in maximum-likelihood logistic regression. Math Geosci 43: 99-120
    • DOI: 10.1007/s11004-010-9311-8
  • 44
    • Scopus EID: 80052951721
    • Pavón R, Laza R, Reboiro-Jato M, Fdez-Riverola F (2011) Assessing the impact of class-imbalanced data for classifying relevant/irrelevant medline documents. Adv Intell Soft Comput 93: 345-353
    • DOI: 10.1007/978-3-642-19914-1_45
  • 46
    • Scopus EID: 0028202408
    • Riddle P, Segal R, Etzioni O (1994) Representation design and brute-force induction in a Boeing manufacturing domain. Appl Artif Intell 8: 125-147
    • DOI: 10.1080/08839519408945435
  • 49
    • Scopus EID: 79957987118
    • Ström F, Koker R (2011) A parallel neural network approach to prediction of Parkinson's Disease. Expert Syst Appl 38(10): 12470-12474
    • DOI: 10.1016/j.eswa.2011.04.028
  • 50
    • Scopus EID: 34547673383
    • Sun Y, Kamel MS, Wong AKC, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit 40(12): 3358-3378
    • DOI: 10.1016/j.patcog.2007.04.009
  • 51
    • Scopus EID: 67650706774
    • Sun Y, Wong AKC, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recognit Artif Intell 23(4): 687-719
    • DOI: 10.1142/S0218001409007326
  • 52
    • Scopus EID: 0036565589
    • Ting KM (2002) An instance-weighting method to induce cost-sensitive trees. IEEE Trans Knowl Data Eng 14(3): 659-665
    • DOI: 10.1109/TKDE.2002.1000348
  • 55
    • Scopus EID: 77956023732
    • Wasikowski M, Chen XW (2010) Combating the small sample class imbalance problem using feature selection. IEEE Trans Knowl Data Eng 22(10): 1388-1400
    • DOI: 10.1109/TKDE.2009.187
  • 56
    • Scopus EID: 26844497970
    • Wehberg S, Schumacher M (2004) A comparison of nonparametric error rate estimation methods in classification problems. Biom J 46(1): 35-47
    • DOI: 10.1002/bimj.200410011
  • 57
    • Scopus EID: 20844458491
    • Weiss GM (2004) Mining with rarity: a unifying framework. SIGKDD Explor 6(1)
  • 58
    • Scopus EID: 0003790115
    • Weiss GM, Provost F (2001) The effect of class distribution on classifier learning: an empirical study. Technical Report ML-TR-44, Department of Computer Science, Rutgers University, New Jersey
  • 59
    • Scopus EID: 64049108468
    • Wu XLJ, Zhou Z (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B 39: 539-550
  • 60
    • Scopus EID: 33748947459
    • Yen SJ, Lee YS (2006) Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset. In: Intelligent Control and Automation (International Conference on Intelligent Computing, ICIC 2006). Lect Notes Control Inf Sci 344: 731-740
    • DOI: 10.1007/11816492_89
  • 61
    • Scopus EID: 31344442851
    • Zhou ZH, Liu XY (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1): 63-77
    • DOI: 10.1109/TKDE.2006.17


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.