



Volume 285, Issue 1, 2014, Pages 112-137

On the use of MapReduce for imbalanced big data using Random Forest

Author keywords

Big data; Cost sensitive learning; Imbalanced dataset; MapReduce; Random Forest; Sampling

Indexed keywords

CLASSIFICATION (OF INFORMATION); DATA MINING; DECISION TREES; DIGITAL STORAGE; LEARNING SYSTEMS; SAMPLING;

EID: 84906873734     PISSN: 00200255     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.ins.2014.03.043     Document Type: Article
Times cited: 277

References (61)
  • 1
    • 84926156679
    • (accessed December 2013)
    • Apache Drill, 2013 (accessed December 2013).
    • (2013)
  • 2
    • 84888107606
    • (accessed December 2013)
    • Apache Hadoop Project, Apache Hadoop, 2013 (accessed December 2013).
    • (2013) Apache Hadoop
  • 3
    • 84870749286
    • (accessed December 2013)
    • Apache Mahout Project, Apache Mahout, 2013 (accessed December 2013).
    • (2013) Apache Mahout
  • 6
    • 27144531570
    • A study of the behaviour of several methods for balancing machine learning training data
    • G.E.A.P.A. Batista, R.C. Prati, M.C. Monard, A study of the behaviour of several methods for balancing machine learning training data, SIGKDD Explor. 6 (1) (2004) 20-29.
    • (2004) SIGKDD Explor. , vol.6 , Issue.1 , pp. 20-29
    • Batista, G.E.A.P.A.1    Prati, R.C.2    Monard, M.C.3
  • 7
    • 84857022489
    • Adjusted geometric-mean: A novel performance measure for imbalanced bioinformatics datasets learning
    • R. Batuwita, V. Palade, Adjusted geometric-mean: a novel performance measure for imbalanced bioinformatics datasets learning, J. Bioinform. Comput. Biol. 10 (4) (2012).
    • (2012) J. Bioinform. Comput. Biol. , vol.10 , Issue.4
    • Batuwita, R.1    Palade, V.2
  • 8
    • 84891059922
    • 3D data management: Controlling data volume, velocity and variety
    • (accessed August 2013)
    • M. Beyer, D. Laney, 3D Data Management: Controlling Data Volume, Velocity and Variety, 2001 (accessed August 2013).
    • (2001)
    • Beyer, M.1    Laney, D.2
  • 9
    • 0035478854
    • Random forests
    • L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5-32.
    • (2001) Mach. Learn. , vol.45 , Issue.1 , pp. 5-32
    • Breiman, L.1
  • 13
    • 50549101751
    • Automatically countering imbalance and its empirical relationship to cost
    • N.V. Chawla, D.A. Cieslak, L.O. Hall, A. Joshi, Automatically countering imbalance and its empirical relationship to cost, Data Min. Knowl. Discov. 17 (2) (2008) 225-252.
    • (2008) Data Min. Knowl. Discov. , vol.17 , Issue.2 , pp. 225-252
    • Chawla, N.V.1    Cieslak, D.A.2    Hall, L.O.3    Joshi, A.4
  • 14
  • 15
    • 37549003336
    • MapReduce: Simplified data processing on large clusters
    • J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters, Commun. ACM 51 (1) (2008) 107-113.
    • (2008) Commun. ACM , vol.51 , Issue.1 , pp. 107-113
    • Dean, J.1    Ghemawat, S.2
  • 19
    • 84874667219
    • Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches
    • A. Fernández, V. López, M. Galar, M.J. del Jesus, F. Herrera, Analysing the classification of imbalanced data-sets with multiple classes: binarization techniques and ad-hoc approaches, Knowl.-Based Syst. 42 (2013) 97-110.
    • (2013) Knowl.-Based Syst. , vol.42 , pp. 97-110
    • Fernández, A.1    López, V.2    Galar, M.3    Del Jesus, M.J.4    Herrera, F.5
  • 21
    • 80052414830
    • Evolutionary-based selection of generalized instances for imbalanced classification
    • S. García, J. Derrac, I. Triguero, C.J. Carmona, F. Herrera, Evolutionary-based selection of generalized instances for imbalanced classification, Knowl. Based Syst. 25 (1) (2012) 3-12.
    • (2012) Knowl. Based Syst. , vol.25 , Issue.1 , pp. 3-12
    • García, S.1    Derrac, J.2    Triguero, I.3    Carmona, C.J.4    Herrera, F.5
  • 22
    • 70349617264
    • Evolutionary under-sampling for classification with imbalanced data sets: Proposals and taxonomy
    • S. García, F. Herrera, Evolutionary under-sampling for classification with imbalanced data sets: proposals and taxonomy, Evol. Comput. 17 (3) (2009) 275-306.
    • (2009) Evol. Comput. , vol.17 , Issue.3 , pp. 275-306
    • García, S.1    Herrera, F.2
  • 23
    • 50549093573
    • On the k-NN performance in a challenging scenario of imbalance and overlapping
    • V. García, R.A. Mollineda, J.S. Sánchez, On the k-NN performance in a challenging scenario of imbalance and overlapping, Pattern Anal. Appl. 11 (3-4) (2008) 269-280.
    • (2008) Pattern Anal. Appl. , vol.11 , Issue.3-4 , pp. 269-280
    • García, V.1    Mollineda, R.A.2    Sánchez, J.S.3
  • 27
    • 68549133155
    • Learning from imbalanced data
    • H. He, E.A. Garcia, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1263-1284.
    • (2009) IEEE Trans. Knowl. Data Eng. , vol.21 , Issue.9 , pp. 1263-1284
    • He, H.1    Garcia, E.A.2
  • 28
    • 33845536164
    • The class imbalance problem: A systematic study
    • N. Japkowicz, S. Stephen, The class imbalance problem: a systematic study, Intell. Data Anal. J. 6 (5) (2002) 429-450.
    • (2002) Intell. Data Anal. J. , vol.6 , Issue.5 , pp. 429-450
    • Japkowicz, N.1    Stephen, S.2
  • 31
    • 84911445875
    • Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data
    • (in press)
    • V. López, S. del Río, J. Benítez, F. Herrera, Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data, Fuzzy Sets Syst. (2014), http://dx.doi.org/10.1016/j.fss.2014.01.01 (in press).
    • (2014) Fuzzy Sets Syst.
    • López, V.1    Del Río, S.2    Benítez, J.3    Herrera, F.4
  • 32
    • 84871621085
    • A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets
    • V. López, A. Fernández, M.J. del Jesus, F. Herrera, A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets, Knowl.-Based Syst. 38 (2013) 85-104.
    • (2013) Knowl.-Based Syst. , vol.38 , pp. 85-104
    • López, V.1    Fernández, A.2    Del Jesus, M.J.3    Herrera, F.4
  • 33
    • 84883447718
    • An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics
    • V. López, A. Fernández, S. García, V. Palade, F. Herrera, An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics, Inform. Sci. 250 (2013) 113-141.
    • (2013) Inform. Sci. , vol.250 , pp. 113-141
    • López, V.1    Fernández, A.2    García, S.3    Palade, V.4    Herrera, F.5
  • 34
    • 84856964446
    • Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics
    • V. López, A. Fernández, J.G. Moreno-Torres, F. Herrera, Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics, Exp. Syst. Appl. 39 (7) (2012) 6585-6608.
    • (2012) Exp. Syst. Appl. , vol.39 , Issue.7 , pp. 6585-6608
    • López, V.1    Fernández, A.2    Moreno-Torres, J.G.3    Herrera, F.4
  • 39
    • 84866062110
    • Computational intelligence for heart disease diagnosis: A medical knowledge driven approach
    • J. Nahar, T. Imam, K.S. Tickle, Y.-P.P. Chen, Computational intelligence for heart disease diagnosis: a medical knowledge driven approach, Exp. Syst. Appl. 40 (1) (2013) 96-104.
    • (2013) Exp. Syst. Appl. , vol.40 , Issue.1 , pp. 96-104
    • Nahar, J.1    Imam, T.2    Tickle, K.S.3    Chen, Y.-P.P.4
  • 41
    • 84874403630
    • Coping with unbalanced class data sets in oral absorption models
    • D. Newby, A.A. Freitas, T. Ghafourian, Coping with unbalanced class data sets in oral absorption models, J. Chem. Inform. Model. 53 (2) (2013) 461-474.
    • (2013) J. Chem. Inform. Model. , vol.53 , Issue.2 , pp. 461-474
    • Newby, D.1    Freitas, A.A.2    Ghafourian, T.3
  • 42
    • 55549116330
    • Evolutionary rule-based systems for imbalanced datasets
    • A. Orriols-Puig, E. Bernadó-Mansilla, Evolutionary rule-based systems for imbalanced datasets, Soft Comput. 13 (3) (2009) 213-225.
    • (2009) Soft Comput. , vol.13 , Issue.3 , pp. 213-225
    • Orriols-Puig, A.1    Bernadó-Mansilla, E.2
  • 44
    • 84873292703
    • The design of polynomial function-based neural network predictors for detection of software defects
    • B.-J. Park, S.-K. Oh, W. Pedrycz, The design of polynomial function-based neural network predictors for detection of software defects, Inform. Sci. 229 (2013) 40-57.
    • (2013) Inform. Sci. , vol.229 , pp. 40-57
    • Park, B.-J.1    Oh, S.-K.2    Pedrycz, W.3
  • 45
    • 0026120032
    • Small sample size effects in statistical pattern recognition: Recommendations for practitioners
    • S.J. Raudys, A.K. Jain, Small sample size effects in statistical pattern recognition: recommendations for practitioners, IEEE Trans. Pattern Anal. Mach. Intell. 13 (3) (1991) 252-264.
    • (1991) IEEE Trans. Pattern Anal. Mach. Intell. , vol.13 , Issue.3 , pp. 252-264
    • Raudys, S.J.1    Jain, A.K.2
  • 46
    • 84883450766
    • An empirical study of the classification performance of learners on imbalanced and noisy software quality data
    • C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Folleco, An empirical study of the classification performance of learners on imbalanced and noisy software quality data, Inform. Sci. 259 (2014) 571-595.
    • (2014) Inform. Sci. , vol.259 , pp. 571-595
    • Seiffert, C.1    Khoshgoftaar, T.M.2    Van Hulse, J.3    Folleco, A.4
  • 47
    • 84926175106
    • (accessed December 2013)
    • Spark, 2013 (accessed December 2013).
    • (2013)
  • 48
    • 84926225131
    • (accessed December 2013)
    • Storm, 2013 (accessed December 2013).
    • (2013)
  • 49
    • 34547673383
    • Cost-sensitive boosting for classification of imbalanced data
    • Y. Sun, M.S. Kamel, A.K.C. Wong, Y. Wang, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognit. 40 (12) (2007) 3358-3378.
    • (2007) Pattern Recognit. , vol.40 , Issue.12 , pp. 3358-3378
    • Sun, Y.1    Kamel, M.S.2    Wong, A.K.C.3    Wang, Y.4
  • 53
    • 77958064179
    • Mining data with random forests: A survey and results of new tests
    • A. Verikas, A. Gelzinis, M. Bacauskiene, Mining data with random forests: a survey and results of new tests, Pattern Recognit. 44 (2) (2011) 330-349.
    • (2011) Pattern Recognit. , vol.44 , Issue.2 , pp. 330-349
    • Verikas, A.1    Gelzinis, A.2    Bacauskiene, M.3
  • 54
    • 77956023732
    • Combating the small sample class imbalance problem using feature selection
    • M. Wasikowski, X.-W. Chen, Combating the small sample class imbalance problem using feature selection, IEEE Trans. Knowl. Data Eng. 22 (10) (2010) 1388-1400.
    • (2010) IEEE Trans. Knowl. Data Eng. , vol.22 , Issue.10 , pp. 1388-1400
    • Wasikowski, M.1    Chen, X.-W.2
  • 55
    • 82555204641
    • Mining with rare cases
    • O. Maimon, L. Rokach (eds.) Springer
    • G.M. Weiss, Mining with rare cases, in: O. Maimon, L. Rokach (Eds.), The Data Mining and Knowledge Discovery Handbook, Springer, 2005, pp. 765-776.
    • (2005) The Data Mining and Knowledge Discovery Handbook , pp. 765-776
    • Weiss, G.M.1
  • 56
    • 77956198600
    • The impact of small disjuncts on classifier learning
    • R. Stahlbock, S.F. Crone, S. Lessmann (eds.) Springer
    • G.M. Weiss, The impact of small disjuncts on classifier learning, in: R. Stahlbock, S.F. Crone, S. Lessmann (Eds.), Data Mining, Annals of Information Systems, vol. 8, Springer, 2010, pp. 193-226.
    • (2010) Data Mining, Annals of Information Systems , vol.8 , pp. 193-226
    • Weiss, G.M.1
  • 58
    • 84868621497
    • ACOSampling: An ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data
    • H. Yu, J. Ni, J. Zhao, ACOSampling: an ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data, Neurocomputing 101 (2013) 309-318.
    • (2013) Neurocomputing , vol.101 , pp. 309-318
    • Yu, H.1    Ni, J.2    Zhao, J.3
  • 61
    • 84873725663
    • Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods
    • L. Zhou, Performance of corporate bankruptcy prediction models on imbalanced dataset: the effect of sampling methods, Knowl.-Based Syst. 41 (2013) 16-25.
    • (2013) Knowl.-Based Syst. , vol.41 , pp. 16-25
    • Zhou, L.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.