



Volume 44, Issue 2, 2011, Pages 330-349

Mining data with random forests: A survey and results of new tests

Author keywords

Classifier; Data proximity; Random forests; Variable importance; Variable selection

Indexed keywords

APPLICATION EXAMPLES; DATA EXPLORATION; DATA PROXIMITY; IMPORTANCE MEASURE; LITERATURE SURVEY; OUTLIER DETECTION; RANDOM FORESTS; SMALL DATA SET; VARIABLE IMPORTANCE; VARIABLE RANKING; VARIABLE SELECTION;

EID: 77958064179     PISSN: 00313203     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.patcog.2010.08.011     Document Type: Article
Times cited: 606

References (112)
  • 4
    • 0001224048
    • Sparse bayesian learning and the relevance vector machine
    • M.E. Tipping Sparse bayesian learning and the relevance vector machine Journal of Machine Learning Research 1 2001 211 244
    • (2001) Journal of Machine Learning Research , vol.1 , pp. 211-244
    • Tipping, M.E.1
  • 8
    • 0030211964
    • Bagging predictors
    • L. Breiman Bagging predictors Machine Learning 24 2 1996 123 140
    • (1996) Machine Learning , vol.24 , Issue.2 , pp. 123-140
    • Breiman, L.1
  • 9
    • 0034250160
    • An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization
    • T.G. Dietterich An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization Machine Learning 40 2 2000 139 157
    • (2000) Machine Learning , vol.40 , Issue.2 , pp. 139-157
    • Dietterich, T.G.1
  • 13
    • 62649160355
    • Identification of DNA-binding proteins using structural, electrostatic and evolutionary features
    • G. Nimrod, A. Szilagyi, C. Leslie, and N. Ben-Tal Identification of DNA-binding proteins using structural, electrostatic and evolutionary features Journal of Molecular Biology 387 4 2009 1040 1053
    • (2009) Journal of Molecular Biology , vol.387 , Issue.4 , pp. 1040-1053
    • Nimrod, G.1    Szilagyi, A.2    Leslie, C.3    Ben-Tal, N.4
  • 14
    • 33745840436
    • Segmenting highly articulated video objects with weak-prior random forests
    • H.T. Chen, T.L. Liu, and C.S. Fuh Segmenting highly articulated video objects with weak-prior random forests A. Leonardis, H. Bischof, A. Pinz, ECCV 2006, Part IV, Lecture Notes in Computer Science vol. 3954 2006 Springer-Verlag Berlin, Heidelberg 373 385
    • (2006) ECCV 2006, Part IV, Lecture Notes in Computer Science , vol.3954 , pp. 373-385
    • Chen, H.T.1    Liu, T.L.2    Fuh, C.S.3
  • 23
    • 60649095847
    • Large-scale prediction of long disordered regions in proteins using random forests
    • P. Han, X. Zhang, R.S. Norton, and Z.P. Feng Large-scale prediction of long disordered regions in proteins using random forests BMC Bioinformatics 10 8 2009 1 9
    • (2009) BMC Bioinformatics , vol.10 , Issue.8 , pp. 1-9
    • Han, P.1    Zhang, X.2    Norton, R.S.3    Feng, Z.P.4
  • 24
    • 67651115922
    • Monitoring of cropland practices for carbon sequestration purposes in north central montana by landsat remote sensing
    • J.D. Watts, R.L. Lawrence, P.R. Miller, and C. Montagne Monitoring of cropland practices for carbon sequestration purposes in north central montana by landsat remote sensing Remote Sensing of Environment 113 2009 1843 1852
    • (2009) Remote Sensing of Environment , vol.113 , pp. 1843-1852
    • Watts, J.D.1    Lawrence, R.L.2    Miller, P.R.3    Montagne, C.4
  • 25
    • 39049116621
    • Feature selection for morphological feature extraction using random forests
    • IEEE New York, Reykjavik, Iceland
    • S.R. Joelsson, J.A. Benediktsson, and J.R. Sveinsson Feature selection for morphological feature extraction using random forests Seventh Nordic Signal Processing Symposium 2006 IEEE New York, Reykjavik, Iceland 138 141
    • (2006) Seventh Nordic Signal Processing Symposium , pp. 138-141
    • Joelsson, S.R.1    Benediktsson, J.A.2    Sveinsson, J.R.3
  • 26
    • 30344489020
    • QSAR analysis of phenolic antioxidants using MOLMAP descriptors of local properties
    • S. Gupta, S. Matthew, P.M. Abreu, and J.A. de Sousa QSAR analysis of phenolic antioxidants using MOLMAP descriptors of local properties Bioorganic & Medicinal Chemistry 14 2006 1199 1206
    • (2006) Bioorganic & Medicinal Chemistry , vol.14 , pp. 1199-1206
    • Gupta, S.1    Matthew, S.2    Abreu, P.M.3    De Sousa, J.A.4
  • 29
    • 0345040873
    • Classification and regression by random forest
    • A. Liaw, and M. Wiener Classification and regression by random forest R News 2 3 2002 18 22
    • (2002) R News , vol.2 , Issue.3 , pp. 18-22
    • Liaw, A.1    Wiener, M.2
  • 30
    • 84855869376
    • RFtools-for predicting and understanding data
    • Berkeley University, Berkeley, USA
    • L. Breiman, RFtools-for predicting and understanding data, Technical Report, Berkeley University, Berkeley, USA 〈http://oz.berkeley.edu/users/breiman/RandomForests/cc.papers.htm〉, 2004.
    • Technical Report
    • Breiman, L.1
  • 33
  • 34
    • 48549094895
    • A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification
    • A. Statnikov, L. Wang, and C.F. Aliferis A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification BMC Bioinformatics 9 319 2008 1 10
    • (2008) BMC Bioinformatics , vol.9 , Issue.319 , pp. 1-10
    • Statnikov, A.1    Wang, L.2    Aliferis, C.F.3
  • 36
    • 0036083445
    • A data complexity analysis of comparative advantages of decision forest constructors
    • T.K. Ho A data complexity analysis of comparative advantages of decision forest constructors Pattern Analysis and Applications 5 2002 102 112
    • (2002) Pattern Analysis and Applications , vol.5 , pp. 102-112
    • Ho, T.K.1
  • 38
    • 0242288813
    • The support vector machine under test
    • D. Meyer, F. Leisch, and K. Hornik The support vector machine under test Neurocomputing 55 12 2003 169 186
    • (2003) Neurocomputing , vol.55 , Issue.12 , pp. 169-186
    • Meyer, D.1    Leisch, F.2    Hornik, K.3
  • 41
    • 0033570831
    • Combined 5×2 cv F test for comparing supervised classification learning algorithms
    • E. Alpaydin Combined 5×2 cv F test for comparing supervised classification learning algorithms Neural Computation 11 8 1999 1885 1892
    • (1999) Neural Computation , vol.11 , Issue.8 , pp. 1885-1892
    • Alpaydin, E.1
  • 42
    • 29644438050
    • Statistical comparisons of classifiers over multiple data sets
    • J. Demsar Statistical comparisons of classifiers over multiple data sets Journal of Machine Learning Research 7 2006 1 30
    • (2006) Journal of Machine Learning Research , vol.7 , pp. 1-30
    • Demsar, J.1
  • 43
    • 30644464444
    • Gene selection and classification of microarray data using random forest
    • R. Diaz-Uriarte, and S. Alvarez de Andres Gene selection and classification of microarray data using random forest BMC Bioinformatics 7 3 2006 1 13
    • (2006) BMC Bioinformatics , vol.7 , Issue.3 , pp. 1-13
    • Diaz-Uriarte, R.1    Alvarez De Andres, S.2
  • 45
    • 58349116623
    • Customer churn prediction using improved balanced random forests
    • Y. Xie, X. Li, E.W.T. Ngai, and W. Ying Customer churn prediction using improved balanced random forests Expert Systems with Applications 36 2009 5445 5449
    • (2009) Expert Systems with Applications , vol.36 , pp. 5445-5449
    • Xie, Y.1    Li, X.2    Ngai, E.W.T.3    Ying, W.4
  • 47
    • 34248524002
    • Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques
    • K. Coussement, and D. Van den Poel Churn prediction in subscription services: an application of support vector machines while comparing two parameter-selection techniques Expert Systems with Applications 34 2008 313 327
    • (2008) Expert Systems with Applications , vol.34 , pp. 313-327
    • Coussement, K.1    Van Den Poel, D.2
  • 48
    • 58349110712
    • Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers
    • K. Coussement, and D. Van den Poel Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers Expert Systems with Applications 36 2009 6127 6134
    • (2009) Expert Systems with Applications , vol.36 , pp. 6127-6134
    • Coussement, K.1    Van Den Poel, D.2
  • 54
    • 62649092955
    • Towards large-scale FAME-based bacterial species identification using machine learning techniques
    • B. Slabbinck, B. De Baets, P. Dawyndt, and P. De Vos Towards large-scale FAME-based bacterial species identification using machine learning techniques Systematic and Applied Microbiology 32 2009 163 176
    • (2009) Systematic and Applied Microbiology , vol.32 , pp. 163-176
    • Slabbinck, B.1    De Baets, B.2    Dawyndt, P.3    De Vos, P.4
  • 57
    • 56349098794
    • Subcellular localisation of proteins in fluorescent microscope images using a random forest
    • IEEE Hong Kong, PR China
    • A.Z. Kouzani Subcellular localisation of proteins in fluorescent microscope images using a random forest 2008 IEEE International Joint Conference on Neural Networks vol. 18 2008 IEEE Hong Kong, PR China 3926 3932
    • (2008) 2008 IEEE International Joint Conference on Neural Networks , vol.18 , pp. 3926-3932
    • Kouzani, A.Z.1
  • 58
    • 64749092250
    • Discriminating acidic and alkaline enzymes using a random forest model with secondary structure amino acid composition
    • G. Zhang, H. Li, and B. Fang Discriminating acidic and alkaline enzymes using a random forest model with secondary structure amino acid composition Process Biochemistry 44 2009 654 660
    • (2009) Process Biochemistry , vol.44 , pp. 654-660
    • Zhang, G.1    Li, H.2    Fang, B.3
  • 59
    • 33746317317
    • Predicting habitat suitability with machine learning models: The potential area of Pinus sylvestris L. in the Iberian peninsula
    • M.B. Garzon, R. Blazek, M. Neteler, R.S. de Dios, H.S. Ollero, and C. Furlanello Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian peninsula Ecological Modelling 197 2006 383 393
    • (2006) Ecological Modelling , vol.197 , pp. 383-393
    • Garzon, M.B.1    Blazek, R.2    Neteler, M.3    De Dios, R.S.4    Ollero, H.S.5    Furlanello, C.6
  • 60
    • 33645330972
    • Newer classification and regression tree techniques: Bagging and random forests for ecological prediction
    • A.M. Prasad, L.R. Iverson, and A. Liaw Newer classification and regression tree techniques: bagging and random forests for ecological prediction Ecosystems 9 2006 181 199
    • (2006) Ecosystems , vol.9 , pp. 181-199
    • Prasad, A.M.1    Iverson, L.R.2    Liaw, A.3
  • 61
    • 34247345395
    • Comparing the chemical spaces of metabolites and available chemicals: Models of metabolite-likeness
    • S. Gupta, and J. Aires-deSousa Comparing the chemical spaces of metabolites and available chemicals: models of metabolite-likeness Molecular Diversity 11 1 2007 23 36
    • (2007) Molecular Diversity , vol.11 , Issue.1 , pp. 23-36
    • Gupta, S.1    Aires-Desousa, J.2
  • 63
    • 62249107327
    • Using single- and multi-target regression trees and ensembles to model a compound index of vegetation condition
    • D. Kocev, S. Dzeroski, M.D. White, G.R. Newell, and P. Griffioen Using single- and multi-target regression trees and ensembles to model a compound index of vegetation condition Ecological Modelling 220 2009 1159 1168
    • (2009) Ecological Modelling , vol.220 , pp. 1159-1168
    • Kocev, D.1    Dzeroski, S.2    White, M.D.3    Newell, G.R.4    Griffioen, P.5
  • 65
    • 10444274107
    • Customer base analysis: Partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting
    • W. Buckinx, and D. Van den Poel Customer base analysis: partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting European Journal of Operational Research 164 2005 252 268
    • (2005) European Journal of Operational Research , vol.164 , pp. 252-268
    • Buckinx, W.1    Van Den Poel, D.2
  • 66
    • 2342533421
    • Class prediction by nearest shrunken centroids, with applications to DNA microarrays
    • R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu Class prediction by nearest shrunken centroids, with applications to DNA microarrays Statistical Science 18 1 2003 104 117
    • (2003) Statistical Science , vol.18 , Issue.1 , pp. 104-117
    • Tibshirani, R.1    Hastie, T.2    Narasimhan, B.3    Chu, G.4
  • 67
    • 0001025418
    • Bayesian interpolation
    • D.J. MacKay Bayesian interpolation Neural Computation 4 1992 415 447
    • (1992) Neural Computation , vol.4 , pp. 415-447
    • MacKay, D.J.1
  • 68
    • 0000234257
    • The evidence framework applied to classification networks
    • D.J.C. MacKay The evidence framework applied to classification networks Neural Computation 4 5 1992 720 736
    • (1992) Neural Computation , vol.4 , Issue.5 , pp. 720-736
    • MacKay, D.J.C.1
  • 70
  • 73
    • 33846833726
    • Rapid and non-destructive identification of strawberry cultivars by direct PTR-MS headspace analysis and data mining techniques
    • P.M. Granitto, F. Biasioli, E. Aprea, D. Mott, C. Furlanello, T.D. Mark, and F. Gasperi Rapid and non-destructive identification of strawberry cultivars by direct PTR-MS headspace analysis and data mining techniques Sensors and Actuators B 121 2007 379 385
    • (2007) Sensors and Actuators B , vol.121 , pp. 379-385
    • Granitto, P.M.1    Biasioli, F.2    Aprea, E.3    Mott, D.4    Furlanello, C.5    Mark, T.D.6    Gasperi, F.7
  • 78
    • 57049102289
    • Benchmarking classifiers to optimally integrate terrain analysis and multispectral remote sensing in automatic rock glacier detection
    • A. Brenning Benchmarking classifiers to optimally integrate terrain analysis and multispectral remote sensing in automatic rock glacier detection Remote Sensing of Environment 113 1 2009 239 247
    • (2009) Remote Sensing of Environment , vol.113 , Issue.1 , pp. 239-247
    • Brenning, A.1
  • 80
    • 15744402095
    • A performance comparison of modern statistical techniques for molecular descriptor selection and retention prediction in chromatographic QSRR studies
    • T. Hancock, R. Put, D. Coomans, Y.V. Heyden, and Y. Everingham A performance comparison of modern statistical techniques for molecular descriptor selection and retention prediction in chromatographic QSRR studies Chemometrics and Intelligent Laboratory Systems 76 2005 185 196
    • (2005) Chemometrics and Intelligent Laboratory Systems , vol.76 , pp. 185-196
    • Hancock, T.1    Put, R.2    Coomans, D.3    Heyden, Y.V.4    Everingham, Y.5
  • 81
    • 33847096395
    • Bias in random forest variable importance measures: Illustrations, sources and a solution
    • C. Strobl, A.L. Boulesteix, A. Zeileis, and T. Hothorn Bias in random forest variable importance measures: illustrations, sources and a solution BMC Bioinformatics 8 25 2007 1 21
    • (2007) BMC Bioinformatics , vol.8 , Issue.25 , pp. 1-21
    • Strobl, C.1    Boulesteix, A.L.2    Zeileis, A.3    Hothorn, T.4
  • 83
    • 35748978234
    • Empirical characterization of random forest variable importance measures
    • K.J. Archer, and R.V. Kimes Empirical characterization of random forest variable importance measures Computational Statistics & Data Analysis 52 4 2008 2249 2260
    • (2008) Computational Statistics & Data Analysis , vol.52 , Issue.4 , pp. 2249-2260
    • Archer, K.J.1    Kimes, R.V.2
  • 84
    • 38049134913
    • Random forest for gene expression based cancer classification: Overlooked issues
    • O. Okun, and H. Priisalu Random forest for gene expression based cancer classification: overlooked issues J. Marti, IbPRIA 2007, Part II, Lecture Notes in Computer Science vol. 4478 2007 Springer-Verlag Berlin, Heidelberg 483 490
    • (2007) IbPRIA 2007, Part II, Lecture Notes in Computer Science , vol.4478 , pp. 483-490
    • Okun, O.1    Priisalu, H.2
  • 85
    • 57849165378
    • Exploring precrash maneuvers using classification trees and random forests
    • R. Harb, X. Yan, E. Radwan, and X. Su Exploring precrash maneuvers using classification trees and random forests Accident Analysis and Prevention 41 2009 98 107
    • (2009) Accident Analysis and Prevention , vol.41 , pp. 98-107
    • Harb, R.1    Yan, X.2    Radwan, E.3    Su, X.4
  • 86
    • 58549109024
    • Multivariate exponential survival trees and their application to tooth prognosis
    • J. Fan, M.E. Nunn, and X. Su Multivariate exponential survival trees and their application to tooth prognosis Computational Statistics and Data Analysis 53 2009 1110 1121
    • (2009) Computational Statistics and Data Analysis , vol.53 , pp. 1110-1121
    • Fan, J.1    Nunn, M.E.2    Su, X.3
  • 89
    • 33847236254
    • Multivariate feature selection and hierarchical classification for infrared spectroscopy: Serum-based detection of bovine spongiform encephalopathy
    • B.H. Menze, W. Petrich, and F.A. Hamprecht Multivariate feature selection and hierarchical classification for infrared spectroscopy: serum-based detection of bovine spongiform encephalopathy Analytical and Bioanalytical Chemistry 387 5 2007 1801 1807
    • (2007) Analytical and Bioanalytical Chemistry , vol.387 , Issue.5 , pp. 1801-1807
    • Menze, B.H.1    Petrich, W.2    Hamprecht, F.A.3
  • 90
    • 33748692689
    • Bagged super wavelets reduction for boosted prostate cancer classification of seldi-tof mass spectral serum profiles
    • D. Donald, T. Hancock, D. Coomans, and Y. Everingham Bagged super wavelets reduction for boosted prostate cancer classification of seldi-tof mass spectral serum profiles Chemometrics and Intelligent Laboratory Systems 82 2006 2 7
    • (2006) Chemometrics and Intelligent Laboratory Systems , vol.82 , pp. 2-7
    • Donald, D.1    Hancock, T.2    Coomans, D.3    Everingham, Y.4
  • 91
    • 22144481002
    • Predicting customer retention and profitability by using random forests and regression forests techniques
    • B. Lariviere, and D. Van den Poel Predicting customer retention and profitability by using random forests and regression forests techniques Expert Systems with Applications 29 2005 472 484
    • (2005) Expert Systems with Applications , vol.29 , pp. 472-484
    • Lariviere, B.1    Van Den Poel, D.2
  • 92
    • 40849145711
    • Understanding preferences for income redistribution
    • L.C. Keely, and C.M. Tan Understanding preferences for income redistribution Journal of Public Economics 92 2008 944 961
    • (2008) Journal of Public Economics , vol.92 , pp. 944-961
    • Keely, L.C.1    Tan, C.M.2
  • 93
    • 25444453244
    • Screening large-scale association study data: Exploiting interactions using random forests
    • K.L. Lunetta, L.B. Hayward, J. Segal, and P. Van Eerdewegh Screening large-scale association study data: exploiting interactions using random forests BMC Genetics 5 32 2004 1 13
    • (2004) BMC Genetics , vol.5 , Issue.32 , pp. 1-13
    • Lunetta, K.L.1    Hayward, L.B.2    Segal, J.3    Van Eerdewegh, P.4
  • 95
    • 67349191538
    • Hedged predictions for traditional Chinese chronic gastritis diagnosis with confidence machine
    • H. Wang, C. Lin, F. Yang, and X. Hu Hedged predictions for traditional Chinese chronic gastritis diagnosis with confidence machine Computers in Biology and Medicine 39 2009 425 432
    • (2009) Computers in Biology and Medicine , vol.39 , pp. 425-432
    • Wang, H.1    Lin, C.2    Yang, F.3    Hu, X.4
  • 99
    • 50649092580
    • Statistical analysis and optimization of parametric delay test
    • IEEE Santa Clara, CA, USA
    • S.H. Wu, B.N. Lee, L.C. Wang, and M.S. Abadir Statistical analysis and optimization of parametric delay test IEEE International Test Conference vol. 12 2007 IEEE Santa Clara, CA, USA 613 622
    • (2007) IEEE International Test Conference , vol.12 , pp. 613-622
    • Wu, S.H.1    Lee, B.N.2    Wang, L.C.3    Abadir, M.S.4
  • 106
    • 84861323365
    • Traffic flow prediction using adaboost algorithm with random forests as a weak learner
    • Engineering and Technology
    • G. Leshem, Y. Ritov, Traffic flow prediction using adaboost algorithm with random forests as a weak learner, in: Proceedings of World Academy of Science, Engineering and Technology, vol. 19, Bangkok, Thailand, 2007, pp. 193-198.
    • (2007) Proceedings of World Academy of Science , vol.19 , pp. 193-198
    • Leshem, G.1    Ritov, Y.2
  • 108
  • 109
    • 37349087901
    • Predictor output sensitivity and feature similarity-based feature selection
    • A. Verikas, M. Bacauskiene, D. Valincius, and A. Gelzinis Predictor output sensitivity and feature similarity-based feature selection Fuzzy Sets & Systems 159 2008 422 434
    • (2008) Fuzzy Sets & Systems , vol.159 , pp. 422-434
    • Verikas, A.1    Bacauskiene, M.2    Valincius, D.3    Gelzinis, A.4
  • 110
    • 0003120218
    • Fast training of support vector machines using sequential minimal optimization
    • J.C. Platt Fast training of support vector machines using sequential minimal optimization B. Scholkopf, C.J.C. Burges, A.J. Smola, Advances in Kernel Methods: Support Vector Learning 1998 MIT Press Cambridge, MA 185 208
    • (1998) Advances in Kernel Methods: Support Vector Learning , pp. 185-208
    • Platt, J.C.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.