Volume 8, 2007

Bias in random forest variable importance measures: Illustrations, sources and a solution

Author keywords

[No Author keywords available]

Indexed keywords

BOOTSTRAP SAMPLINGS; CLASSIFICATION TASKS; CLASSIFICATION TREES; COMPUTATIONAL BIOLOGY; CONTINUOUS VARIABLES; RANDOM FOREST ALGORITHM; STATISTICAL COMPUTING; VARIABLE IMPORTANCES;

EID: 33847096395     PISSN: None     EISSN: 14712105     Source Type: Journal    
DOI: 10.1186/1471-2105-8-25     Document Type: Article
Times cited: 2613

References (38)
  • [2] Heidema AG, Boer JMA, Nagelkerke N, Mariman ECM, van der A DL, Feskens EJM: The Challenge for Genetic Epidemiologists: How to Analyze Large Numbers of SNPs in Relation to Complex Diseases. BMC Genetics 2006, 7:23. doi:10.1186/1471-2156-7-23, PMID 16630340, PMCID PMC1479365.
  • [3] Breiman L: Random Forests. Machine Learning 2001, 45:5-32. doi:10.1023/A:1010933404324.
  • [4] Díaz-Uriarte R, Alvarez de Andrés S: Gene Selection and Classification of Microarray Data Using Random Forest. BMC Bioinformatics 2006, 7:3. doi:10.1186/1471-2105-7-3, PMID 16398926, PMCID PMC1363357.
  • [5] Lunetta KL, Hayward LB, Segal J, Eerdewegh PV: Screening Large-Scale Association Study Data: Exploiting Interactions Using Random Forests. BMC Genetics 2004, 5:32. doi:10.1186/1471-2156-5-32, PMID 15588316, PMCID PMC545646.
  • [6] Gunther EC, Stone DJ, Gerwien RW, Bento P, Heyes MP: Prediction of Clinical Drug Efficacy by Classification of Drug-induced Genomic Expression Profiles in vitro. Proceedings of the National Academy of Sciences 2003, 100:9608-9613. doi:10.1073/pnas.1632587100.
  • [7] Huang X, Pan W, Grindle S, Han X, Chen Y, Park SJ, Miller LW, Hall J: A Comparative Study of Discriminating Human Heart Failure Etiology Using Gene Expression Profiles. BMC Bioinformatics 2005, 6:205. doi:10.1186/1471-2105-6-205, PMID 16120216, PMCID PMC1224853.
  • [8] Shih Y: Tumor Classification by Tissue Microarray Profiling: Random Forest Clustering Applied to Renal Cell Carcinoma. Modern Pathology 2005, 18:547-557. doi:10.1038/modpathol.3800322, PMID 15529185.
  • [10] Cummings MP, Segal MR: Few Amino Acid Positions in rpoB are Associated with Most of the Rifampin Resistance in Mycobacterium Tuberculosis. BMC Bioinformatics 2004, 5:137. doi:10.1186/1471-2105-5-137, PMID 15453919, PMCID PMC524371.
  • [11] Cummings MP, Myers DS: Simple Statistical Models Predict C-to-U Edited Sites in Plant Mitochondrial RNA. BMC Bioinformatics 2004, 5:132. doi:10.1186/1471-2105-5-132, PMID 15373947, PMCID PMC521485.
  • [12] Qi Y, Bar-Joseph Z, Klein-Seetharaman J: Evaluation of Different Biological Data and Computational Classification Methods for Use in Protein Interaction Prediction. Proteins 2006, 63:490-500. doi:10.1002/prot.20865, PMID 16450363.
  • [13] Guha R, Jurs PC: Development of Linear, Ensemble, and Nonlinear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors. Journal of Chemical Information and Computer Sciences 2004, 44:2179-2189. doi:10.1021/ci049849f.
  • [17] Ward MM, Pajevic S, Dreyfuss J, Malley JD: Short-Term Prediction of Mortality in Patients with Systemic Lupus Erythematosus: Classification of Outcomes Using Random Forests. Arthritis and Rheumatism 2006, 55:74-80. doi:10.1002/art.21695.
  • [19] Friedman J: Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics 2001, 29:1189-1232. doi:10.1214/aos/1013203451.
  • [20] R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2006. http://www.R-project.org/
  • [22] Liaw A, Wiener M: Classification and Regression by randomForest. R News 2002, 2:18-22. http://CRAN.R-project.org/doc/Rnews/
  • [25] Kim H, Loh W: Classification Trees with Unbiased Multiway Splits. Journal of the American Statistical Association 2001, 96:589-604. doi:10.1198/016214501753168271.
  • [26] Boulesteix AL: Maximally Selected Chi-square Statistics for Ordinal Variables. Biometrical Journal 2006, 48:451-462. doi:10.1002/bimj.200510161, PMID 16845908.
  • [27] Boulesteix AL: Maximally Selected Chi-square Statistics and Binary Splits of Nominal Variables. Biometrical Journal 2006, 48:838-848. doi:10.1002/bimj.200510191, PMID 17094347.
  • [30] Friedman J, Hall P: On Bagging and Nonlinear Estimation. Preprint, 1999. http://www-stat.stanford.edu/~jhf/
  • [31] Bühlmann P, Yu B: Analyzing Bagging. The Annals of Statistics 2002, 30:927-961. doi:10.1214/aos/1031689014.
  • [34] Strobl C: Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index. Discussion Paper 420, SFB "Statistical Analysis of Discrete Structures", Munich, Germany; 2005. http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper420.ps
  • [35] Strobl C: Variable Selection in Classification Trees Based on Imprecise Probabilities. In Proceedings of the Fourth International Symposium on Imprecise Probabilities and Their Applications. Edited by Cozman F, Nau R, Seidenfeld T. Pittsburgh, PA, USA: Carnegie Mellon University; 2005:340-348.
  • [37] Bickel PJ, Ren JJ: The Bootstrap in Hypothesis Testing. In State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet. IMS Lecture Notes Monograph Series, vol. 36. Edited by de Gunst M, Klaassen C, van der Vaart A. Beachwood, OH, USA; 2001:91-112.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.