Volume 6, Issue 1, 2014, Pages

Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation

Author keywords

Cross validation; Double cross validation; External validation; Internal validation; Prediction error; Regression

EID: 84916603462     PISSN: None     EISSN: 17582946     Source Type: Journal
DOI: 10.1186/s13321-014-0047-1     Document Type: Article
Times cited: 125

References (72)
  • 1
    • Scopus EID: 0030771347
    • Kubinyi H: QSAR and 3D QSAR in drug design. Part 1: methodology. Drug Discov Today 1997, 2:457-467.
  • 2
    • Scopus EID: 0037841526
    • Baumann K: Cross-validation as the objective function of variable selection. Trends Anal Chem 2003, 22:395-406.
  • 5
    • Scopus EID: 0002128914
    • Mosteller F, Tukey J: Data Analysis, Including Statistics. In The Handbook of Social Psychology. 2nd edition. Edited by Lindzey G, Aronson E. Reading, MA, USA: Addison-Wesley; 1968:109-112.
  • 6
    • Scopus EID: 0000629975
    • Stone M: Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Methodol 1974, 36:111-147.
  • 7
    • Scopus EID: 84985627306
    • Ganeshanandam S, Krzanowski WJ: On selecting variables and assessing their performance in linear discriminant analysis. Aust J Stat 1989, 31:433-447.
  • 8
    • Scopus EID: 0043228681
    • Jonathan P, Krzanowski WJ, McCarthy WV: On the use of cross-validation to assess performance in multivariate prediction. Stat Comput 2000, 10:209-229.
  • 9
    • Scopus EID: 0037076322
    • Ambroise C, McLachlan GJ: Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci U S A 2002, 99:6562-6566.
  • 12
    • Scopus EID: 33644860703
    • Varma S, Simon R: Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 2006, 7:91.
  • 13
    • Scopus EID: 84874397060
    • Okser S, Pahikkala T, Aittokallio T: Genetic variants and their interactions in disease risk prediction - machine learning and network perspectives. BioData Min 2013, 6:5.
  • 15
    • Scopus EID: 2942704287
    • Wegner JK, Fröhlich H, Zell A: Feature selection for descriptor based classification models. 1. Theory and GA-SEC algorithm. J Chem Inf Comput Sci 2004, 44:921-930.
  • 16
    • Scopus EID: 33750724182
    • Anderssen E, Dyrstad K, Westad F, Martens H: Reducing over-optimism in variable selection by cross-model validation. Chemom Intell Lab Syst 2006, 84:69-74.
  • 17
    • Scopus EID: 51749125994
    • Gidskehaug L, Anderssen E, Alsberg B: Cross model validation and optimisation of bilinear regression models. Chemom Intell Lab Syst 2008, 93:1-10.
  • 18
    • Scopus EID: 84899084283
    • Krstajic D, Buturovic LJ, Leahy DE, Thomas S: Cross-validation pitfalls when selecting and assessing regression and classification models. J Cheminform 2014, 6:1-15.
  • 19
    • Scopus EID: 54249125512
    • Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Öberg T, Todeschini R, Fourches D, Varnek A: Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J Chem Inf Model 2008, 48:1733-1746.
  • 20
    • Scopus EID: 84879797130
    • Gütlein M, Helma C, Karwath A, Kramer S: A large-scale empirical evaluation of cross-validation and external test set validation in (Q)SAR. Mol Inform 2013, 32:516-528.
  • 21
    • Scopus EID: 0001935527
    • Zucchini W: An introduction to model selection. J Math Psychol 2000, 44:41-61.
  • 22
    • Scopus EID: 33846240326
    • Broadhurst DI, Kell DB: Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics 2006, 2:171-196.
  • 23
    • Scopus EID: 39949083755
    • Bro R, Kjeldahl K, Smilde AK, Kiers HAL: Cross-validation of component models: a critical look at current methods. Anal Bioanal Chem 2008, 390:1241-1251.
  • 24
    • Scopus EID: 84890445089
    • Reunanen J: Overfitting in making comparisons between variable selection methods. J Mach Learn Res 2003, 3:1371-1382.
  • 25
    • Scopus EID: 1642380461
    • Hawkins DM: The problem of overfitting. J Chem Inf Comput Sci 2004, 44:1-12.
  • 26
    • Scopus EID: 77956907243
    • Cawley GC, Talbot NLC: On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 2010, 11:2079-2107.
  • 27
    • Scopus EID: 28444497469
    • Baumann K: Chance correlation in variable subset regression: influence of the objective function, the selection mechanism, and ensemble averaging. QSAR Comb Sci 2005, 24:1033-1046.
  • 28
    • Scopus EID: 20844448884
    • Baumann K, Stiefl N: Validation tools for variable subset regression. J Comput Aided Mol Des 2004, 18:549-562.
  • 30
    • Scopus EID: 1242296549
    • Johnson JB, Omland KS: Model selection in ecology and evolution. Trends Ecol Evol 2004, 19:101-108.
  • 32
    • Scopus EID: 80053295024
    • Chirico N, Gramatica P: Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J Chem Inf Model 2011, 51:2320-2335.
  • 33
    • Scopus EID: 34250628103
    • Gramatica P: Principles of QSAR models validation: internal and external. QSAR Comb Sci 2007, 26:694-701.
  • 35
    • Scopus EID: 18344363227
    • Aptula AO, Jeliazkova NG, Schultz TW, Cronin MTD: The better predictive model: high q2 for the training set or low root mean square error of prediction for the test set? QSAR Comb Sci 2005, 24:385-396.
  • 36
    • Scopus EID: 0038724207
    • Tropsha A, Gramatica P, Gombar VK: The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb Sci 2003, 22:69-77.
  • 37
    • Scopus EID: 0033574245
    • Justice AC, Covinsky KE, Berlin JA: Assessing the generalizability of prognostic information. Ann Intern Med 1999, 130:515-524.
  • 40
    • Scopus EID: 0032801090
    • Faber N, Klaas M: Estimating the uncertainty in estimates of root mean square error of prediction: application to determining the size of an adequate test set in multivariate calibration. Chemom Intell Lab Syst 1999, 49:79-89.
  • 41
    • Scopus EID: 33947227575
    • Roecker EB: Prediction error and its estimation for subset-selected models. Technometrics 1991, 33:459-468.
  • 42
    • Scopus EID: 77951066659
    • Hawkins DM, Kraker JJ: Deterministic fallacies and model validation. J Chemom 2010, 24:188-193.
  • 44
    • Scopus EID: 52949118135
    • Eklund M, Spjuth O, Wikberg JE: The C1C2: a framework for simultaneous model selection and assessment. BMC Bioinformatics 2008, 9:360-373.
  • 45
    • Scopus EID: 0035478854
    • Breiman L: Random forests. Mach Learn 2001, 45:5-32.
  • 46
    • Scopus EID: 0036062152
    • Baumann K, Albert H, von Korff M: A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part I. Search algorithm, theory and simulations. J Chemom 2002, 16:339-350.
  • 47
    • Scopus EID: 77956649096
    • Arlot S, Celisse A: A survey of cross-validation procedures for model selection. Stat Surv 2010, 4:40-79.
  • 48
    • Scopus EID: 0000131403
    • Browne M: Cross-validation methods. J Math Psychol 2000, 44:108-132.
  • 49
    • Scopus EID: 21144474350
    • Shao J: Linear model selection by cross-validation. J Am Stat Assoc 1993, 88:486-494.
  • 50
    • Scopus EID: 79952797057
    • Briscoe E, Feldman J: Conceptual complexity and the bias/variance tradeoff. Cognition 2011, 118:2-16.
  • 52
    • Scopus EID: 79952205470
    • Lise S, Buchan D, Pontil M, Jones DT: Predictions of hot spot residues at protein-protein interfaces using support vector machines. PLoS ONE 2011, 6:e16774.
  • 53
    • Scopus EID: 48549094895
    • Statnikov A, Wang L, Aliferis CF: A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics 2008, 9:319.
  • 55
    • Scopus EID: 18744414232
    • Lottaz C, Spang R: Molecular decomposition of complex clinical phenotypes using biologically structured analysis of microarray data. Bioinformatics 2005, 21:1971-1978.
  • 57
    • Scopus EID: 85194972808
    • Tibshirani R: Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 1996, 58:267-288.
  • 59
    • Scopus EID: 0001645890
    • Huuskonen J: Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. J Chem Inf Comput Sci 2000, 40:773-777.
  • 60
    • Scopus EID: 79953005609
    • Yap CW: PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 2011, 32:1466-1474.
  • 61
    • Scopus EID: 79961214573
    • Zuber V, Strimmer K: High-dimensional regression and variable selection using CAR scores. Stat Appl Genet Mol Biol 2010, 10:25.
  • 62
    • Scopus EID: 4043129529
    • Guha R, Jurs PC: Development of QSAR models to predict and interpret the biological activity of artemisinin analogues. J Chem Inf Comput Sci 2004, 44:1440-1449.
  • 63
    • Scopus EID: 49449098592
    • Hong H, Xie Q, Ge W, Qian F, Fang H, Shi L, Su Z, Perkins R, Tong W: Mold2, molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics. J Chem Inf Model 2008, 48:1337-1344.
  • 66
    • Scopus EID: 29244477248
    • Clarke K: The phantom menace: omitted variable bias in econometric research. Confl Manag Peace Sci 2005, 22:341-352.
  • 67
    • Scopus EID: 0025078552
    • Marbach R, Heise HM: Calibration modeling by partial least-squares and principal component regression and its optimization using an improved leverage correction for prediction testing. Chemom Intell Lab Syst 1990, 9:45-63.
  • 68
    • Scopus EID: 0031536511
    • Efron B, Tibshirani R: Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 1997, 92:548-560.
  • 69
    • Scopus EID: 0000343716
    • Breiman L, Spector P: Submodel selection and evaluation in regression. The X-random case. Int Stat Rev 1992, 60:291-319.
  • 72


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.