Volume 25, Issue 2, 2013, Pages 189-206

Significance tests or confidence intervals: Which are preferable for the comparison of classifiers?

Author keywords

classification; confidence interval; null hypothesis significance testing; p value; reasoning

Indexed keywords

CLASSIFICATION PERFORMANCE; CLASSIFICATION RESULTS; COMPARISON OF CLASSIFIERS; CONFIDENCE INTERVAL; P-VALUES; REASONING; SIGNIFICANCE TESTING; STATISTICAL EVALUATION;

EID: 84877652134     PISSN: 0952813X     EISSN: 13623079     Source Type: Journal    
DOI: 10.1080/0952813X.2012.680252     Document Type: Article
Times cited : (28)

References (46)
  • 1
    • 84936527430
    • Statistical analysis and the illusion of objectivity
    • Berger, JO and Berry, DA. 1988. Statistical Analysis and the Illusion of Objectivity. American Scientist, 76: 159-165.
    • (1988) American Scientist , vol.76 , pp. 159-165
    • Berger, J.O.1    Berry, D.A.2
  • 2
    • 33646856024
    • Avoiding model selection bias in small-sample genomic data sets
    • Berrar, D, Bradbury, I and Dubitzky, W. 2006. Avoiding Model Selection Bias in Small-sample Genomic Data Sets. Bioinformatics, 22: 1245-1250.
    • (2006) Bioinformatics , vol.22 , pp. 1245-1250
    • Berrar, D.1    Bradbury, I.2    Dubitzky, W.3
  • 3
    • 7444237797
    • Evaluating the replicability of significance tests for comparing learning algorithms
    • Bouckaert, RR and Frank, E. 2004. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. Advances in Knowledge Discovery and Data Mining, 3056: 3-12.
    • (2004) Advances in Knowledge Discovery and Data Mining , vol.3056 , pp. 3-12
    • Bouckaert, R.R.1    Frank, E.2
  • 4
    • 0035478854
    • Random forests
    • Breiman, L. 2001. Random Forests. Machine Learning, 45: 5-32.
    • (2001) Machine Learning , vol.45 , pp. 5-32
    • Breiman, L.1
  • 5
    • 77956907243
    • On over-fitting in model selection and subsequent selection bias in performance evaluation
    • Cawley, GC and Talbot, NLC. 2010. On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation. Journal of Machine Learning Research, 11: 2079-2107.
    • (2010) Journal of Machine Learning Research , vol.11 , pp. 2079-2107
    • Cawley, G.C.1    Talbot, N.L.C.2
  • 7
    • 29644438050
    • Statistical comparisons of classifiers over multiple data sets
    • Demšar, J. 2006. Statistical Comparisons of Classifiers Over Multiple Data Sets. Journal of Machine Learning Research, 7: 1-30.
    • (2006) Journal of Machine Learning Research , vol.7 , pp. 1-30
    • Demšar, J.1
  • 9
    • 3042555481
    • Alternatives to null hypothesis significance testing
    • Accessed 19 April 2012
    • Denis, D. (2003), Alternatives to Null Hypothesis Significance Testing, Theory & Science, 4(1). Available online at http://theoryandscience.icaap.org/content/vol4.1/02-denis.html. Accessed 19 April 2012
    • (2003) Theory & Science , vol.4 , Issue.1
    • Denis, D.1
  • 10
    • 0000259511
    • Approximate statistical tests for comparing supervised classification learning algorithms
    • Dietterich, TG. 1998. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10: 1895-1923.
    • (1998) Neural Computation , vol.10 , pp. 1895-1923
    • Dietterich, T.G.1
  • 11
    • 0000854691
    • Why scientists value p values
    • Dixon, P. 1998. Why Scientists Value p Values. Psychonomic Bulletin & Review, 5: 390-396.
    • (1998) Psychonomic Bulletin & Review , vol.5 , pp. 390-396
    • Dixon, P.1
  • 15
    • 77952424025
    • Pointwise exact bootstrap distributions of ROC curves
    • Dugas, C and Gadoury, D. 2010. Pointwise Exact Bootstrap Distributions of ROC Curves. Machine Learning, 78: 103-136.
    • (2010) Machine Learning , vol.78 , pp. 103-136
    • Dugas, C.1    Gadoury, D.2
  • 16
    • 67949087971
    • The null hypothesis significance testing debate and its implications for personality research
    • Edited by: Robins, RW, Fraley, RC and Krueger, RF. New York: Guilford
    • Fraley, RC and Marks, MJ. 2007. "The Null Hypothesis Significance Testing Debate and its Implications for Personality Research". In Handbook of Research Methods in Personality Psychology, Edited by: Robins, RW, Fraley, RC and Krueger, RF. 149-169. New York: Guilford.
    • (2007) Handbook of Research Methods in Personality Psychology , pp. 149-169
    • Fraley, R.C.1    Marks, M.J.2
  • 18
    • 58149287952
    • An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons
    • Garcia, S and Herrera, F. 2008. An Extension on Statistical Comparisons of Classifiers Over Multiple Data Sets for All Pairwise Comparisons. Journal of Machine Learning Research, 9: 2677-2694.
    • (2008) Journal of Machine Learning Research , vol.9 , pp. 2677-2694
    • Garcia, S.1    Herrera, F.2
  • 19
    • 0027471290
    • P values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate
    • Goodman, S. 1993. P Values, Hypothesis Tests, and Likelihood: Implications for Epidemiology of a Neglected Historical Debate. American Journal of Epidemiology, 137: 485-496.
    • (1993) American Journal of Epidemiology , vol.137 , pp. 485-496
    • Goodman, S.1
  • 20
    • 0033564491
    • Toward evidence-based medical statistics. 1: The p value fallacy
    • Goodman, S. 1999. Toward Evidence-based Medical Statistics. 1: The p Value Fallacy. Annals of Internal Medicine, 130: 995-1004.
    • (1999) Annals of Internal Medicine , vol.130 , pp. 995-1004
    • Goodman, S.1
  • 21
    • 45749100408
    • A dirty dozen: Twelve p-value misconceptions
    • Goodman, S. 2008. A Dirty Dozen: Twelve p-Value Misconceptions. Seminars in Hematology, 45: 135-140.
    • (2008) Seminars in Hematology , vol.45 , pp. 135-140
    • Goodman, S.1
  • 22
    • 33745886270
    • Classifier technology and the illusion of progress
    • Hand, D. 2006. Classifier Technology and the Illusion of Progress. Statistical Science, 21: 1-14.
    • (2006) Statistical Science , vol.21 , pp. 1-14
    • Hand, D.1
  • 24
    • 18244400004
    • On the application of three modified Bonferroni procedures to pairwise multiple comparisons in balanced repeated measures designs
    • Holland, B. 1991. On the Application of Three Modified Bonferroni Procedures to Pairwise Multiple Comparisons in Balanced Repeated Measures Designs. Computational Statistics Quarterly, 6: 219-231.
    • (1991) Computational Statistics Quarterly , vol.6 , pp. 219-231
    • Holland, B.1
  • 25
    • 0002294347
    • A simple sequentially rejective multiple test procedure
    • Holm, S. 1979. A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics, 6: 65-70.
    • (1979) Scandinavian Journal of Statistics , vol.6 , pp. 65-70
    • Holm, S.1
  • 26
    • 0031584064
    • Uniform requirements for manuscripts submitted to biomedical journals
    • International Committee of Medical Journal Editors
    • International Committee of Medical Journal Editors (1997), Uniform Requirements for Manuscripts Submitted to Biomedical Journals, New England Journal of Medicine, 336, 309-315
    • (1997) New England Journal of Medicine , vol.336 , pp. 309-315
  • 27
    • 0032808670
    • The insignificance of statistical significance testing
    • Johnson, DH. 1999. The Insignificance of Statistical Significance Testing. Journal of Wildlife Management, 63: 763-772.
    • (1999) Journal of Wildlife Management , vol.63 , pp. 763-772
    • Johnson, D.H.1
  • 29
    • 79959753092
    • Machine learning as an experimental science
    • Langley, P. 2011. Machine Learning As an Experimental Science. Machine Learning, 82: 275-279.
    • (2011) Machine Learning , vol.82 , pp. 275-279
    • Langley, P.1
  • 30
    • 84877670795
    • Exhaustive conditional inference: Improving the evidential value of a statistical test by identifying the most relevant p-value and error probabilities
    • Australia: University of Melbourne
    • Leslie, C. 2008. "Exhaustive Conditional Inference: Improving the Evidential Value of a Statistical Test by Identifying the Most Relevant p-value and Error Probabilities". In PhD Thesis, Australia: University of Melbourne.
    • (2008) PhD Thesis
    • Leslie, C.1
  • 31
    • 22644450970
    • Further reflections on hypothesis testing and editorial policy for primary research journals
    • Levin, JR and Robinson, DH. 1999. Further Reflections on Hypothesis Testing and Editorial Policy for Primary Research Journals. Educational Psychology Review, 11 (2): 143-155.
    • (1999) Educational Psychology Review , vol.11 , Issue.2 , pp. 143-155
    • Levin, J.R.1    Robinson, D.H.2
  • 32
    • 3042562270
    • Genomics, prior probability, and statistical tests of multiple hypotheses
    • Manly, KF, Nettleton, D and Hwang, JT. 2004. Genomics, Prior Probability, and Statistical Tests of Multiple Hypotheses. Genome Research, 14: 997-1001.
    • (2004) Genome Research , vol.14 , pp. 997-1001
    • Manly, K.F.1    Nettleton, D.2    Hwang, J.T.3
  • 33
    • 0030765252
    • Confidence intervals for differences in correlated binary proportions
    • May, WL and Johnson, WD. 1997. Confidence Intervals for Differences in Correlated Binary Proportions. Statistics in Medicine, 16: 2127-2136.
    • (1997) Statistics in Medicine , vol.16 , pp. 2127-2136
    • May, W.L.1    Johnson, W.D.2
  • 34
    • 0000596361
    • Note on the sampling error of the difference between correlated proportions or percentages
    • McNemar, Q. 1947. Note on the Sampling Error of the Difference Between Correlated Proportions or Percentages. Psychometrika, 12: 153-157.
    • (1947) Psychometrika , vol.12 , pp. 153-157
    • McNemar, Q.1
  • 35
    • 0004786871
    • There is a time and a place for significance testing
    • Edited by: Harlow, LL, Mulaik, SA and Steiger, JH. New Jersey (USA): Lawrence Erlbaum Associates
    • Mulaik, SA, Raju, NS and Harshman, RA. 2007. "There Is a Time and a Place for Significance Testing". In What If There Were No Significance Tests?, Edited by: Harlow, LL, Mulaik, SA and Steiger, JH. 65-115. New Jersey (USA): Lawrence Erlbaum Associates.
    • (2007) What if There Were No Significance Tests? , pp. 65-115
    • Mulaik, S.A.1    Raju, N.S.2    Harshman, R.A.3
  • 36
    • 0042847140
    • Inference for the generalization error
    • Nadeau, C and Bengio, Y. 2003. Inference for the Generalization Error. Machine Learning, 52: 239-281.
    • (2003) Machine Learning , vol.52 , pp. 239-281
    • Nadeau, C.1    Bengio, Y.2
  • 37
    • 0002711488
    • The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing
    • Nix, TW and Barnette, JJ. 1998. The Data Analysis Dilemma: Ban or Abandon. A Review of Null Hypothesis Significance Testing. Research in the Schools, 5: 3-14.
    • (1998) Research in the Schools , vol.5 , pp. 3-14
    • Nix, T.W.1    Barnette, J.J.2
  • 38
    • 77954676863
    • Permutation tests for studying classifier performance
    • Ojala, M and Garriga, GC. 2010. Permutation Tests for Studying Classifier Performance. Journal of Machine Learning Research, 11: 1833-1863.
    • (2010) Journal of Machine Learning Research , vol.11 , pp. 1833-1863
    • Ojala, M.1    Garriga, G.C.2
  • 39
    • 0035063893
    • Low p-values or narrow confidence intervals: Which are more durable?
    • Poole, C. 2001. Low p-values or Narrow Confidence Intervals: Which Are More Durable? Epidemiology, 12: 291-294.
    • (2001) Epidemiology , vol.12 , pp. 291-294
    • Poole, C.1
  • 40
    • 0001638507
    • Large sample simultaneous confidence intervals for multinomial proportions
    • Quesenberry, CP and Hurst, DC. 1964. Large Sample Simultaneous Confidence Intervals for Multinomial Proportions. Technometrics, 6: 191-195.
    • (1964) Technometrics , vol.6 , pp. 191-195
    • Quesenberry, C.P.1    Hurst, D.C.2
  • 41
    • 70149113077
    • R Development Core Team. Vienna, Austria: R Foundation for Statistical Computing
    • R Development Core Team. (2009), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing
    • (2009) R: A Language and Environment for Statistical Computing
  • 42
    • 85042555447
    • On the necessity of Bayesian inference and the construction of measures of nearness to Bayesian form
    • Robinson, GK. 1978. On the Necessity of Bayesian Inference and the Construction of Measures of Nearness to Bayesian Form. Biometrika, 65: 49-52.
    • (1978) Biometrika , vol.65 , pp. 49-52
    • Robinson, G.K.1
  • 44
    • 0000724985
    • Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers
    • Schmidt, FL. 1996. Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers. Psychological Methods, 1: 115-129.
    • (1996) Psychological Methods , vol.1 , pp. 115-129
    • Schmidt, F.L.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.