Volume 27, Issue 1, 2010, Pages 3-40

Intelligent choice of the number of clusters in k-means clustering: An experimental study with different cluster spreads

Author keywords

Anomalous pattern; Gap statistic; Hartigan's rule; K Means clustering; Number of clusters


EID: 77953022339     PISSN: 01764268     EISSN: 14321343     Source Type: Journal    
DOI: 10.1007/s00357-010-9049-5     Document Type: Article
Times cited : (264)

References (57)
  • 1
    • 0027453616
    • Model-based Gaussian and Non-Gaussian Clustering
    • BANFIELD, J. D., and RAFTERY, A. E. (1993), "Model-based Gaussian and Non-Gaussian Clustering", Biometrics, 49, 803-821.
    • (1993) Biometrics , vol.49 , pp. 803-821
    • Banfield, J.D.1    Raftery, A.E.2
  • 4
    • 67749150743
    • Clustering Methods: A History of k-Means Algorithms
    • P. Brito, P. Bertrand, G. Cucumel, and F. De Carvalho (Eds.), Heidelberg: Springer Verlag
    • BOCK, H.-H. (2007), "Clustering Methods: A History of k-Means Algorithms", in Selected Contributions in Data Analysis and Classification, eds. P. Brito, P. Bertrand, G. Cucumel, and F. De Carvalho, Heidelberg: Springer Verlag, pp. 161-172.
    • (2007) Selected Contributions in Data Analysis and Classification , pp. 161-172
    • Bock, H.-H.1
  • 5
    • 0000642069
    • Replicating Cluster Analysis: Method, Consistency and Validity
    • BRECKENRIDGE, J. (1989), "Replicating Cluster Analysis: Method, Consistency and Validity", Multivariate Behavioral Research, 24, 147-161.
    • (1989) Multivariate Behavioral Research , vol.24 , pp. 147-161
    • Breckenridge, J.1
  • 6
  • 8
    • 33646944424
    • A Method of Predicting the Number of Clusters Using Rand's Statistic
    • CHAE, S. S., DUBIEN, J. L., and WARDE, W. D. (2006), "A Method of Predicting the Number of Clusters Using Rand's Statistic", Computational Statistics and Data Analysis, 50 (12), 3531-3546.
    • (2006) Computational Statistics and Data Analysis , vol.50 , Issue.12 , pp. 3531-3546
    • Chae, S.S.1    Dubien, J.L.2    Warde, W.D.3
  • 9
    • 0036011451
    • An Examination of Indexes for Determining the Number of Clusters in Binary Data Sets
    • DIMITRIADOU, E., DOLNICAR, S., and WEINGESSEL, A. (2002), "An Examination of Indexes for Determining the Number of Clusters in Binary Data Sets", Psychometrika, 67(1), 137-160.
    • (2002) Psychometrika , vol.67 , Issue.1 , pp. 137-160
    • Dimitriadou, E.1    Dolnicar, S.2    Weingessel, A.3
  • 11
    • 0037172724
    • A Prediction-Based Resampling Method for Estimating the Number of Clusters in a Dataset
    • research0036.1-0036.21
    • DUDOIT, S., and FRIDLYAND, J. (2002), "A Prediction-Based Resampling Method for Estimating the Number of Clusters in a Dataset", Genome Biology, 3(7), research0036.1-0036.21.
    • (2002) Genome Biology , vol.3 , Issue.7
    • Dudoit, S.1    Fridlyand, J.2
  • 13
    • 0003641269
    • U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.), Menlo Park, CA: AAAI Press/The MIT Press
    • FAYYAD, U. M., PIATETSKY-SHAPIRO, G., SMYTH, P., and UTHURUSAMY, R. (eds.) (1996), Advances in Knowledge Discovery and Data Mining, Menlo Park, CA: AAAI Press/The MIT Press.
    • (1996) Advances in Knowledge Discovery and Data Mining
  • 15
    • 0035998835
    • Model-based Clustering, Discriminant Analysis, and Density Estimation
    • FRALEY, C., and RAFTERY, A. E. (2002), "Model-based Clustering, Discriminant Analysis, and Density Estimation", Journal of the American Statistical Association, 97 (458), 611-631.
    • (2002) Journal of the American Statistical Association , vol.97 , Issue.458 , pp. 611-631
    • Fraley, C.1    Raftery, A.E.2
  • 16
    • 77953026462
    • GENERATION OF GAUSSIAN MIXTURE DISTRIBUTED DATA
    • GENERATION OF GAUSSIAN MIXTURE DISTRIBUTED DATA (2006), NETLAB neural network software, http://www.ncrg.aston.ac.uk/netlab.
    • (2006) NETLAB neural network software
  • 17
    • 19044364181
    • Optimising k-means Clustering Results with Standard Software Packages
    • HAND, D. J., and KRZANOWSKI, W. J. (2005), "Optimising k-means Clustering Results with Standard Software Packages", Computational Statistics and Data Analysis, 49, 969-973.
    • (2005) Computational Statistics and Data Analysis , vol.49 , pp. 969-973
    • Hand, D.J.1    Krzanowski, W.J.2
  • 18
    • 0034819175
    • J-MEANS: A New Local Search Heuristic for Minimum Sum of Squares Clustering
    • HANSEN, P., and MLADENOVIC, N. (2001), "J-MEANS: A New Local Search Heuristic for Minimum Sum of Squares Clustering", Pattern Recognition, 34, 405-413.
    • (2001) Pattern Recognition , vol.34 , pp. 405-413
    • Hansen, P.1    Mladenovic, N.2
  • 22
    • 0001518855
    • A General Statistical Framework for Assessing Categorical Clustering in Free Recall
    • HUBERT, L. J., and LEVIN, J. R. (1976), "A General Statistical Framework for Assessing Categorical Clustering in Free Recall", Psychological Bulletin, 83, 1072-1080.
    • (1976) Psychological Bulletin , vol.83 , pp. 1072-1080
    • Hubert, L.J.1    Levin, J.R.2
  • 23
    • 33751272141
    • An Expansion of X-Means for Automatically Determining the Optimal Number of Clusters
    • Calgary AB, Canada
    • ISHIOKA, T. (2005), "An Expansion of X-Means for Automatically Determining the Optimal Number of Clusters", Proceedings of International Conference on Computational Intelligence, Calgary AB, Canada, pp. 91-96.
    • (2005) Proceedings of International Conference on Computational Intelligence , pp. 91-96
    • Ishioka, T.1
  • 26
    • 0023905024
    • A Criterion for Determining the Number of Groups in a Dataset Using Sum of Squares Clustering
    • KRZANOWSKI, W., and LAI, Y. (1988), "A Criterion for Determining the Number of Groups in a Dataset Using Sum of Squares Clustering", Biometrics, 44, 23-34.
    • (1988) Biometrics , vol.44 , pp. 23-34
    • Krzanowski, W.1    Lai, Y.2
  • 27
    • 33947159574
    • Evaluation of Stability of K-Means Cluster Ensembles with Respect to Random Initialization
    • KUNCHEVA, L. I., and VETROV, D. P. (2006), "Evaluation of Stability of K-Means Cluster Ensembles with Respect to Random Initialization", IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1798-1808.
    • (2006) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.28 , Issue.11 , pp. 1798-1808
    • Kuncheva, L.I.1    Vetrov, D.P.2
  • 29
    • 0033715579
    • Genetic Algorithm-based Clustering Technique
    • MAULIK, U., and BANDYOPADHYAY, S. (2000), "Genetic Algorithm-based Clustering Technique", Pattern Recognition, 33, 1455-1465.
    • (2000) Pattern Recognition , vol.33 , pp. 1455-1465
    • Maulik, U.1    Bandyopadhyay, S.2
  • 30
    • 23744437299
    • On a Resampling Approach for Tests on the Number of Clusters with Mixture Model-Based Clustering of Tissue Samples
    • MCLACHLAN, G. J., and KHAN, N. (2004), "On a Resampling Approach for Tests on the Number of Clusters with Mixture Model-Based Clustering of Tissue Samples", Journal of Multivariate Analysis, 90, 990-1005.
    • (2004) Journal of Multivariate Analysis , vol.90 , pp. 990-1005
    • McLachlan, G.J.1    Khan, N.2
  • 33
    • 0000228352
    • A Monte-Carlo Study of Thirty Internal Criterion Measures for Cluster Analysis
    • MILLIGAN, G. W. (1981), "A Monte-Carlo Study of Thirty Internal Criterion Measures for Cluster Analysis", Psychometrika, 46, 187-199.
    • (1981) Psychometrika , vol.46 , pp. 187-199
    • Milligan, G.W.1
  • 34
    • 34250115918
    • An Examination of Procedures for Determining the Number of Clusters in a Data Set
    • MILLIGAN, G. W., and COOPER, M. C. (1985), "An Examination of Procedures for Determining the Number of Clusters in a Data Set", Psychometrika, 50, 159-179.
    • (1985) Psychometrika , vol.50 , pp. 159-179
    • Milligan, G.W.1    Cooper, M.C.2
  • 35
    • 0000235019
    • A Study of Standardization of Variables in Cluster Analysis
    • MILLIGAN, G. W., and COOPER, M. C. (1988), "A Study of Standardization of Variables in Cluster Analysis", Journal of Classification, 5, 181-204.
    • (1988) Journal of Classification , vol.5 , pp. 181-204
    • Milligan, G.W.1    Cooper, M.C.2
  • 37
    • 0001090009
    • Sequential Fitting Procedures for Linear Data Aggregation Model
    • MIRKIN, B. (1990), "Sequential Fitting Procedures for Linear Data Aggregation Model", Journal of Classification, 7, 167-195.
    • (1990) Journal of Classification , vol.7 , pp. 167-195
    • Mirkin, B.1
  • 40
    • 0038724494
    • Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data
    • MONTI, S., TAMAYO, P., MESIROV, J., and GOLUB, T. (2003), "Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data", Machine Learning, 52, 91-118.
    • (2003) Machine Learning , vol.52 , pp. 91-118
    • Monti, S.1    Tamayo, P.2    Mesirov, J.3    Golub, T.4
  • 41
    • 0000930268
    • Hierarchical Grouping Methods and Stopping Rules: An Evaluation
    • MOJENA, R. (1977), "Hierarchical Grouping Methods and Stopping Rules: An Evaluation", The Computer Journal, 20, 359-363.
    • (1977) The Computer Journal , vol.20 , pp. 359-363
    • Mojena, R.1
  • 42
    • 0021644144
    • Fitting Straight Lines to Point Patterns
    • MURTAGH, F., and RAFTERY, A. E. (1984), "Fitting Straight Lines to Point Patterns", Pattern Recognition, 17, 479-483.
    • (1984) Pattern Recognition , vol.17 , pp. 479-483
    • Murtagh, F.1    Raftery, A.E.2
  • 43
    • 0001820920
    • X-means: Extending K-Means with Efficient Estimation of the Number of Clusters
    • San Francisco: Morgan Kaufmann
    • PELLEG, D., and MOORE, A. (2000), "X-means: Extending K-Means with Efficient Estimation of the Number of Clusters", Proceedings of 17th International Conference on Machine Learning, San Francisco: Morgan Kaufmann, pp. 727-734.
    • (2000) Proceedings of 17th International Conference on Machine Learning , pp. 727-734
    • Pelleg, D.1    Moore, A.2
  • 44
    • 0033204902
    • An Empirical Comparison of Four Initialization Methods for K-Means Algorithm
    • PENA, J. M., LOZANO, J. A., and LARRANAGA P. (1999), "An Empirical Comparison of Four Initialization Methods for K-Means Algorithm", Pattern Recognition Letters, 20(10), 1027-1040.
    • (1999) Pattern Recognition Letters , vol.20 , Issue.10 , pp. 1027-1040
    • Pena, J.M.1    Lozano, J.A.2    Larranaga, P.3
  • 48
    • 33744726077
    • Standardizing Variables in K-Means Clustering
    • D. Banks, L. House, F. R. McMorris, P. Arabie, and W. Gaul (Eds.), New York: Springer
    • STEINLEY, D. (2004), "Standardizing Variables in K-Means Clustering", in Classification, Clustering, and Data Mining Applications, eds. D. Banks, L. House, F. R. McMorris, P. Arabie and W. Gaul, New York: Springer, pp. 53-60.
    • (2004) Classification, Clustering, and Data Mining Applications , pp. 53-60
    • Steinley, D.1
  • 50
    • 34250871625
    • Initializing K-Means Batch Clustering: A Critical Evaluation of Several Techniques
    • STEINLEY, D., and BRUSCO M. (2007), "Initializing K-Means Batch Clustering: A Critical Evaluation of Several Techniques", Journal of Classification, 24, 99-121.
    • (2007) Journal of Classification , vol.24 , pp. 99-121
    • Steinley, D.1    Brusco, M.2
  • 51
    • 33744720321
    • OCLUS: An Analytic Method for Generating Clusters with Known Overlap
    • STEINLEY, D., and HENSON, R. (2005), "OCLUS: An Analytic Method for Generating Clusters with Known Overlap", Journal of Classification, 22, 221-250.
    • (2005) Journal of Classification , vol.22 , pp. 221-250
    • Steinley, D.1    Henson, R.2
  • 52
    • 0242679438
    • Finding the Number of Clusters in a Data Set: An Information-Theoretic Approach
    • SUGAR, C. A., and JAMES, G. M. (2003), "Finding the Number of Clusters in a Data Set: An Information-Theoretic Approach", Journal of the American Statistical Association, 98(463), 750-778.
    • (2003) Journal of the American Statistical Association , vol.98 , Issue.463 , pp. 750-778
    • Sugar, C.A.1    James, G.M.2
  • 56
    • 26444503697
    • Nearest Neighbours in Least-Squares Data Imputation Algorithms with Different Missing Patterns
    • WASITO, I., and MIRKIN, B. (2006), "Nearest Neighbours in Least-Squares Data Imputation Algorithms with Different Missing Patterns", Computational Statistics & Data Analysis, 50, 926-949.
    • (2006) Computational Statistics & Data Analysis , vol.50 , pp. 926-949
    • Wasito, I.1    Mirkin, B.2
  • 57
    • 0034800371
    • Details of the Adjusted Rand Index and Clustering Algorithms
    • YEUNG, K. Y., and RUZZO, W. L. (2001), "Details of the Adjusted Rand Index and Clustering Algorithms", Bioinformatics, 17, 763-774.
    • (2001) Bioinformatics , vol.17 , pp. 763-774
    • Yeung, K.Y.1    Ruzzo, W.L.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.