Abstract
The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points.
In this article, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
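As a minimal illustration of the setting described above (not the article's implementation or proof), the sketch below runs the standard k-means method on a Gaussian-perturbed input and counts iterations until no cluster assignment changes. The data set, the number of clusters k, and the perturbation parameter σ are hypothetical choices made only for this example.

```python
# Hedged sketch: standard k-means iterations (assign, then update centers) on a
# Gaussian-perturbed input, counting iterations until the clustering stabilizes.
# The data, k, and sigma below are illustrative assumptions, not values from the article.
import numpy as np

def kmeans_iterations(points, k, rng):
    # Initialize centers with k distinct input points (one common choice).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    iterations = 0
    while True:
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return iterations  # no assignment changed: the method has converged
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        iterations += 1

rng = np.random.default_rng(0)
n, d, k, sigma = 200, 2, 5, 0.1                 # illustrative parameters
adversarial = rng.uniform(size=(n, d))           # stands in for an arbitrary input set
perturbed = adversarial + rng.normal(scale=sigma, size=(n, d))  # Gaussian perturbation
print("iterations until convergence:", kmeans_iterations(perturbed, k, rng))
```

The smoothed-analysis result concerns exactly the quantity this sketch measures: the expected number of such iterations on the perturbed input, which the article bounds by a polynomial in n and 1/σ.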
Original language | Undefined |
---|---|
Pages (from-to) | 19:1-19:31 |
Number of pages | 31 |
Journal | Journal of the Association for Computing Machinery |
Volume | 58 |
Issue number | 5 |
DOIs | |
Publication status | Published - Oct 2011 |
Keywords
- Smoothed Analysis
- k-means method
- Clustering