Abstract
The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points.
In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/sigma, where sigma is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2009) |
Editors | D.A. Spielman |
Place of Publication | Los Alamitos |
Publisher | IEEE Computer Society |
Pages | 405-414 |
Number of pages | 10 |
ISBN (Print) | 978-0-7695-3850-1 |
DOIs | |
Publication status | Published - 2009 |
Event | The 50th Annual IEEE Symposium on Foundations of Computer Science 2009 - Renaissance Hotel Downtown Atlanta, Atlanta, United States Duration: 24 Oct 2009 → 27 Oct 2009 Conference number: 50 https://www.cc.gatech.edu/focs2009/ |
Publication series
Name | |
---|---|
Publisher | IEEE Computer Society Press |
Conference
Conference | The 50th Annual IEEE Symposium on Foundations of Computer Science 2009 |
---|---|
Abbreviated title | FOCS 2009 |
Country | United States |
City | Atlanta |
Period | 24/10/09 → 27/10/09 |
Internet address |
Keywords
- METIS-266490
- IR-70194
- Smoothed analyis
- Clustering
- Probabilistic Analysis
- k-Means
- k-Means clustering
- k-Means method
- EWI-16979