Abstract
The k-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii [3] aimed at closing this gap, and they proved a bound of poly(nk, σ−1) on the smoothed running-time of the k-means method, where n is the number of data points and σ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running-time observed in practice.
We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in n√k and σ−1. Second, we prove an upper bound of kkd·poly(n, σ−1), where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running-time for k, d ∈ O(√logn/log logn).
Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.
We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in n√k and σ−1. Second, we prove an upper bound of kkd·poly(n, σ−1), where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running-time for k, d ∈ O(√logn/log logn).
Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.
Original language | English |
---|---|
Title of host publication | Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009 |
Editors | C. Mathieu |
Place of Publication | Philadelphia |
Publisher | SIAM |
Pages | 461-470 |
Number of pages | 10 |
Publication status | Published - 2009 |
Event | 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009 - New York, United States Duration: 4 Jan 2009 → 6 Jan 2009 Conference number: 20 |
Conference
Conference | 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009 |
---|---|
Abbreviated title | SODA |
Country/Territory | United States |
City | New York |
Period | 4/01/09 → 6/01/09 |
Keywords
- $k$-means method
- IR-68852
- Smoothed Analysis
- Clustering
- EWI-16976
- METIS-264227