## Abstract

The k-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii [3] aimed at closing this gap, and they proved a bound of poly(nk, σ−1) on the smoothed running-time of the k-means method, where n is the number of data points and σ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running-time observed in practice.

We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in n√k and σ−1. Second, we prove an upper bound of kkd·poly(n, σ−1), where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running-time for k, d ∈ O(√logn/log logn).

Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.

We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in n√k and σ−1. Second, we prove an upper bound of kkd·poly(n, σ−1), where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running-time for k, d ∈ O(√logn/log logn).

Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.

Original language | English |
---|---|

Title of host publication | Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009 |

Editors | C. Mathieu |

Place of Publication | Philadelphia |

Publisher | SIAM |

Pages | 461-470 |

Number of pages | 10 |

Publication status | Published - 2009 |

Event | 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009 - New York, United States Duration: 4 Jan 2009 → 6 Jan 2009 Conference number: 20 |

### Conference

Conference | 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009 |
---|---|

Abbreviated title | SODA |

Country/Territory | United States |

City | New York |

Period | 4/01/09 → 6/01/09 |

## Keywords

- $k$-means method
- IR-68852
- Smoothed Analysis
- Clustering
- EWI-16976
- METIS-264227