k-Means has polynomial smoothed complexity

David Arthur, Bodo Manthey, Heiko Röglin

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

67 Citations (Scopus)
91 Downloads (Pure)

Abstract

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/sigma, where sigma is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
Original languageUndefined
Title of host publicationProceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2009)
EditorsD.A. Spielman
Place of PublicationLos Alamitos
PublisherIEEE Computer Society
Pages405-414
Number of pages10
ISBN (Print)978-0-7695-3850-1
DOIs
Publication statusPublished - 2009
EventThe 50th Annual IEEE Symposium on Foundations of Computer Science 2009 - Renaissance Hotel Downtown Atlanta, Atlanta, United States
Duration: 24 Oct 200927 Oct 2009
Conference number: 50
https://www.cc.gatech.edu/focs2009/

Publication series

Name
PublisherIEEE Computer Society Press

Conference

ConferenceThe 50th Annual IEEE Symposium on Foundations of Computer Science 2009
Abbreviated titleFOCS 2009
CountryUnited States
CityAtlanta
Period24/10/0927/10/09
Internet address

Keywords

  • METIS-266490
  • IR-70194
  • Smoothed analyis
  • Clustering
  • Probabilistic Analysis
  • k-Means
  • k-Means clustering
  • k-Means method
  • EWI-16979

Cite this