GoSCAN: Decentralized scalable data clustering

Hoda Mashayekhi*, Jafar Habibi, Spyros Voulgaris, Maarten van Steen

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

16 Citations (Scopus)

Abstract

Identifying clusters is an important aspect of analyzing large datasets. Clustering algorithms classically require access to the complete dataset. However, as huge amounts of data increasingly originate from multiple, dispersed sources in distributed systems, alternative solutions are required. Furthermore, data and network dynamicity in a distributed setting demand adaptable clustering solutions that offer accurate clustering models at a reasonable pace. In this paper, we propose GoSCAN, a fully decentralized density-based clustering algorithm that can cluster dynamic and distributed datasets without requiring central control or message flooding. We identify two major tasks, finding the core data points and forming the actual clusters, which we execute in parallel using gossip-based communication. This approach is very efficient, as it offers each peer enough authority to discover the clusters it is interested in. Our algorithm poses no extra burden of overlay formation in the network, while providing high levels of scalability. We also offer several optimizations to the basic clustering algorithm for reducing communication overhead and processing costs. Coping with dynamic data is made possible by introducing an age factor, which gradually detects dataset changes and enables clustering updates. In our experimental evaluation, we show that GoSCAN can discover the clusters efficiently with scalable transmission cost.
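The first of the two tasks named in the abstract, finding core data points with gossip-based communication, can be illustrated with a toy sketch. This is not the GoSCAN algorithm itself: the abstract does not give its message format, peer selection, or optimizations, so everything below is an assumption made for illustration. In particular, the names `is_core`, `gossip_round`, and `find_core_points`, the DBSCAN-style parameters `eps` and `min_pts`, and the choice to let peers exchange their entire views are all hypothetical, and the age factor for dynamic data is not modeled.

```python
import math
import random


def is_core(point, known_points, eps, min_pts):
    """DBSCAN-style test (assumed here): a point is core if at least
    min_pts known points, itself included, lie within distance eps."""
    neighbours = sum(1 for q in known_points if math.dist(point, q) <= eps)
    return neighbours >= min_pts


def gossip_round(views, rng):
    """One illustrative gossip round: every peer contacts one random
    other peer, and the pair merge the sets of points they know about.
    Exchanging whole views is a simplification, not the paper's protocol."""
    peers = list(views)
    for p in peers:
        q = rng.choice([x for x in peers if x != p])
        merged = views[p] | views[q]
        views[p] = set(merged)
        views[q] = set(merged)


def find_core_points(local_data, eps, min_pts, rounds, seed=0):
    """Run the given number of gossip rounds, then let each peer decide
    which of its own points are core, using only the points it has seen."""
    rng = random.Random(seed)
    views = {peer: set(points) for peer, points in local_data.items()}
    for _ in range(rounds):
        gossip_round(views, rng)
    return {
        peer: {pt for pt in points if is_core(pt, views[peer], eps, min_pts)}
        for peer, points in local_data.items()
    }


# Hypothetical two-peer dataset: three nearby points split across peers,
# plus one outlier.
data = {
    "A": [(0.0, 0.0), (0.0, 1.0)],
    "B": [(1.0, 0.0), (10.0, 10.0)],
}
before = find_core_points(data, eps=1.5, min_pts=3, rounds=0)
after = find_core_points(data, eps=1.5, min_pts=3, rounds=1)
```

Without any gossip, neither peer sees enough neighbours to mark a point as core. After one exchange, the three nearby points are recognized as core by the peers that own them, while the outlier is not, and no central coordinator is involved.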

Original language: English
Pages (from-to): 759-784
Number of pages: 26
Journal: Computing
Volume: 95
Issue number: 9
Publication status: Published - 1 Sept 2013
Externally published: Yes

Keywords

  • Decentralized clustering
  • Distributed systems
  • Gossip-based dissemination
