GDCluster: A General Decentralized Clustering Algorithm

Hoda Mashayekhi*, Jafar Habibi, Tania Khalafbeigi, Spyros Voulgaris, Maarten van Steen

*Corresponding author for this work

    Research output: Contribution to journal › Article › Academic › peer-review

    28 Citations (Scopus)
    151 Downloads (Pure)

    Abstract

    In many popular applications like peer-to-peer systems, large amounts of data are distributed among multiple sources. Analyzing this data and identifying clusters are challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general, fully decentralized clustering method capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster to execute partition-based and density-based clustering methods on the summarized views, and also offer enhancements to the basic algorithm. Coping with dynamic data is made possible by gradually adapting the clustering model. Our experimental evaluations show that GDCluster can discover the clusters efficiently with scalable transmission cost, and that it outperforms the popular method LSP2P.

    Original language: English
    Article number: 7006742
    Pages (from-to): 1892-1905
    Number of pages: 14
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 27
    Issue number: 7
    Publication status: Published - 1 Jul 2015

    Keywords

    • Clustering
    • Distributed systems
    • Partition-based clustering
    • Density-based clustering
    • Dynamic systems
