GDCluster: A General Decentralized Clustering Algorithm

Hoda Mashayekhi, Jafar Habibi, Tania Khalafbeigi, Spyros Voulgaris, Martinus Richardus van Steen

    Research output: Contribution to journal › Article › Academic › peer-review

    15 Citations (Scopus)
    58 Downloads (Pure)

    Abstract

    In many popular applications, such as peer-to-peer systems, large amounts of data are distributed among multiple sources. Analyzing this data and identifying clusters in it is challenging because of processing, storage, and transmission costs. In this paper, we propose GDCluster, a general, fully decentralized clustering method capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster to execute partition-based and density-based clustering methods on the summarized views, and we also offer enhancements to the basic algorithm. GDCluster copes with dynamic data by gradually adapting the clustering model. Our experimental evaluations show that GDCluster discovers the clusters efficiently with scalable transmission cost, and they demonstrate its superiority over the popular method LSP2P.
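    The abstract only outlines the mechanism. As a rough illustration, and not the authors' protocol, the sketch below assumes that each node compresses its local points into a few weighted centroids, exchanges these summaries push-pull with random peers, and then runs weighted k-means (a partition-based method) on whatever summaries it has collected. The names `Node`, `gossip_round`, `weighted_kmeans`, and `farthest_first`, the farthest-first seeding, and the parameter `k_local` are all hypothetical. Views here are keyed by origin node and spread every summary to every node, so view size grows with the network, whereas a practical scheme would bound it; the density-based variant is not shown.

    ```python
    import random
    from typing import Dict, List, Tuple

    Point = Tuple[float, ...]
    Rep = Tuple[Point, float]  # (representative point, weight = number of points it stands for)


    def sq_dist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))


    def farthest_first(reps: List[Rep], k: int) -> List[Point]:
        """Deterministic seeding (an assumption): start at the first rep, then add the farthest one."""
        centers = [reps[0][0]]
        while len(centers) < k:
            p, _ = max(reps, key=lambda r: min(sq_dist(r[0], c) for c in centers))
            centers.append(p)
        return centers


    def weighted_kmeans(reps: List[Rep], k: int, iters: int = 20) -> List[Rep]:
        """Lloyd's algorithm on weighted points; returns (center, total weight) pairs."""
        k = min(k, len(reps))
        centers = farthest_first(reps, k)
        for _ in range(iters):
            dim = len(centers[0])
            sums = [[0.0] * dim for _ in range(k)]
            weights = [0.0] * k
            for p, w in reps:
                # Assign each weighted point to its nearest current center.
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                weights[j] += w
                for d in range(dim):
                    sums[j][d] += w * p[d]
            # Move each center to the weighted mean of its points (keep empty ones in place).
            centers = [
                tuple(s / weights[j] for s in sums[j]) if weights[j] > 0 else centers[j]
                for j in range(k)
            ]
        return [(c, w) for c, w in zip(centers, weights) if w > 0]


    class Node:
        """Hypothetical peer holding raw local data plus a gossiped view of summaries."""

        def __init__(self, node_id: int, data: List[Point], k_local: int):
            self.id = node_id
            # Local summary: a few weighted centroids instead of the raw points.
            summary = weighted_kmeans([(p, 1.0) for p in data], k_local)
            # Keyed by origin node, so merging views never double-counts weight.
            self.view: Dict[int, List[Rep]] = {node_id: summary}

        def global_model(self, k: int) -> List[Rep]:
            """Cluster the union of all summaries this node currently knows about."""
            reps = [r for summary in self.view.values() for r in summary]
            return weighted_kmeans(reps, k)


    def gossip_round(nodes: List[Node], rng: random.Random) -> None:
        """Push-pull gossip: each node swaps its whole view with one random peer."""
        for node in nodes:
            peer = rng.choice([n for n in nodes if n is not node])
            merged = {**node.view, **peer.view}
            node.view = dict(merged)
            peer.view = dict(merged)


    if __name__ == "__main__":
        rng = random.Random(42)
        true_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
        points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
                  for cx, cy in true_centers for _ in range(30)]
        rng.shuffle(points)
        nodes = [Node(i, points[i::6], k_local=3) for i in range(6)]  # 6 peers, 15 points each
        rounds = 0
        while any(len(n.view) < len(nodes) for n in nodes) and rounds < 100:
            gossip_round(nodes, rng)
            rounds += 1
        print(rounds, sorted(c for c, _ in nodes[0].global_model(3)))
    ```

    Keying views by origin node is what keeps merged weights honest: receiving the same summary twice overwrites it instead of adding it again. A deployment facing dynamic data would also need to age out summaries from departed nodes or stale data, which this sketch omits.
    
    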
    Original language: Undefined
    Pages (from-to): 1892-1905
    Number of pages: 14
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 27
    Issue number: 7
    DOIs: 10.1109/TKDE.2015.2391123
    Publication status: Published - Jul 2015

    Keywords

    • EWI-26879
    • Clustering
    • Distributed systems
    • IR-100081
    • partition-based clustering
    • density-based clustering
    • METIS-316851
    • dynamic system

    Cite this

    Mashayekhi, Hoda; Habibi, Jafar; Khalafbeigi, Tania; Voulgaris, Spyros; van Steen, Martinus Richardus. / GDCluster: A General Decentralized Clustering Algorithm. In: IEEE Transactions on Knowledge and Data Engineering. 2015; Vol. 27, No. 7. pp. 1892-1905.
    @article{61ae904be083446cb42a051d9830b6df,
    title = "GDCluster: A General Decentralized Clustering Algorithm",
    keywords = "EWI-26879, Clustering, Distributed systems, IR-100081, partition-based clustering, density-based clustering, METIS-316851, dynamic system",
    author = "Hoda Mashayekhi and Jafar Habibi and Tania Khalafbeigi and Spyros Voulgaris and {van Steen}, {Martinus Richardus}",
    note = "10.1109/TKDE.2015.2391123",
    year = "2015",
    month = "7",
    doi = "10.1109/TKDE.2015.2391123",
    language = "Undefined",
    volume = "27",
    pages = "1892--1905",
    journal = "IEEE transactions on knowledge and data engineering",
    issn = "1041-4347",
    publisher = "IEEE Computer Society",
    number = "7",

    }

    GDCluster: A General Decentralized Clustering Algorithm. / Mashayekhi, Hoda; Habibi, Jafar; Khalafbeigi, Tania; Voulgaris, Spyros; van Steen, Martinus Richardus.

    In: IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 7, 07.2015, pp. 1892-1905.

    TY - JOUR

    T1 - GDCluster: A General Decentralized Clustering Algorithm

    AU - Mashayekhi, Hoda

    AU - Habibi, Jafar

    AU - Khalafbeigi, Tania

    AU - Voulgaris, Spyros

    AU - van Steen, Martinus Richardus

    N1 - 10.1109/TKDE.2015.2391123

    PY - 2015/7

    Y1 - 2015/7

    KW - EWI-26879

    KW - Clustering

    KW - Distributed systems

    KW - IR-100081

    KW - partition-based clustering

    KW - density-based clustering

    KW - METIS-316851

    KW - dynamic system

    U2 - 10.1109/TKDE.2015.2391123

    DO - 10.1109/TKDE.2015.2391123

    M3 - Article

    VL - 27

    SP - 1892

    EP - 1905

    JO - IEEE transactions on knowledge and data engineering

    JF - IEEE transactions on knowledge and data engineering

    SN - 1041-4347

    IS - 7

    ER -