GDCluster: A General Decentralized Clustering Algorithm

Hoda Mashayekhi, Jafar Habibi, Tania Khalafbeigi, Spyros Voulgaris, Martinus Richardus van Steen

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

In many popular applications, such as peer-to-peer systems, large amounts of data are distributed among multiple sources. Analyzing this data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general, fully decentralized clustering method capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster to execute partition-based and density-based clustering methods on the summarized views, and also offer enhancements to the basic algorithm. Dynamic data is handled by gradually adapting the clustering model. Our experimental evaluations show that GDCluster discovers clusters efficiently with scalable transmission cost, and that it outperforms the popular method LSP2P.
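The abstract describes two stages: gossip-based maintenance of summarized views, followed by clustering on those views. The Python sketch below is a toy illustration of that general idea and is not the authors' algorithm. Everything in it is an assumption made for illustration: the function names (`summarize`, `gossip_round`, `weighted_kmeans`, `demo`), the choice of one centroid-plus-count summary per node, the push-pull exchange keyed by origin node, and the use of weighted k-means as the partition-based step.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def summarize(points):
    """Assumed summary: collapse local points into one (centroid, weight) pair."""
    n = len(points)
    dim = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / n for d in range(dim))
    return centroid, n


def gossip_round(views, rng):
    """Each node performs a push-pull exchange with one random peer.

    Entries are keyed by origin node id, so merging is a dictionary union
    and no summary is counted twice.
    """
    ids = list(views)
    for i in ids:
        j = rng.choice([x for x in ids if x != i])
        merged = {**views[i], **views[j]}
        views[i] = dict(merged)
        views[j] = dict(merged)


def weighted_kmeans(view, k, iters=20):
    """Lloyd's algorithm over (point, weight) pairs, farthest-first seeding."""
    items = list(view.values())
    centers = [items[0][0]]
    while len(centers) < k:
        far = max(items, key=lambda e: min(sqdist(e[0], c) for c in centers))
        centers.append(far[0])
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p, w in items:
            nearest = min(range(k), key=lambda c: sqdist(p, centers[c]))
            groups[nearest].append((p, w))
        for c, g in enumerate(groups):
            if g:  # keep the old center if no summary was assigned to it
                total = sum(w for _, w in g)
                centers[c] = tuple(
                    sum(p[d] * w for p, w in g) / total
                    for d in range(len(centers[c]))
                )
    return centers


def demo(seed=0, n_nodes=10, rounds=10):
    """Hypothetical setup: each node holds 50 points from one of two clusters."""
    rng = random.Random(seed)
    true_centers = [(0.0, 0.0), (10.0, 10.0)]
    views = {}
    for node in range(n_nodes):
        cx, cy = true_centers[node % 2]
        local = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(50)]
        views[node] = {node: summarize(local)}
    for _ in range(rounds):
        gossip_round(views, rng)
    return views, weighted_kmeans(views[0], k=2)


if __name__ == "__main__":
    views, centers = demo()
    print(len(views[0]), "summaries at node 0; estimated centers:", centers)
```

In this sketch every node can cluster locally once its view is populated, and only summaries (never raw points) cross the network. A single centroid per node is only reasonable when a node's local data comes from one cluster, as the demo assumes; handling mixed local data, bounded view sizes, and changing data would need a richer summary than the one shown here.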
Original language: Undefined
Pages (from-to): 1892-1905
Number of pages: 14
Journal: IEEE transactions on knowledge and data engineering
Volume: 27
Issue number: 7
DOIs: 10.1109/TKDE.2015.2391123
Publication status: Published - Jul 2015

Keywords

  • EWI-26879
  • Clustering
  • Distributed systems
  • IR-100081
  • partition-based clustering
  • density-based clustering
  • METIS-316851
  • dynamic system

Cite this

Mashayekhi, Hoda ; Habibi, Jafar ; Khalafbeigi, Tania ; Voulgaris, Spyros ; van Steen, Martinus Richardus. / GDCluster: A General Decentralized Clustering Algorithm. In: IEEE transactions on knowledge and data engineering. 2015 ; Vol. 27, No. 7. pp. 1892-1905.
@article{61ae904be083446cb42a051d9830b6df,
title = "GDCluster: A General Decentralized Clustering Algorithm",
abstract = "In many popular applications like peer-to-peer systems, large amounts of data are distributed among multiple sources. Analysis of this data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized clustering method, which is capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster for execution of the partition-based and density-based clustering methods on the summarized views, and also offer enhancements to the basic algorithm. Coping with dynamic data is made possible by gradually adapting the clustering model. Our experimental evaluations show that GDCluster can discover the clusters efficiently with scalable transmission cost, and also expose its supremacy in comparison to the popular method LSP2P.",
keywords = "EWI-26879, Clustering, Distributed systems, IR-100081, partition-based clustering, density-based clustering, METIS-316851, dynamic system",
author = "Hoda Mashayekhi and Jafar Habibi and Tania Khalafbeigi and Spyros Voulgaris and {van Steen}, {Martinus Richardus}",
note = "10.1109/TKDE.2015.2391123",
year = "2015",
month = "7",
doi = "10.1109/TKDE.2015.2391123",
language = "Undefined",
volume = "27",
pages = "1892--1905",
journal = "IEEE transactions on knowledge and data engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "7"
}

TY  - JOUR
T1  - GDCluster: A General Decentralized Clustering Algorithm
AU  - Mashayekhi, Hoda
AU  - Habibi, Jafar
AU  - Khalafbeigi, Tania
AU  - Voulgaris, Spyros
AU  - van Steen, Martinus Richardus
N1  - 10.1109/TKDE.2015.2391123
PY  - 2015/7
Y1  - 2015/7
AB  - In many popular applications like peer-to-peer systems, large amounts of data are distributed among multiple sources. Analysis of this data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized clustering method, which is capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster for execution of the partition-based and density-based clustering methods on the summarized views, and also offer enhancements to the basic algorithm. Coping with dynamic data is made possible by gradually adapting the clustering model. Our experimental evaluations show that GDCluster can discover the clusters efficiently with scalable transmission cost, and also expose its supremacy in comparison to the popular method LSP2P.
KW  - EWI-26879
KW  - Clustering
KW  - Distributed systems
KW  - IR-100081
KW  - partition-based clustering
KW  - density-based clustering
KW  - METIS-316851
KW  - dynamic system
U2  - 10.1109/TKDE.2015.2391123
DO  - 10.1109/TKDE.2015.2391123
M3  - Article
VL  - 27
SP  - 1892
EP  - 1905
JO  - IEEE transactions on knowledge and data engineering
JF  - IEEE transactions on knowledge and data engineering
SN  - 1041-4347
IS  - 7
ER  -