TY - JOUR
T1 - Adaptive Explicit Kernel Minkowski Weighted K-means
AU - Aradnia, Amir
AU - Haeri, Maryam Amir
AU - Ebadzadeh, Mohammad Mehdi
N1 - Publisher Copyright:
© 2021 The Authors
PY - 2022/1
Y1 - 2022/1
N2 - The K-means algorithm is among the most commonly used data clustering methods. However, the regular K-means can only be applied in the input space, and it is applicable when clusters are linearly separable. The kernel K-means, which extends K-means into the kernel space, is able to capture nonlinear structures and identify arbitrarily shaped clusters. However, kernel methods often operate on the kernel matrix of the data, which scale poorly with the size of the matrix, or suffer from the high clustering cost due to the repetitive calculations of kernel values. Another issue is that algorithms access the data only through evaluation of K(xi,xj), which limits many processes that can be done on data through the clustering task. This paper proposes a method to combine the advantages of the linear and nonlinear approaches by using derived corresponding approximate finite-dimensional feature maps based on spectral analysis. Applying approximate finite-dimensional feature maps have been discussed before only in the context of Support Vector Machines (SVM) problems. We suggest using this method in the kernel K-means context as it does not require storing a huge kernel matrix in memory, calculates cluster centers more efficiently, and accesses the data explicitly in the feature space; thus taking advantage of K-means extensions in that space. We demonstrate that our Explicit Kernel Minkowski Weighted K-means (Explicit KMWK-means) method is able to achieve high accuracy in terms of cluster recovery in the new space by applying additional Minkowski exponent and feature weights. The proposed method is evaluated by four benchmark data sets, and its performance is compared with the commonly used kernel clustering approaches. Experiments show the proposed method consistently achieves superior clustering performances while reducing the memory consumption.
AB - The K-means algorithm is among the most commonly used data clustering methods. However, the regular K-means can only be applied in the input space, and it is applicable when clusters are linearly separable. The kernel K-means, which extends K-means into the kernel space, is able to capture nonlinear structures and identify arbitrarily shaped clusters. However, kernel methods often operate on the kernel matrix of the data, which scale poorly with the size of the matrix, or suffer from the high clustering cost due to the repetitive calculations of kernel values. Another issue is that algorithms access the data only through evaluation of K(xi,xj), which limits many processes that can be done on data through the clustering task. This paper proposes a method to combine the advantages of the linear and nonlinear approaches by using derived corresponding approximate finite-dimensional feature maps based on spectral analysis. Applying approximate finite-dimensional feature maps have been discussed before only in the context of Support Vector Machines (SVM) problems. We suggest using this method in the kernel K-means context as it does not require storing a huge kernel matrix in memory, calculates cluster centers more efficiently, and accesses the data explicitly in the feature space; thus taking advantage of K-means extensions in that space. We demonstrate that our Explicit Kernel Minkowski Weighted K-means (Explicit KMWK-means) method is able to achieve high accuracy in terms of cluster recovery in the new space by applying additional Minkowski exponent and feature weights. The proposed method is evaluated by four benchmark data sets, and its performance is compared with the commonly used kernel clustering approaches. Experiments show the proposed method consistently achieves superior clustering performances while reducing the memory consumption.
KW - Features map
KW - K-means
KW - Kernel clustering
KW - Minkowski metric
KW - UT-Hybrid-D
UR - http://www.scopus.com/inward/record.url?scp=85119079769&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2021.10.048
DO - 10.1016/j.ins.2021.10.048
M3 - Article
AN - SCOPUS:85119079769
SN - 0020-0255
VL - 584
SP - 503
EP - 518
JO - Information sciences
JF - Information sciences
ER -