In density based clustering, the density peak algorithm has attracted much attention due to its effectiveness and simplicity, and a vast amount of clustering approaches have been proposed based on this algorithm. Some of these works require manual selection of cluster centers with a decision graph, where human involvement leads to uncertainty in clustering results. In order to avoid human involvement, some other algorithms depend on user-specified number of clusters to determine cluster centers automatically. However, it is well known that accurate estimation of number of clusters is a long-standing difficulty in data clustering. In this paper we present a sequential density peak clustering algorithm to extract clusters one by one, thereby determining the number of clusters automatically and avoiding manual selection of cluster centers in the meanwhile. Starting from a density peak, our algorithm generates an initial cluster surrounding the density peak in the first step, and then obtains the final cluster by expanding the initial cluster based on the relative density relationship among neighboring data points. With a peeling-off strategy, we obtain all the clusters sequentially. Our algorithm works well with clusters of Gaussian distribution and is therefore potential for clustering of real-world data. Experiments with a large number of synthetic and real datasets and comparisons with existing algorithms demonstrate the effectiveness of the proposed algorithm.

Flexible density peak clustering for real-world data

Marcello Pelillo
2024-01-01

Abstract

In density based clustering, the density peak algorithm has attracted much attention due to its effectiveness and simplicity, and a vast amount of clustering approaches have been proposed based on this algorithm. Some of these works require manual selection of cluster centers with a decision graph, where human involvement leads to uncertainty in clustering results. In order to avoid human involvement, some other algorithms depend on user-specified number of clusters to determine cluster centers automatically. However, it is well known that accurate estimation of number of clusters is a long-standing difficulty in data clustering. In this paper we present a sequential density peak clustering algorithm to extract clusters one by one, thereby determining the number of clusters automatically and avoiding manual selection of cluster centers in the meanwhile. Starting from a density peak, our algorithm generates an initial cluster surrounding the density peak in the first step, and then obtains the final cluster by expanding the initial cluster based on the relative density relationship among neighboring data points. With a peeling-off strategy, we obtain all the clusters sequentially. Our algorithm works well with clusters of Gaussian distribution and is therefore potential for clustering of real-world data. Experiments with a large number of synthetic and real datasets and comparisons with existing algorithms demonstrate the effectiveness of the proposed algorithm.
2024
156
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0031320324005235-main.pdf

non disponibili

Tipologia: Documento in Post-print
Licenza: Copyright dell'editore
Dimensione 856.99 kB
Formato Adobe PDF
856.99 kB Adobe PDF   Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5081923
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact