Unsupervised clustering methods are increasingly being applied in psychology. Researchers may use such methods on multivariate data to reveal previously undetected sub-populations of individuals within a larger population. Realistic research scenarios in the cognitive science may not be ideally suited for a successful use of these methods, however, as they are characterized by modest effect sizes, limited sample sizes, and non-orthogonal indicators. This combination of characteristics even presents a high risk of detecting non-existing clusters. A systematic review showed that, among 191 studies published in 2016–2020 that used different clustering methods to classify human participants, the median sample size was only 322, and a median of 3 latent classes/clusters were detected. None of them concluded in favor of a one-cluster solution, potentially giving rise to an extreme publication bias. Dimensionality reduction techniques are almost never used before clustering. In a subsequent simulation study, we examined the performance of popular clustering techniques, including Gaussian mixture model, a partitioning, and a hierarchical agglomerative algorithm. We focused on their ability to detect the correct number of clusters, and on their classification accuracy. Under a reasoned set of scenarios that we considered plausible for the cognitive research, none of the methods adequately discriminates between one vs two true clusters. In addition, non-orthogonal indicators lead to a high risk of incorrectly detecting multiple clusters where none existed, even in the presence of only modest correlation (a frequent case in psychology). In conclusion, it is hard for researchers to be in a condition to achieve a valid unsupervised clustering for inferential purposes with a view to classifying individuals.

Entia Non Sunt Multiplicanda … Shall I look for clusters in my cognitive data?

Girardi, Paolo;
2022

Abstract

Unsupervised clustering methods are increasingly being applied in psychology. Researchers may use such methods on multivariate data to reveal previously undetected sub-populations of individuals within a larger population. Realistic research scenarios in the cognitive science may not be ideally suited for a successful use of these methods, however, as they are characterized by modest effect sizes, limited sample sizes, and non-orthogonal indicators. This combination of characteristics even presents a high risk of detecting non-existing clusters. A systematic review showed that, among 191 studies published in 2016–2020 that used different clustering methods to classify human participants, the median sample size was only 322, and a median of 3 latent classes/clusters were detected. None of them concluded in favor of a one-cluster solution, potentially giving rise to an extreme publication bias. Dimensionality reduction techniques are almost never used before clustering. In a subsequent simulation study, we examined the performance of popular clustering techniques, including Gaussian mixture model, a partitioning, and a hierarchical agglomerative algorithm. We focused on their ability to detect the correct number of clusters, and on their classification accuracy. Under a reasoned set of scenarios that we considered plausible for the cognitive research, none of the methods adequately discriminates between one vs two true clusters. In addition, non-orthogonal indicators lead to a high risk of incorrectly detecting multiple clusters where none existed, even in the presence of only modest correlation (a frequent case in psychology). In conclusion, it is hard for researchers to be in a condition to achieve a valid unsupervised clustering for inferential purposes with a view to classifying individuals.
File in questo prodotto:
File Dimensione Formato  
journal.pone.0269584.pdf

accesso aperto

Tipologia: Documento in Post-print
Licenza: Accesso libero (no vincoli)
Dimensione 3.65 MB
Formato Adobe PDF
3.65 MB Adobe PDF Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5004422
Citazioni
  • ???jsp.display-item.citation.pmc??? 0
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact