LDA2Net: Digging under the surface of COVID-19 scientific literature topics via a network-based approach

During the COVID-19 pandemic, the scientific literature related to SARS-CoV-2 has grown dramatically. These publications cover a varied set of topics, ranging from vaccination to the efficacy of protective equipment and the evaluation of lockdown policies. As a result, developing automatic methods that allow in-depth exploration of this growing literature has become important, both to identify the topical trends of COVID-related research and to zoom in on its sub-themes. This work proposes a novel methodology, called LDA2Net, which combines topic modelling and network analysis to investigate topics beneath their surface. More specifically, LDA2Net exploits the frequencies of consecutive word pairs (i.e., bigrams) to build the network structures underlying the hidden topics extracted from large volumes of text by Latent Dirichlet Allocation (LDA). Results are promising and suggest that the network-based representation enhances the efficacy of the topic model. In particular, this enrichment is noticeable when displaying and exploring the topics at different levels of granularity.
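As a rough illustration of the idea described in the abstract, the sketch below counts bigrams in a toy corpus and turns them into a weighted, directed word network for a single topic. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names, the toy documents, the hand-written topic-word probabilities (standing in for the output of an LDA model), and the edge-weighting rule are all hypothetical.

```python
from collections import Counter


def bigram_counts(docs):
    """Count consecutive word pairs (bigrams) across tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def topic_network(counts, word_probs):
    """Build a weighted directed edge dict for one topic.

    Edge weight = bigram count * p(first word | topic) * p(second word | topic).
    This weighting is an illustrative assumption, not the paper's formula.
    """
    edges = {}
    for (w1, w2), n in counts.items():
        weight = n * word_probs.get(w1, 0.0) * word_probs.get(w2, 0.0)
        if weight > 0:
            edges[(w1, w2)] = weight
    return edges


# Toy corpus (hypothetical data).
docs = [
    ["vaccine", "efficacy", "trial"],
    ["vaccine", "efficacy", "study"],
    ["lockdown", "policy", "evaluation"],
]
# Hypothetical topic-word probabilities standing in for LDA output.
vaccine_topic = {"vaccine": 0.5, "efficacy": 0.25, "trial": 0.125, "study": 0.125}

counts = bigram_counts(docs)
network = topic_network(counts, vaccine_topic)
strongest = max(network, key=network.get)
print(strongest, network[strongest])
```

In this toy setting, word pairs whose words have no probability under the topic drop out of the network, so each topic gets its own subgraph of the corpus-wide bigram graph.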


Minello, Giorgia; Santagiustina, Carlo Romano Marcello Alessandro; Warglien, Massimo

2024-01-01



Publication year: 2024
Journal title: PLOS ONE
Volume: 19
DOI: https://dx.doi.org/10.1371/journal.pone.0300194
Appears in types: 2.1 Journal article

Files in this item:

File	Size	Format
journal.pone.0300194.pdf (open access; description: article; type: publisher's version; license: free access, no restrictions)	5.2 MB	Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5056020
