During the COVID-19 pandemic, the scientific literature related to SARS-COV-2 has been growing dramatically, both in terms of the number of publications and of its impact on people's life. This literature encompasses a varied set of sensible topics, ranging from vaccination, to protective equipment efficacy, to lockdown policy evaluation. Up to now, hundreds of thousands of papers have been uploaded on online repositories and published in scientific journals. As a result, the development of digital methods that allow an in-depth exploration of this growing literature has become a relevant issue, both to identify the topical trends of COVID-related research and to zoom-in its sub-themes. This work proposes a novel methodology, called LDA2Net, which combines topic modelling and network analysis to investigate topics under their surface. Specifically, LDA2Net exploits the frequencies of pairs of consecutive words to reconstruct the network structure of topics discussed in the Cord-19 corpus. The results suggest that the effectiveness of topic models can be magnified by enriching them with word network representations, and by using the latter to display, analyse, and explore COVID-related topics at different levels of granularity.

LDA2Net: Digging under the surface of COVID-19 topics in scientific literature

Giorgia Minello;Carlo Santagiustina;Massimo Warglien
2021-01-01

Abstract

During the COVID-19 pandemic, the scientific literature related to SARS-COV-2 has been growing dramatically, both in terms of the number of publications and of its impact on people's life. This literature encompasses a varied set of sensible topics, ranging from vaccination, to protective equipment efficacy, to lockdown policy evaluation. Up to now, hundreds of thousands of papers have been uploaded on online repositories and published in scientific journals. As a result, the development of digital methods that allow an in-depth exploration of this growing literature has become a relevant issue, both to identify the topical trends of COVID-related research and to zoom-in its sub-themes. This work proposes a novel methodology, called LDA2Net, which combines topic modelling and network analysis to investigate topics under their surface. Specifically, LDA2Net exploits the frequencies of pairs of consecutive words to reconstruct the network structure of topics discussed in the Cord-19 corpus. The results suggest that the effectiveness of topic models can be magnified by enriching them with word network representations, and by using the latter to display, analyse, and explore COVID-related topics at different levels of granularity.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5002014
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact