Hierarchical Text Classification and Its Foundations: A Review of Current Research

While collections of documents are often annotated with hierarchically structured concepts, the benefits of these structures are rarely taken into account by classification techniques. Within this context, hierarchical text classification methods are devised to take advantage of the labels’ organization to boost classification performance. In this work, we aim to deliver an updated overview of the current research in this domain. We begin by defining the task and framing it within the broader text classification area, examining important shared concepts such as text representation. Then, we dive into details regarding the specific task, providing a high-level description of its traditional approaches. We then summarize recently proposed methods, highlighting their main contributions. We also provide statistics for the most commonly used datasets and describe the benefits of using evaluation metrics tailored to hierarchical settings. Finally, a selection of recent proposals is benchmarked against non-hierarchical baselines on five public domain-specific datasets. These datasets, along with our code, are made available for future research.

Hierarchical Text Classification and Its Foundations: A Review of Current Research

Zangari, Alessandro^{Writing – Original Draft Preparation};Marcuzzo, Matteo^{Writing – Original Draft Preparation};Rizzo, Matteo^{Writing – Review & Editing};Giudice, Lorenzo^{Membro del Collaboration Group};Albarelli, Andrea^Supervision;Gasparetto, Andrea^Supervision

2024-01-01

Abstract

While collections of documents are often annotated with hierarchically structured concepts, the benefits of these structures are rarely taken into account by classification techniques. Within this context, hierarchical text classification methods are devised to take advantage of the labels’ organization to boost classification performance. In this work, we aim to deliver an updated overview of the current research in this domain. We begin by defining the task and framing it within the broader text classification area, examining important shared concepts such as text representation. Then, we dive into details regarding the specific task, providing a high-level description of its traditional approaches. We then summarize recently proposed methods, highlighting their main contributions. We also provide statistics for the most commonly used datasets and describe the benefits of using evaluation metrics tailored to hierarchical settings. Finally, a selection of recent proposals is benchmarked against non-hierarchical baselines on five public domain-specific datasets. These datasets, along with our code, are made available for future research.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno pubblicazione
	
				2024
			
	Titolo della Rivista
	
				ELECTRONICS
			
	N° Volume
	
				13
			
	DOI
	
				https://dx.doi.org/10.3390/electronics13071199
			
	Appare nelle tipologie:
	
				2.1 Articolo su rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5055640

Citazioni

ND

9

ND

social impact