Hierarchical Text Classification: a review of current research

Alessandro Zangari (Writing – Original Draft Preparation);
Matteo Marcuzzo (Writing – Original Draft Preparation);
Michele Schiavinato (Writing – Review & Editing);
Matteo Rizzo (Writing – Review & Editing);
Andrea Gasparetto (Supervision);
Andrea Albarelli (Supervision)
In press

Abstract

It is often the case that collections of documents are annotated with hierarchically-structured concepts. However, the benefits of this structure are rarely taken into account by commonly-used classification techniques. Conversely, Hierarchical Text Classification methods are devised to take advantage of the labels’ organization to boost classification performance. With this work, we aim to deliver an updated overview of current research in this domain. We begin by defining the task and framing it within the broader text classification area, examining important shared concepts such as text representation. Then, we dive into details regarding the specific task, providing a high-level description of its traditional approaches. We then summarize recently proposed methods, highlighting their main contributions. We additionally provide statistics for the most adopted datasets and describe the benefits of using evaluation metrics tailored to hierarchical settings. Finally, a selection of recent proposals is benchmarked against non-hierarchical baselines on five domain-specific datasets.
224
Files in this item:
HTC_Survey-1.pdf
Access: open access
Type: Pre-print document
License: Creative Commons
Size: 3.07 MB
Format: Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5017724