Hierarchical Text Classification: a review of current research
Alessandro Zangari (Writing – Original Draft Preparation)
Matteo Marcuzzo (Writing – Original Draft Preparation)
Michele Schiavinato (Writing – Review & Editing)
Matteo Rizzo (Writing – Review & Editing)
Andrea Gasparetto (Supervision)
Andrea Albarelli (Supervision)
In press
Abstract
It is often the case that collections of documents are annotated with hierarchically structured concepts. However, the benefits of this structure are rarely taken into account by commonly used classification techniques. Conversely, Hierarchical Text Classification methods are devised to take advantage of the labels' organization to boost classification performance. With this work, we aim to deliver an updated overview of current research in this domain. We begin by defining the task and framing it within the broader text classification area, examining important shared concepts such as text representation. Then, we dive into details regarding the specific task, providing a high-level description of its traditional approaches. We then summarize recently proposed methods, highlighting their main contributions. We additionally provide statistics for the most adopted datasets and describe the benefits of using evaluation metrics tailored to hierarchical settings. Finally, a selection of recent proposals is benchmarked against non-hierarchical baselines on five domain-specific datasets.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
HTC_Survey-1.pdf | Open access | Pre-print document | Creative Commons | 3.07 MB | Adobe PDF
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.