The automatic categorization of support tickets is a fundamental tool for modern businesses. Such requests are most commonly composed of concise textual descriptions that are noisy and filled with technical jargon. In this paper, we test the effectiveness of pre-trained LMs for the classification of issues related to software bugs. First, we test several strategies to produce single, ticket-wise representations starting from their BERT-generated word embeddings. Then, we showcase a simple yet effective way to build a multi-level classifier for the categorization of documents with two hierarchically dependent labels. We experiment on a public bugs dataset and compare our results with standard BERT-based and traditional SVM classifiers. Our findings suggest that both embedding strategies and hierarchical label dependencies considerably impact classification accuracy.
A multi-level approach for hierarchical Ticket Classification
Marcuzzo MatteoWriting – Original Draft Preparation
;Zangari AlessandroWriting – Original Draft Preparation
;Schiavinato MicheleWriting – Review & Editing
;Giudice LorenzoWriting – Review & Editing
;Gasparetto AndreaSupervision
;Albarelli AndreaSupervision
2022-01-01
Abstract
The automatic categorization of support tickets is a fundamental tool for modern businesses. Such requests are most commonly composed of concise textual descriptions that are noisy and filled with technical jargon. In this paper, we test the effectiveness of pre-trained LMs for the classification of issues related to software bugs. First, we test several strategies to produce single, ticket-wise representations starting from their BERT-generated word embeddings. Then, we showcase a simple yet effective way to build a multi-level classifier for the categorization of documents with two hierarchically dependent labels. We experiment on a public bugs dataset and compare our results with standard BERT-based and traditional SVM classifiers. Our findings suggest that both embedding strategies and hierarchical label dependencies considerably impact classification accuracy.File | Dimensione | Formato | |
---|---|---|---|
2022.wnut-1.22.pdf
accesso aperto
Tipologia:
Versione dell'editore
Licenza:
Creative commons
Dimensione
1.46 MB
Formato
Adobe PDF
|
1.46 MB | Adobe PDF | Visualizza/Apri |
I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.