Sentence Embedding Models for Similarity Detection of Software Requirements

Semantic similarity detection mainly relies on the availability of laboriously curated ontologies, as well as of supervised and unsupervised neural embedding models. In this paper, we present two domain-specific sentence embedding models trained on a natural language requirements dataset in order to derive sentence embeddings specific to the software requirements engineering domain. We use cosine-similarity measures in both models. The results of the experimental evaluation confirm that the proposed models enhance the performance of textual semantic similarity measures over existing state-of-the-art neural sentence embedding models: we reach an accuracy of 88.35%, an improvement of about 10% over existing benchmarks.
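The abstract states that cosine similarity is applied to the sentence embeddings. As a minimal sketch of that measure only, the snippet below computes the cosine of the angle between two vectors. The function name `cosine_similarity` and the toy vectors are assumptions made for illustration; they are not the paper's models, embeddings, or data.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; the vectors stand in for
    sentence embeddings produced by some embedding model.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen here: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy stand-ins for the embeddings of two requirement sentences.
req_a = [0.2, 0.7, 0.1]
req_b = [0.25, 0.6, 0.2]
score = cosine_similarity(req_a, req_b)
```

Under this sketch, two requirements would be flagged as similar when `score` exceeds some chosen threshold; the threshold value is a design choice the abstract does not specify.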

Das, Souvick;Deb, Novarun;Cortesi, Agostino;Chaki, Nabendu

2021-01-01

Publication year: 2021
Journal title: SN COMPUTER SCIENCE
Volume: 2
DOI: https://dx.doi.org/10.1007/s42979-020-00427-1
Publication type: 2.1 Journal article

Files in this item:

File	Size	Format
Das2021_Article_SentenceEmbeddingModelsForSimi (1).pdf (open access from 02/02/2022; publisher's version; license: free access, view only)	1.67 MB	Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/3736435
