Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

Accent plays a significant role in speech communication: it affects how easily speech is understood and conveys the speaker's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. The framework can synthesize a selected speaker's voice and convert it to any desired target accent. Experiments with both objective and subjective evaluations validate the effectiveness of the proposed framework. The results also show that the model performs strongly at manipulating accent in the synthesized speech. Overall, the proposed framework presents a promising avenue for future accented TTS research. © 2024 IEEE.

Jan Melechovsky; Ambuj Mehrish; Berrak Sisman; Dorien Herremans

2024

Publication year: 2024
Volume title: IEEE Region 10 Annual International Conference, Proceedings/TENCON
DOI: https://dx.doi.org/10.1109/TENCON61640.2024.10902981
Appears in categories: 4.1 Conference proceedings article

Files in this record:

File	Availability	Type	License	Size	Format
Accented_Text-to-Speech_Synthesis_with_a_Conditional_Variational_Autoencoder.pdf	Not available	Publisher's version	Publisher's copyright	1.26 MB	Adobe PDF

Documents in ARCA are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5105962
