ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
Ambuj Mehrish;
2023-01-01
Abstract
There are significant challenges for speaker adaptation in text-to-speech for languages that are not widely spoken or for speakers with accents or dialects that are not well represented in the training data. To address this issue, we propose the "mixture of adapters" method. This approach adds multiple adapters within a backbone-model layer to learn the unique characteristics of different speakers. Our approach outperforms the baseline, with a 5% improvement in speaker preference tests when using only one minute of data for each new speaker. Moreover, following the adapter paradigm, we fine-tune only the adapter parameters (11% of the total model parameters), making this one of the first parameter-efficient approaches to speaker adaptation. Overall, our proposed approach offers a promising solution for speech synthesis, particularly for adapting to speakers from diverse backgrounds.
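The abstract describes placing several adapters in parallel within a backbone layer and fine-tuning only those adapter parameters while the backbone stays frozen. The following PyTorch sketch is only an illustration of that general idea: the module names, hyperparameters (model width, bottleneck size, number of adapters) and the routing scheme are assumptions, not the paper's exact implementation.

```python
# Minimal sketch of a mixture-of-adapters block (illustrative assumptions,
# not the authors' exact ADAPTERMIX implementation).
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, d_model: int, bottleneck: int):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class MixtureOfAdapters(nn.Module):
    """Several adapters in parallel; a light router mixes their outputs so
    different adapters can specialise to different speaker characteristics."""
    def __init__(self, d_model: int = 256, bottleneck: int = 32, num_adapters: int = 4):
        super().__init__()
        self.adapters = nn.ModuleList(
            Adapter(d_model, bottleneck) for _ in range(num_adapters)
        )
        self.router = nn.Linear(d_model, num_adapters)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, time, d_model) output of a frozen backbone layer
        weights = torch.softmax(self.router(hidden), dim=-1)               # (B, T, K)
        stacked = torch.stack([a(hidden) for a in self.adapters], dim=-1)  # (B, T, D, K)
        return (stacked * weights.unsqueeze(2)).sum(dim=-1)                # (B, T, D)


# During adaptation, only the adapter and router parameters are trainable;
# the backbone is frozen (in line with the paper's ~11% trainable parameters).
backbone_layer_output = torch.randn(2, 50, 256)
moa = MixtureOfAdapters()
adapted = moa(backbone_layer_output)
print(adapted.shape)  # torch.Size([2, 50, 256])
```

In this reading, fine-tuning on one minute of data per new speaker would update only the adapter and router weights, which is what keeps the adaptation parameter-efficient.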
| File | Size | Format |
|---|---|---|
| mehrish23_interspeech.pdf (publisher's version; publisher's copyright; not available for download) | 566.77 kB | Adobe PDF |
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.



