Tips for accurate Spanish speaker cloning?

by manugarri - opened Sep 16, 2023

Sep 16, 2023

I'm having trouble getting it to replicate my voice accurately. I tried creating a 3-second WAV file, then running the following block:

```python
tts.tts_to_file(text="...",
                file_path="output.wav",
                speaker_wav="/path/to/target/speaker.wav",
                language="en")
```

The resulting output audio does not sound like me at all.

I also tried increasing the decoder iterations and using longer recordings.

Are there any guidelines on how to produce the speaker audio to improve the output quality?

unificador

Sep 18, 2023

edited Sep 18, 2023

Just try changing the language value from "en" to "es".

pailletjp

Sep 18, 2023

lol

manugarri

Sep 18, 2023

@unificador Sorry, in the sample I actually ran I did of course change the language to 'es' :D. The snippet I posted here is just copy-pasted from the landing page.

Any real tips, anyone?

gorkemgoknar

Coqui.ai org Sep 19, 2023

edited Oct 4, 2023

For better cloning:

- 6 seconds or more of reference audio
- No background noise, mic bumps, etc.
- A cleaner audio file
- No long silences in the reference audio (especially at the start and end)

Note: this Space has a pretty fast filter that works well, especially for microphone recordings; check its app.py: https://huggingface.co/spaces/coqui/xtts
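The length and edge-silence tips above can be checked before passing a file as `speaker_wav`. Below is a minimal sketch using only the Python standard library; it is not the filter from the Space's app.py. The helper names `trim_silence` and `prepare_reference`, the amplitude threshold of 500, and the restriction to 16-bit mono PCM WAV input are all assumptions made for illustration.

```python
import array
import sys
import wave


def trim_silence(samples, threshold):
    """Return samples with leading/trailing values below `threshold` (absolute) removed."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def prepare_reference(in_path, out_path, threshold=500, min_seconds=6.0):
    """Hypothetical helper: trim edge silence from a 16-bit mono PCM WAV and write it.

    Returns (duration_in_seconds, meets_min_seconds). The threshold of 500
    (out of 32767) is an arbitrary assumption; tune it for your recordings.
    """
    with wave.open(in_path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM WAV")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":  # WAV PCM samples are stored little-endian
        samples.byteswap()

    trimmed = trim_silence(samples, threshold)
    seconds = len(trimmed) / rate

    if sys.byteorder == "big":  # convert back to little-endian before writing
        trimmed.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(trimmed.tobytes())
    return seconds, seconds >= min_seconds
```

For example, `seconds, long_enough = prepare_reference("raw.wav", "speaker.wav")` writes a trimmed copy and tells you whether it still reaches the 6-second mark. A simple amplitude gate like this only removes near-silent edges; it does nothing about background noise or mic bumps, so the other tips still apply.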

gorkemgoknar

Coqui.ai org Oct 27, 2023

You can now fine-tune XTTS (supported since TTS v0.19.0):
https://tts.readthedocs.io/en/dev/models/xtts.html
