Tips for accurate Spanish speaker cloning?

#5
by manugarri - opened

I'm having trouble replicating my voice accurately. I created a 3-second WAV file, then ran the following block:

tts.tts_to_file(text="...",
                file_path="output.wav",
                speaker_wav="/path/to/target/speaker.wav",
                language="en")

The resulting output audio does not sound like me at all.

I tried increasing the decoder iterations and also using longer recordings.

Are there any guidelines on how to produce the speaker audio to improve the output quality?

Just try changing the language value from "en" to "es".

@unificador I'm sorry, of course I changed the language to 'es' in the sample I actually ran :D. The snippet I wrote here is just copy-pasted from the landing page.

Any real tips anyone?

For better cloning:

  • 6 seconds or more of reference audio
  • No background noise, mic bumps, etc.
  • Cleaner audio file
  • No long silences in the reference audio (especially at the start and end)
    Note: this Space applies a fairly fast, working filter to microphone input; check its app.py: https://huggingface.co/spaces/coqui/xtts
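As a rough illustration of the checklist above, here is a minimal sketch that trims silence from the start and end of a reference recording and checks that at least 6 seconds remain. It is not the filter used in the Space's app.py. The function names, the amplitude threshold of 500, and the restriction to 16-bit mono WAV files are all assumptions made for this example, and it does nothing about background noise.

```python
import array
import sys
import wave

MIN_SECONDS = 6.0  # from the checklist: 6 seconds or more of reference audio


def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than `threshold` (assumed value)."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def prepare_reference(in_path, out_path, threshold=500):
    """Trim edge silence from a 16-bit mono WAV; return the trimmed length in seconds."""
    with wave.open(in_path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    trimmed = trim_silence(samples, threshold)
    seconds = len(trimmed) / rate

    if sys.byteorder == "big":
        trimmed.byteswap()  # back to little-endian before writing
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(trimmed.tobytes())
    return seconds


def long_enough(seconds):
    """True if the trimmed clip meets the 6-second guideline."""
    return seconds >= MIN_SECONDS
```

Something like `long_enough(prepare_reference("me_raw.wav", "me_trimmed.wav"))` (both file names are placeholders) would then tell you whether the trimmed clip is worth passing as speaker_wav. The threshold probably needs tuning per microphone.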
Coqui.ai org

You can now fine-tune XTTS (TTS v0.19.0):
https://tts.readthedocs.io/en/dev/models/xtts.html
