Speaker inconsistency limits usefulness
Even when setting a seed and generating speech with near-identical prompts, there is a noticeable difference between runs that use the same description and preset speaker voice, e.g. Brenda.
This limits the usefulness of the model. Are there planned improvements, or tips for ensuring consistency?
I write the voice name multiple times in the prompt, since I take it to be a tag that Parler uses. Still, that does not guarantee it will consistently maintain that voice.
https://huggingface.co./spaces/Pendrokar/TTS-Spaces-Arena/discussions/8
Hey @liambarryarm,
For the Parler-TTS versions highlighted in this demo, some speakers are more consistent than others; you can find lists here. Brenda doesn't rank that high (19th in the Large top 20, and absent from the Mini top 20). Hope that helps!
Large Model - Top 20 Speakers

| Speaker | Similarity Score |
|---|---|
| Will | 0.906055 |
| Eric | 0.887598 |
| Laura | 0.877930 |
| Alisa | 0.877393 |
| Patrick | 0.873682 |
| Rose | 0.873047 |
| Jerry | 0.871582 |
| Jordan | 0.870703 |
| Lauren | 0.867432 |
| Jenna | 0.866455 |
| Karen | 0.866309 |
| Rick | 0.863135 |
| Bill | 0.862207 |
| James | 0.856934 |
| Yann | 0.856787 |
| Emily | 0.856543 |
| Anna | 0.848877 |
| Jon | 0.848828 |
| Brenda | 0.848291 |
| Barbara | 0.847998 |
Mini Model - Top 20 Speakers

| Speaker | Similarity Score |
|---|---|
| Jon | 0.908301 |
| Lea | 0.904785 |
| Gary | 0.903516 |
| Jenna | 0.901807 |
| Mike | 0.885742 |
| Laura | 0.882666 |
| Lauren | 0.878320 |
| Eileen | 0.875635 |
| Alisa | 0.874219 |
| Karen | 0.872363 |
| Barbara | 0.871509 |
| Carol | 0.863623 |
| Emily | 0.854932 |
| Rose | 0.852246 |
| Will | 0.851074 |
| Patrick | 0.850977 |
| Eric | 0.845459 |
| Rick | 0.845020 |
| Anna | 0.844922 |
| Tina | 0.839160 |
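The two tables can be combined to shortlist speakers that hold up on both checkpoints. This is a minimal Python sketch using a heuristic that is not from the thread: it keeps only speakers listed in both top-20 tables and ranks them by the lower of their two similarity scores. The names `LARGE_TOP20`, `MINI_TOP20`, and `consistent_in_both` are hypothetical; the scores are copied from the tables.

```python
# Similarity scores copied from the two top-20 tables in this thread.
LARGE_TOP20 = {
    "Will": 0.906055, "Eric": 0.887598, "Laura": 0.877930, "Alisa": 0.877393,
    "Patrick": 0.873682, "Rose": 0.873047, "Jerry": 0.871582, "Jordan": 0.870703,
    "Lauren": 0.867432, "Jenna": 0.866455, "Karen": 0.866309, "Rick": 0.863135,
    "Bill": 0.862207, "James": 0.856934, "Yann": 0.856787, "Emily": 0.856543,
    "Anna": 0.848877, "Jon": 0.848828, "Brenda": 0.848291, "Barbara": 0.847998,
}
MINI_TOP20 = {
    "Jon": 0.908301, "Lea": 0.904785, "Gary": 0.903516, "Jenna": 0.901807,
    "Mike": 0.885742, "Laura": 0.882666, "Lauren": 0.878320, "Eileen": 0.875635,
    "Alisa": 0.874219, "Karen": 0.872363, "Barbara": 0.871509, "Carol": 0.863623,
    "Emily": 0.854932, "Rose": 0.852246, "Will": 0.851074, "Patrick": 0.850977,
    "Eric": 0.845459, "Rick": 0.845020, "Anna": 0.844922, "Tina": 0.839160,
}


def consistent_in_both(large: dict[str, float], mini: dict[str, float]) -> list[str]:
    """Speakers listed in both tables, best worst-case similarity first (hypothetical heuristic)."""
    shared = large.keys() & mini.keys()
    # Rank by the lower of the two scores so a speaker has to hold up on both checkpoints.
    return sorted(shared, key=lambda name: min(large[name], mini[name]), reverse=True)


if __name__ == "__main__":
    for name in consistent_in_both(LARGE_TOP20, MINI_TOP20):
        print(f"{name}: large={LARGE_TOP20[name]:.3f}, mini={MINI_TOP20[name]:.3f}")
```

With these scores, Laura, Alisa, and Lauren come out on top, and Brenda drops out because she is missing from the Mini list. Ranking by the minimum score is just one choice; averaging the two scores would favor speakers like Jon or Jenna who are very strong on one checkpoint only.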
@Pendrokar, speaker consistency doesn't work with speakers that are not present in the training dataset, and Elisabeth is not in it. I'd recommend using another speaker for voice consistency!
In that case, there is no need to repeat the name in the prompt.
For example, you could use: "Jenna speaks in a monotone tone at a slightly slower than normal pace, with the recording coming across as very clear and very close-sounding."
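The advice in this reply (pick a speaker from the training data and name them once) can be encoded as a small guard applied before the description is passed to the model. This is a hypothetical Python helper, not part of Parler-TTS. Its `KNOWN_SPEAKERS` set is an assumption that holds only the names from the two top-20 tables in this thread, so a speaker who is in the training data but not in those lists would also be rejected.

```python
# Hypothetical helper, not part of Parler-TTS.
# Assumption: KNOWN_SPEAKERS only holds the names from the two top-20 tables in this
# thread; the real training data may contain other speakers as well.
KNOWN_SPEAKERS = frozenset({
    "Will", "Eric", "Laura", "Alisa", "Patrick", "Rose", "Jerry", "Jordan",
    "Lauren", "Jenna", "Karen", "Rick", "Bill", "James", "Yann", "Emily",
    "Anna", "Jon", "Brenda", "Barbara", "Lea", "Gary", "Mike", "Eileen",
    "Carol", "Tina",
})


def build_description(speaker: str, style: str) -> str:
    """Return a description that names a known speaker once, followed by the style text."""
    if speaker not in KNOWN_SPEAKERS:
        # The thread says speakers outside the training data are not kept consistent;
        # this check approximates that using only the table names.
        raise ValueError(f"{speaker!r} is not a known speaker; pick one from the tables")
    # A single mention of the name is enough; no need to repeat it.
    return f"{speaker} {style.strip()}"


if __name__ == "__main__":
    print(build_description(
        "Jenna",
        "speaks in a monotone tone at a slightly slower than normal pace, "
        "with the recording coming across as very clear and very close-sounding.",
    ))
```

Rejecting unknown names early, rather than silently passing them through, makes it obvious when a description relies on a voice the model was never trained to reproduce, as with Elisabeth.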