Have you ever trained a 44k model?

#1
by LukeJacob2023 - opened

I failed to train from scratch: the model cannot learn alignment with a tiny dataset. So I want to try your solution.

Hello, how much data do you have, and in which language? Actually, I didn't train it from scratch; I started from the original F5TTS weights. But my solution might help with better alignment.

About 12 hours, Indonesian.

This method uses phonemes instead of raw text, and uses forced alignment during training. Although my training vocabulary differs from that of the original F5TTS model, I still use the pre-trained weights (i.e., only the text embedding layer is re-initialized), because the pre-trained model can already generate speech, so training should be faster. I think you can use the pre-trained model too instead of training from scratch.
Previously I ran some experiments with LJSpeech, and the model could learn with only 10 hours of data. I am not sure what the results will be when training on a different language. I am currently running experiments with another language (Vietnamese), so maybe we will get more insight.


Hello @sinhprous, can you complete the inference code for f5-tts_infer-cli?
The fine-tuning also needs more detailed steps, including how to declare the language code.

When I fine-tune with your fork, it prints the errors below but training continues. Is that OK?
Missing keys: ['ema_model.transformer.text_embed.text_embed.weight']
Unexpected keys: []
Missing keys: ['transformer.text_embed.text_embed.weight', 'duration_predictor.text_embed.weight', 'duration_predictor.conv_1.weight', 'duration_predictor.conv_1.bias', 'duration_predictor.norm_1.gamma', 'duration_predictor.norm_1.beta', 'duration_predictor.conv_2.weight', 'duration_predictor.conv_2.bias', 'duration_predictor.norm_2.gamma', 'duration_predictor.norm_2.beta', 'duration_predictor.proj.weight', 'duration_predictor.proj.bias']
Unexpected keys: []

It's okay: the fork re-initializes the text embedding layer and adds a duration predictor, so those weights are not in the original checkpoint.
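
As a quick sanity check, something like the sketch below could confirm that every reported missing key belongs to one of those two modules. The helper `unexplained_missing_keys` and the prefix list are hypothetical (they are not part of F5-TTS or the fork); the prefixes are simply read off the log above.

```python
# Hypothetical helper, not part of F5-TTS or the fork.
# Prefixes are taken from the "Missing keys" log in this thread:
# the re-initialized text embedding and the newly added duration predictor.
EXPECTED_PREFIXES = (
    "transformer.text_embed.",
    "ema_model.transformer.text_embed.",
    "duration_predictor.",
)


def unexplained_missing_keys(missing_keys, expected_prefixes=EXPECTED_PREFIXES):
    """Return the missing keys that do NOT belong to a module expected to be new."""
    prefixes = tuple(expected_prefixes)
    return [key for key in missing_keys if not key.startswith(prefixes)]


# A few of the keys from the log above.
missing = [
    "ema_model.transformer.text_embed.text_embed.weight",
    "transformer.text_embed.text_embed.weight",
    "duration_predictor.conv_1.weight",
    "duration_predictor.proj.bias",
]

leftover = unexplained_missing_keys(missing)
if leftover:
    print("Unexpected gaps in the checkpoint:", leftover)
else:
    print("All missing keys belong to newly initialized modules.")
```

If the list comes back non-empty, some other part of the network did not receive pre-trained weights, which would be worth investigating before a long training run.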

Okay, I will complete f5-tts_infer-cli. In the meantime, you can use the notebook for inference.

Hello @sinhprous, I have finished training on Indonesian. The result is not good: the WER is much higher than with the official code.

Could you share some samples? How many epochs did you train for?
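
For comparing the two runs on equal terms, the usual word-level definition of WER can be sketched as below: the edit distance between reference and transcribed words, divided by the number of reference words. This is a minimal illustration, assuming the synthesized audio has already been transcribed by some ASR model; the function name `wer`, the lowercase/whitespace normalization, and the example sentences are assumptions, not the evaluation used by either codebase.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one of four reference words is dropped.
print(wer("saya pergi ke pasar", "saya pergi pasar"))  # 0.25
```

Using the same ASR model, text normalization, and test sentences for both checkpoints keeps the numbers comparable.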
