Have you ever trained a 44k model?

by LukeJacob2023 - opened 2 days ago

LukeJacob2023

2 days ago

I failed to train from scratch: the model cannot learn alignment with a tiny dataset. So I want to try your solution.

sinhprous

Owner 2 days ago

Hello, how much data do you have, and in which language? Actually, I didn't train it from scratch; I started from the original F5TTS weights. But my solution might help with better alignment.

LukeJacob2023

2 days ago

About 12 hours, Indonesian.

sinhprous

Owner 1 day ago

This method uses phonemes instead of raw text, and uses forced alignment during training. Although my training vocabulary differs from that of the original F5TTS model, I still use the pre-trained weights (i.e., I only re-initialize the text embedding layer). The pre-trained model can already generate speech, so training should be faster. I think you can start from the pre-trained model instead of training from scratch.
Previously I ran some experiments with LJSpeech, and the model could learn from only 10 hours of data. I am not sure what the results will be when training on a different language. I am currently running experiments with another language (Vietnamese), so we may get more insight.
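
The idea of reusing pre-trained weights while re-initializing only the vocabulary-dependent layer can be sketched as follows. This is a minimal illustration with plain Python dicts standing in for tensors, not the fork's actual loading code. The function name `split_reusable`, the key `transformer.proj_out.weight`, and all shapes are hypothetical.

```python
def split_reusable(checkpoint_shapes, model_shapes):
    """Split the new model's parameters into reusable vs. re-initialized.

    Both arguments map parameter names to shape tuples (stand-ins for tensors).
    A parameter is reusable only if the checkpoint has the same name AND shape.
    """
    reusable = {
        name
        for name, shape in model_shapes.items()
        if checkpoint_shapes.get(name) == shape
    }
    reinit = set(model_shapes) - reusable
    return sorted(reusable), sorted(reinit)


# Toy shapes (assumptions): the text embedding depends on vocabulary size,
# so switching from characters to phonemes changes its first dimension.
checkpoint_shapes = {
    "transformer.text_embed.text_embed.weight": (2546, 512),
    "transformer.proj_out.weight": (100, 1024),  # hypothetical key
}
model_shapes = {
    "transformer.text_embed.text_embed.weight": (80, 512),
    "transformer.proj_out.weight": (100, 1024),  # hypothetical key
}

reusable, reinit = split_reusable(checkpoint_shapes, model_shapes)
print("reuse:", reusable)
print("re-initialize:", reinit)
```

Only the layer whose shape depends on the vocabulary is trained from a fresh initialization; everything else starts from the pre-trained values.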

LukeJacob2023

1 day ago

•

edited 1 day ago

Hello @sinhprous, can you complete the inference code for f5-tts_infer-cli?
Also, the fine-tuning instructions need more detailed steps, including how to declare the language code.

LukeJacob2023

1 day ago

When I fine-tune with your fork, it prints the following messages but training continues. Is that OK?
Missing keys: ['ema_model.transformer.text_embed.text_embed.weight']
Unexpected keys: []
Missing keys: ['transformer.text_embed.text_embed.weight', 'duration_predictor.text_embed.weight', 'duration_predictor.conv_1.weight', 'duration_predictor.conv_1.bias', 'duration_predictor.norm_1.gamma', 'duration_predictor.norm_1.beta', 'duration_predictor.conv_2.weight', 'duration_predictor.conv_2.bias', 'duration_predictor.norm_2.gamma', 'duration_predictor.norm_2.beta', 'duration_predictor.proj.weight', 'duration_predictor.proj.bias']
Unexpected keys: []

sinhprous

Owner about 19 hours ago

It's okay: the fork re-initializes the text embedding layer and adds a duration predictor, so those weights are not in the pre-trained checkpoint.
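
For readers wondering where such a message comes from: a non-strict checkpoint load (for example PyTorch's `load_state_dict(strict=False)`) reports parameters the model has but the checkpoint lacks as missing, and the reverse as unexpected. The sketch below mimics that bookkeeping with plain sets. `diff_keys` and the key `transformer.backbone.weight` are hypothetical; the other key names are taken from the message quoted above.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, as a non-strict load would report."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in model, not in checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)   # in checkpoint, not in model
    return missing, unexpected


model_keys = [
    "transformer.text_embed.text_embed.weight",  # re-initialized embedding
    "duration_predictor.proj.weight",            # newly added module
    "duration_predictor.proj.bias",
    "transformer.backbone.weight",               # hypothetical shared layer
]
# Assumption: the embedding was dropped from the checkpoint before loading.
checkpoint_keys = ["transformer.backbone.weight"]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("Missing keys:", missing)
print("Unexpected keys:", unexpected)
```

Missing keys that correspond only to new or re-initialized modules, together with an empty unexpected list, are what a partial load like this should produce.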

sinhprous

Owner about 19 hours ago

Okay, I will complete f5-tts_infer-cli. In the meantime, you can use the notebook for inference.

LukeJacob2023

about 9 hours ago

•

edited about 9 hours ago

Hello @sinhprous, I have finished training on Indonesian. The result is not good: the WER is much higher than with the official code.
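
For reference, WER (word error rate) is the word-level edit distance between a reference transcript and the recognized transcript of the synthesized audio, divided by the number of reference words. The sketch below is a minimal pure-Python version, not the evaluation script used in this thread. The function name `wer` and the sample sentences are made up for illustration.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion
                cur[j - 1] + 1,      # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# One of four reference words is dropped, so the rate is 1/4.
score = wer("saya suka makan nasi", "saya suka nasi")
print(score)
```

Comparing two systems this way is only meaningful when both are scored on the same test sentences with the same speech recognizer and the same text normalization.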

sinhprous

Owner about 2 hours ago

Could you share some samples? How many epochs did you train for?
