StyleTTS2 Fine-tuned Model
This model is a fine-tuned version of StyleTTS2, containing all necessary components for inference.
Model Details
- Base Model: StyleTTS2-LibriTTS
- Architecture: StyleTTS2
- Task: Text-to-Speech
- Last Checkpoint: epoch_2nd_00005.pth
Training Details
- Total Epochs: 10
- Completed Epochs: 5
- Total Iterations: 2472
- Batch Size: 2
- Max Length: 630
- Learning Rate: 0.0001
- Final Validation Loss: 0.376327
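As a quick sanity check, the fine-tuned checkpoint can be inspected before building the model. This is a minimal sketch that assumes epoch_2nd_00005.pth follows the usual StyleTTS2 second-stage format (a dict with per-module state dicts under a "net" key plus bookkeeping entries such as the epoch and iteration count); verify the keys against the actual file.

```python
import torch

# Load the fine-tuned checkpoint on CPU purely to inspect its contents.
checkpoint = torch.load("epoch_2nd_00005.pth", map_location="cpu")

# Top-level keys (assumed: "net", "optimizer", "epoch", "iters", "val_loss", ...).
print(sorted(checkpoint.keys()))

# Per-module weights are expected under "net" (bert, decoder, predictor, ...).
if "net" in checkpoint:
    print(sorted(checkpoint["net"].keys()))
```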
Model Components
The repository includes all necessary components for inference:
Main Model Components:
- bert.pth
- bert_encoder.pth
- predictor.pth
- decoder.pth
- text_encoder.pth
- predictor_encoder.pth
- style_encoder.pth
- diffusion.pth
- text_aligner.pth
- pitch_extractor.pth
- mpd.pth
- msd.pth
- wd.pth
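As a minimal loading sketch for the component checkpoints listed above, assuming each .pth file holds a plain state_dict for the module of the same name (adjust the paths to wherever the files sit in this repository):

```python
import torch

COMPONENT_FILES = [
    "bert.pth", "bert_encoder.pth", "predictor.pth", "decoder.pth",
    "text_encoder.pth", "predictor_encoder.pth", "style_encoder.pth",
    "diffusion.pth", "text_aligner.pth", "pitch_extractor.pth",
    "mpd.pth", "msd.pth", "wd.pth",
]

# Load every component's weights onto CPU; move them to GPU after the model is built.
state_dicts = {
    name.removesuffix(".pth"): torch.load(name, map_location="cpu")
    for name in COMPONENT_FILES
}

# Each entry can then be applied to the matching submodule, for example:
# model.decoder.load_state_dict(state_dicts["decoder"], strict=False)
```

Here strict=False is only a precaution in case the exported dicts carry wrapper prefixes (e.g. module. from DataParallel); if the keys match exactly, strict loading is preferable.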
Utility Components:
- ASR (Automatic Speech Recognition)
  - epoch_00080.pth
  - config.yml
  - models.py
  - layers.py
- JDC (F0 Prediction)
  - bst.t7
  - model.py
- PLBERT
  - step_1000000.t7
  - config.yml
  - util.py
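The utility components above are loaded the same way as in the upstream StyleTTS2 inference demo. The sketch below assumes the bundled models.py exposes load_ASR_models and load_F0_models, that Utils/PLBERT/util.py exposes load_plbert, and that config.yml uses the upstream keys (ASR_path, ASR_config, F0_path, PLBERT_dir); check these against the files in this repository.

```python
import yaml

from models import load_ASR_models, load_F0_models  # assumed upstream helpers
from Utils.PLBERT.util import load_plbert            # assumed upstream helper

with open("config.yml") as f:
    config = yaml.safe_load(f)

# Text aligner (ASR), pitch extractor (JDC/F0), and PLBERT, as in the upstream demo.
text_aligner = load_ASR_models(config["ASR_path"], config["ASR_config"])
pitch_extractor = load_F0_models(config["F0_path"])
plbert = load_plbert(config["PLBERT_dir"])
```

In the upstream demo these three modules are then passed to build_model together with the model_params section of the config.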
Additional Files:
- text_utils.py: Text preprocessing utilities
- models.py: Model architecture definitions
- utils.py: Utility functions
- config.yml: Model configuration
- config.json: Detailed configuration and training metrics
Training Metrics
A visualization of the training metrics is available in training_metrics.png.
Directory Structure
```
├── Utils/
│   ├── ASR/
│   ├── JDC/
│   └── PLBERT/
├── model_components/
└── configs/
```
Usage Instructions
- Load the model using the provided config.yml
- Ensure the utility components (ASR, JDC, PLBERT) are in their respective directories under Utils/
- Use text_utils.py for text preprocessing (a sketch follows this list)
- Follow the inference example in the upstream StyleTTS2 documentation
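For the text preprocessing step, input text is phonemized and then mapped to token IDs. This is a minimal sketch following the upstream StyleTTS2 LibriTTS demo; it assumes text_utils.py provides the upstream TextCleaner class, that espeak-ng is installed for the phonemizer backend, and that nltk's punkt data is available. The full inference loop (style diffusion sampling and decoding) is described in the upstream notebook.

```python
import torch
import phonemizer
from nltk.tokenize import word_tokenize  # requires nltk.download("punkt")

from text_utils import TextCleaner  # assumed to match the upstream class

text_cleaner = TextCleaner()
phonemizer_backend = phonemizer.backend.EspeakBackend(
    language="en-us", preserve_punctuation=True, with_stress=True
)

text = "StyleTTS2 is a text-to-speech model."
phonemes = phonemizer_backend.phonemize([text])[0]
phonemes = " ".join(word_tokenize(phonemes))

# Map phonemes to token IDs and prepend the padding index, as in the upstream demo.
tokens = text_cleaner(phonemes)
tokens.insert(0, 0)
tokens = torch.LongTensor(tokens).unsqueeze(0)
print(tokens.shape)
```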