whisper-large-v3-turbo-common_voice_19_0-zh-TW
This model is a fine-tuned version of openai/whisper-large-v3-turbo on the JacobLinCool/common_voice_19_0_zh-TW dataset. It achieves the following results on the evaluation set:
- Loss: 0.1786
- Wer: 32.5554
- Cer: 8.6009
- Decode Runtime: 90.9833
- Wer Runtime: 0.1257
- Cer Runtime: 0.1534
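The Wer and Cer figures above are presumably word and character error rates in percent. The card does not state the text normalization or how Chinese text was split into words. As a minimal sketch under those assumptions, an error rate of this kind can be computed from the Levenshtein edit distance; the function names are illustrative and not taken from the training code:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="char"):
    """Corpus-level error rate in percent.

    unit="char" compares characters with spaces removed (CER);
    unit="word" compares whitespace-separated tokens (WER).
    Whitespace splitting is an assumption; the card does not say
    how words were segmented for Chinese.
    """
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        if unit == "char":
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        else:
            r, h = ref.split(), hyp.split()
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total
```

Because Chinese text is usually written without spaces, the WER value depends heavily on the segmentation choice, which is one reason CER is often the easier number to compare across models.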
Model description
This is an open-source Traditional Chinese (Taiwan) automatic speech recognition (ASR) model.
Intended uses & limitations
This model is designed to be a prompt-free ASR model for Traditional Chinese. Because it inherits Whisper's language identification (LID) system, which covers other Chinese language variants under the same language token (zh), we expect that performance may degrade when transcribing Simplified Chinese.
The model is free to use under the MIT license.
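A hedged sketch of how the model might be loaded for inference follows. PEFT appears among the framework versions, so this sketch assumes the repository holds a PEFT adapter applied on top of openai/whisper-large-v3-turbo; if it holds merged weights instead, loading the repository directly with `WhisperForConditionalGeneration.from_pretrained` would be the alternative. The helper names `load_asr` and `transcribe` are hypothetical.

```python
BASE_ID = "openai/whisper-large-v3-turbo"
# Assumption: this repository contains a PEFT adapter, not merged weights.
ADAPTER_ID = "JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW"


def load_asr(base_id=BASE_ID, adapter_id=ADAPTER_ID):
    """Load the processor and the base model with the adapter attached."""
    # Imported here so that defining the helpers needs no heavy dependencies.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(base_id)
    base = WhisperForConditionalGeneration.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return processor, model


def transcribe(processor, model, waveform, sampling_rate=16000):
    """Transcribe one mono waveform (1-D float array, 16 kHz expected by Whisper)."""
    import torch

    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        # "zh" is the shared Whisper language token mentioned above.
        ids = model.generate(
            input_features=inputs.input_features,
            language="zh",
            task="transcribe",
        )
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

Usage would look like `processor, model = load_asr()` followed by `transcribe(processor, model, waveform)`, where `waveform` is audio already resampled to 16 kHz.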
Training and evaluation data
This model was trained on the Chinese (Taiwan) subset of Common Voice Corpus 19.0, containing about 50k training examples (44 hours) and 5k test examples (5 hours). This dataset is about four times the size of the combined training and validation sets (train+validation) of mozilla-foundation/common_voice_16_1, which together include about 12k examples.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 5000
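To make the relationship between these values concrete, the sketch below reproduces the effective batch size and a linear schedule with warmup. It assumes a single training device (the card does not state the device count) and the common definition of a linear schedule: a ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step. The function names are illustrative.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples contributing to each optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_lr(step, base_lr=2e-4, warmup_steps=50, total_steps=5000):
    """Learning rate at a given optimizer step for linear warmup plus linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(4, 8))  # matches total_train_batch_size
    for s in (0, 25, 50, 2525, 5000):
        print(s, linear_lr(s))
```

With these settings the rate peaks at step 50 and is halfway back to zero at step 2525.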
Training results
Training Loss | Epoch | Step | Validation Loss | Wer | Cer | Decode Runtime | Wer Runtime | Cer Runtime |
---|---|---|---|---|---|---|---|---|
No log | 0 | 0 | 2.7208 | 76.5011 | 20.4851 | 89.4916 | 0.1213 | 0.1639 |
1.1832 | 0.1 | 500 | 0.1939 | 39.9561 | 10.8721 | 90.0926 | 0.1222 | 0.1555 |
1.5179 | 0.2 | 1000 | 0.1774 | 37.6621 | 9.9322 | 89.8657 | 0.1225 | 0.1545 |
0.6179 | 0.3 | 1500 | 0.1796 | 36.2657 | 9.8325 | 90.2480 | 0.1198 | 0.1573 |
0.3626 | 1.0912 | 2000 | 0.1846 | 36.2258 | 9.7801 | 90.3306 | 0.1196 | 0.1539 |
0.1311 | 1.1912 | 2500 | 0.1776 | 34.8095 | 9.3214 | 90.3124 | 0.1286 | 0.1610 |
0.1263 | 1.2912 | 3000 | 0.1763 | 36.1261 | 9.3563 | 90.4271 | 0.1330 | 0.1650 |
0.2194 | 2.0825 | 3500 | 0.1891 | 34.6898 | 9.3114 | 91.1932 | 0.1320 | 0.1643 |
0.1127 | 2.1825 | 4000 | 0.1838 | 34.0714 | 9.1095 | 90.2416 | 0.1196 | 0.1529 |
0.3792 | 2.2824 | 4500 | 0.1786 | 33.1339 | 8.7679 | 90.9144 | 0.1310 | 0.1550 |
0.0606 | 3.0737 | 5000 | 0.1786 | 32.5554 | 8.6009 | 90.9833 | 0.1257 | 0.1534 |
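The table can be summarized with a few lines of arithmetic. The sketch below copies the step, validation loss, Wer, and Cer columns from the rows above and reports which checkpoint is best under each metric and how much the error rates fell relative to step 0. The helper names are illustrative.

```python
# (step, validation_loss, wer, cer), copied from the training results table
ROWS = [
    (0, 2.7208, 76.5011, 20.4851),
    (500, 0.1939, 39.9561, 10.8721),
    (1000, 0.1774, 37.6621, 9.9322),
    (1500, 0.1796, 36.2657, 9.8325),
    (2000, 0.1846, 36.2258, 9.7801),
    (2500, 0.1776, 34.8095, 9.3214),
    (3000, 0.1763, 36.1261, 9.3563),
    (3500, 0.1891, 34.6898, 9.3114),
    (4000, 0.1838, 34.0714, 9.1095),
    (4500, 0.1786, 33.1339, 8.7679),
    (5000, 0.1786, 32.5554, 8.6009),
]
LOSS, WER, CER = 1, 2, 3  # column indices into each row


def best_row(rows, column):
    """Row with the lowest value in the given column."""
    return min(rows, key=lambda row: row[column])


def relative_reduction(start, end):
    """Fractional decrease from start to end, e.g. 0.5 means the value halved."""
    return (start - end) / start


if __name__ == "__main__":
    print("lowest validation loss at step", best_row(ROWS, LOSS)[0])
    print("lowest Wer at step", best_row(ROWS, WER)[0])
    print("relative Wer reduction:", relative_reduction(ROWS[0][WER], ROWS[-1][WER]))
    print("relative Cer reduction:", relative_reduction(ROWS[0][CER], ROWS[-1][CER]))
```

Note that the lowest validation loss and the lowest error rates occur at different steps, so the choice of checkpoint depends on which metric matters for the application.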
Framework versions
- PEFT 0.13.2
- Transformers 4.46.1
- Pytorch 2.4.0
- Datasets 3.0.2
- Tokenizers 0.20.1