Whisper Large V3 GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-large-v3 for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop datasets. It achieves the following results on the evaluation set at the final training step (4000):

  • Loss: 0.9885
  • Bleu: 15.23
  • Chrf: 28.15
  • Wer: 92.7060
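
WER here is expressed as a percentage, and it can exceed 100 when the output contains more word errors than the reference has words, which is why values such as 238.2260 appear in the training log. The following is a minimal sketch of the standard word-level definition (edit distance divided by reference length). It is an illustration only: the function name is hypothetical, and the card does not state which library or text normalization produced the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level WER as a percentage of the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One reference word, three extra words in the hypothesis: 3 errors / 1 word.
print(word_error_rate("hello", "hello there my friend"))  # 300.0
```

A model that keeps generating text past the end of the utterance is therefore penalized heavily by WER even if the first words are correct.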

Model description

More information needed

Intended uses & limitations

More information needed
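
Until usage notes are provided, the following is a minimal sketch of how the checkpoint might be loaded for inference with the Transformers `pipeline` API. It assumes the fine-tuned model emits English text directly through the `automatic-speech-recognition` task, as Whisper checkpoints usually do; the card does not confirm this. The function name and the audio file name are hypothetical.

```python
MODEL_ID = "ymoslem/whisper-large-v3-ga2en-v3.0.0-r"


def translate_irish_audio(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the text the model generates for one audio file."""
    # Imported lazily so the module can be loaded without the heavy dependency.
    from transformers import pipeline

    # Whisper models are served through the speech-recognition pipeline.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]


# Example call (downloads the checkpoint; "sample_ga.wav" is a placeholder):
# print(translate_irish_audio("sample_ga.wav"))
```

Decoding an audio file from a path requires ffmpeg to be installed, and recordings longer than 30 seconds may need the pipeline's `chunk_length_s` argument.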

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
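
To make the schedule concrete, the sketch below works out the learning rate at a given step from the values listed above. It assumes the common definition of a linear schedule with warmup (linear ramp from zero, then linear decay to zero) and that the warmup length is the warmup ratio times the total step count, rounded up. The card does not spell out these details, so treat the function as an approximation of the actual run.

```python
import math

LEARNING_RATE = 1e-4   # learning_rate
TRAINING_STEPS = 4000  # training_steps
WARMUP_RATIO = 0.03    # lr_scheduler_warmup_ratio

# Assumed rule: warmup length is the ratio times the total steps, rounded up.
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at a step under linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay from the peak down to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print(WARMUP_STEPS)  # 120
```

Under these assumptions the peak rate of 0.0001 is reached at step 120 and falls to half that value at step 2060.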

Training results

Training Loss Epoch Step Bleu Chrf Validation Loss Wer
2.5918 0.0138 100 0.61 8.48 2.1791 238.2260
2.476 0.0276 200 0.63 10.43 2.1702 275.7317
2.2358 0.0414 300 4.76 19.98 2.0420 120.0810
2.1778 0.0552 400 2.78 12.85 1.9506 86.8528
1.9779 0.0690 500 4.53 18.47 1.8609 137.1905
1.9435 0.0828 600 6.67 22.37 1.7726 82.4403
1.7928 0.0966 700 4.54 17.32 1.7445 133.8586
1.9004 0.1103 800 1.58 12.65 1.7290 195.2724
1.7856 0.1241 900 4.84 17.5 1.6990 83.9262
1.6783 0.1379 1000 8.46 24.24 1.6329 113.5074
1.6095 0.1517 1100 7.35 20.22 1.6083 102.5214
1.6328 0.1655 1200 11.46 25.29 1.5267 76.5871
1.6093 0.1793 1300 6.51 17.77 1.4947 112.4719
1.5776 0.1931 1400 6.21 19.86 1.4952 90.6348
1.4767 0.2069 1500 4.86 19.57 1.4515 145.1148
1.3447 0.2207 1600 6.77 19.96 1.3974 90.5448
1.3273 0.2345 1700 4.77 16.31 1.4323 152.1837
1.4253 0.2483 1800 3.95 15.66 1.3598 173.2553
1.3505 0.2621 1900 11.25 23.4 1.3517 80.3692
1.2593 0.2759 2000 12.71 26.55 1.3236 77.5777
1.2483 0.2897 2100 17.88 32.0 1.2825 73.3003
1.161 0.3034 2200 10.08 20.69 1.2567 115.8937
1.1597 0.3172 2300 8.61 19.54 1.2581 93.8766
1.0937 0.3310 2400 12.37 25.67 1.2577 99.0095
1.0606 0.3448 2500 6.46 23.47 1.2228 172.9401
1.039 0.3586 2600 9.55 21.56 1.2186 89.7794
1.0193 0.3724 2700 3.08 17.58 1.1844 281.8100
1.1153 0.3862 2800 2.69 18.38 1.1693 350.2927
1.012 0.4 2900 3.56 14.74 1.1233 194.9122
0.8936 0.4138 3000 5.21 17.38 1.1161 158.3521
0.8893 0.4276 3100 11.52 25.02 1.1119 80.9095
0.9491 0.4414 3200 5.93 20.91 1.1213 174.0207
0.9233 0.4552 3300 5.54 20.95 1.0656 186.2224
0.8915 0.4690 3400 7.26 23.99 1.0736 155.6506
0.8296 0.4828 3500 6.74 21.46 1.0461 146.1054
0.8163 0.4966 3600 11.35 24.11 1.0706 101.8010
0.8115 0.5103 3700 12.84 26.92 1.0199 115.8487
0.8245 0.5241 3800 12.47 24.29 1.0163 101.9361
0.7988 0.5379 3900 15.29 28.54 0.9891 92.7960
0.769 0.5517 4000 15.23 28.15 0.9885 92.7060
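
The log can also be read programmatically. The sketch below copies a handful of rows from the table (step, validation loss, BLEU, chrF, WER) and shows that the checkpoint with the lowest validation loss is not the one with the best BLEU. It also estimates the number of optimizer steps per epoch from the final row. The `EvalRow` type and the variable names are illustrative, and the epoch estimate assumes the Epoch column is simply the step count divided by the steps per epoch.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    validation_loss: float
    bleu: float
    chrf: float
    wer: float


# A subset of rows copied from the training results table.
ROWS = [
    EvalRow(1200, 1.5267, 11.46, 25.29, 76.5871),
    EvalRow(2000, 1.3236, 12.71, 26.55, 77.5777),
    EvalRow(2100, 1.2825, 17.88, 32.0, 73.3003),
    EvalRow(3900, 0.9891, 15.29, 28.54, 92.7960),
    EvalRow(4000, 0.9885, 15.23, 28.15, 92.7060),
]

best_bleu = max(ROWS, key=lambda row: row.bleu)
lowest_loss = min(ROWS, key=lambda row: row.validation_loss)

# Step 4000 corresponds to epoch 0.5517, so one epoch is about 4000 / 0.5517 steps.
steps_per_epoch = round(4000 / 0.5517)

print(best_bleu.step, lowest_loss.step, steps_per_epoch)  # 2100 4000 7250
```

In the full table, step 2100 also has the highest chrF (32.0) and the lowest WER (73.3003), while the reported final results come from step 4000.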

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2+git70dfd51
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • Parameters: 1.54B
  • Tensor type: F32
  • Format: Safetensors
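
As a rough guide to memory needs, the parameter count and tensor type translate into a weight size as sketched below. This counts weights only; activations, the decoding cache, and framework overhead are not included. The F16 figure is a hypothetical alternative, not a format the card says is published.

```python
PARAMS = 1.54e9  # "1.54B params" as listed for this model

# Bytes used by one parameter in each tensor type.
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weight_size_gb(dtype: str) -> float:
    """Approximate size of the weights in gigabytes (10^9 bytes)."""
    return PARAMS * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(dtype, round(weight_size_gb(dtype), 2))
```

The F32 weights come to roughly 6.16 GB, and loading them in half precision would roughly halve that.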

Model ID: ymoslem/whisper-large-v3-ga2en-v3.0.0-r (fine-tuned from openai/whisper-large-v3)

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop (self-reported): 15.230
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop (self-reported): 92.706