wav2vec2-base-vietnamese-VIVOS-CommonVoice-FOSD-Control-dataset-25e-epochs

This model is a fine-tuned version of nguyenvulebinh/wav2vec2-base-vi on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3338
  • WER (word error rate): 0.1833
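WER is the word-level edit distance between predicted and reference transcripts: substitutions S, deletions D and insertions I, divided by the number of reference words N, i.e. (S + D + I) / N accumulated over the evaluation set. The card does not say which implementation produced 0.1833, so the following is a plain-Python sketch of the standard definition, not the author's evaluation code. `word_edit_distance` and `corpus_wer` are hypothetical helper names. Written Vietnamese separates syllables with spaces, so the whitespace tokens counted here are syllables.

```python
from typing import Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, deletions, insertions)."""
    # prev[j] = edits needed to turn the previous ref prefix into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (cost 0 if the words match)
            )
        prev = cur
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total word edits divided by total reference words (hypothetical helper)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()  # whitespace tokens (syllables for Vietnamese)
        errors += word_edit_distance(ref_tokens, hyp.split())
        words += len(ref_tokens)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


if __name__ == "__main__":
    # 1 deletion ("đi") over 3 reference words
    print(corpus_wer(["tôi đi học"], ["tôi học"]))
```

Because insertions are counted, WER can exceed 1.0, and an empty hypothesis scores exactly 1.0 (every reference word becomes a deletion).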

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 25
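With `lr_scheduler_type: linear` and 1000 warmup steps, the learning rate ramps from 0 up to 1e-05 and then decays linearly toward 0. A minimal sketch of that schedule, assuming it follows the same formula as Transformers' `get_linear_schedule_with_warmup`. The card does not report the total number of optimizer steps, so `ESTIMATED_TOTAL_STEPS` is a hypothetical estimate extrapolated from the training log (step 32000 is logged at epoch 24.75, roughly 1293 steps per epoch); `linear_warmup_lr` is likewise an illustrative name.

```python
# Hypothetical estimate: the card does not state the total number of optimizer steps.
# Step 32000 is logged at epoch 24.75 in the training results, i.e. ~1293 steps per epoch.
STEPS_PER_EPOCH = round(32000 / 24.75)         # 1293 (estimate)
ESTIMATED_TOTAL_STEPS = 25 * STEPS_PER_EPOCH   # 32325 (estimate)

BASE_LR = 1e-05      # learning_rate
WARMUP_STEPS = 1000  # lr_scheduler_warmup_steps


def linear_warmup_lr(
    step: int,
    base_lr: float = BASE_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = ESTIMATED_TOTAL_STEPS,
) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to base_lr over the first warmup_steps steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay: shrink linearly from base_lr to 0 at total_steps, clamped at 0 afterwards.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    for step in (0, 500, 1000, 16000, 32000):
        print(step, linear_warmup_lr(step))
```

Under this sketch the peak rate is reached at step 1000, which the training log places at epoch 0.77.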

Training results

Training Loss  Epoch   Step  Validation Loss     WER
      16.1039   0.39    500          21.1164  1.0000
      10.5383   0.77   1000          15.8037  1.0000
       7.5435   1.16   1500           9.8785  1.0000
       5.1426   1.55   2000           5.9691  1.0000
       3.9112   1.93   2500           4.1400  1.0000
       3.5159   2.32   3000           3.6877  1.0000
       3.4056   2.71   3500           3.5166  1.0000
       3.3840   3.09   4000           3.6170  1.0000
       3.3715   3.48   4500           3.5045  1.0000
       3.3730   3.87   5000           3.4859  1.0000
       3.3539   4.25   5500           3.4843  1.0000
       3.3063   4.64   6000           3.3596  1.0000
       3.0749   5.03   6500           2.8515  0.9994
       2.6888   5.41   7000           2.4817  1.0000
       2.3404   5.80   7500           2.0490  0.9815
       2.0588   6.19   8000           1.7986  0.9288
       1.8428   6.57   8500           1.4945  0.8332
       1.6860   6.96   9000           1.3796  0.7640
       1.5399   7.35   9500           1.2362  0.6927
       1.4374   7.73  10000           1.1130  0.6320
       1.3281   8.12  10500           1.0058  0.5705
       1.2308   8.51  11000           0.8888  0.5109
       1.1405   8.89  11500           0.8438  0.4524
       1.0647   9.28  12000           0.7767  0.4208
       1.0104   9.67  12500           0.7385  0.3777
       0.9629  10.05  13000           0.6731  0.3505
       0.9045  10.44  13500           0.6295  0.3317
       0.8573  10.83  14000           0.6071  0.3115
       0.8443  11.21  14500           0.5895  0.2984
       0.7915  11.60  15000           0.5828  0.2823
       0.7965  11.99  15500           0.5552  0.2714
       0.7738  12.37  16000           0.5100  0.2605
       0.7326  12.76  16500           0.4884  0.2499
       0.7007  13.15  17000           0.4799  0.2402
       0.6997  13.53  17500           0.4647  0.2331
       0.6800  13.92  18000           0.4469  0.2271
       0.6707  14.31  18500           0.4261  0.2231
       0.6557  14.69  19000           0.4145  0.2164
       0.6509  15.08  19500           0.4010  0.2120
       0.6649  15.47  20000           0.4038  0.2092
       0.6191  15.85  20500           0.3926  0.2064
       0.6385  16.24  21000           0.3882  0.2024
       0.6222  16.63  21500           0.3874  0.2016
       0.5792  17.01  22000           0.3873  0.2023
       0.5775  17.40  22500           0.3757  0.1975
       0.5647  17.79  23000           0.3626  0.1964
       0.5723  18.17  23500           0.3574  0.1958
       0.5573  18.56  24000           0.3530  0.1960
       0.5813  18.95  24500           0.3541  0.1933
       0.5630  19.33  25000           0.3455  0.1926
       0.5402  19.72  25500           0.3483  0.1910
       0.5578  20.11  26000           0.3516  0.1915
       0.5456  20.49  26500           0.3477  0.1878
       0.5453  20.88  27000           0.3391  0.1882
       0.5265  21.27  27500           0.3386  0.1869
       0.5570  21.66  28000           0.3388  0.1864
       0.5526  22.04  28500           0.3373  0.1864
       0.5284  22.43  29000           0.3352  0.1854
       0.5351  22.82  29500           0.3373  0.1850
       0.5775  23.20  30000           0.3382  0.1848
       0.5292  23.59  30500           0.3371  0.1843
       0.5200  23.98  31000           0.3338  0.1839
       0.5372  24.36  31500           0.3337  0.1829
       0.5167  24.75  32000           0.3338  0.1833
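The evaluation results reported at the top of the card (Loss 0.3338, WER 0.1833) coincide with the last logged row (step 32000), while the lowest WER and validation loss in the whole table (0.1829 and 0.3337) occur at step 31500. A small sketch for scanning the log for the best-scoring checkpoint; `LOG_TAIL` is a hand-copied excerpt of the last rows of the table above, not an artifact shipped with the model, and the variable names are illustrative.

```python
# Hand-copied excerpt of the last rows of the training results: (step, validation_loss, wer).
LOG_TAIL = [
    (28000, 0.3388, 0.1864),
    (28500, 0.3373, 0.1864),
    (29000, 0.3352, 0.1854),
    (29500, 0.3373, 0.1850),
    (30000, 0.3382, 0.1848),
    (30500, 0.3371, 0.1843),
    (31000, 0.3338, 0.1839),
    (31500, 0.3337, 0.1829),
    (32000, 0.3338, 0.1833),
]

# Evaluation results stated at the top of the card.
REPORTED_LOSS, REPORTED_WER = 0.3338, 0.1833

# Row with the lowest WER (ties would resolve to the earliest step).
best_step, best_loss, best_wer = min(LOG_TAIL, key=lambda row: row[2])
final_step, final_loss, final_wer = LOG_TAIL[-1]

# True when the reported results are exactly the final logged evaluation.
matches_final = (final_loss, final_wer) == (REPORTED_LOSS, REPORTED_WER)

if __name__ == "__main__":
    print(f"lowest WER {best_wer} at step {best_step} (validation loss {best_loss})")
    print(f"final row (step {final_step}) matches reported results: {matches_final}")
```

The gap between the two checkpoints is small (0.0004 WER), so either is a reasonable choice; the sketch only makes the selection criterion explicit.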

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.4
  • Tokenizers 0.13.3

Model tree for tuanmanh28/wav2vec2-base-vietnamese-VIVOS-CommonVoice-FOSD-Control-dataset-25e-epochs: fine-tuned from nguyenvulebinh/wav2vec2-base-vi.