---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
- whisper-event
- generated_from_trainer
datasets:
- asierhv/composite_corpus_eu_v2.1
metrics:
- wer
model-index:
- name: Whisper Small Basque
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Mozilla Common Voice 18.0
type: mozilla-foundation/common_voice_18_0
metrics:
- name: Wer
type: wer
value: 7.63
language:
- eu
---
# Whisper Small Basque
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co./openai/whisper-small) specifically for Basque (eu) language Automatic Speech Recognition (ASR). It was trained on the [asierhv/composite_corpus_eu_v2.1](https://huggingface.co./datasets/asierhv/composite_corpus_eu_v2.1) dataset, which is a composite corpus designed to improve Basque ASR performance.
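A quick way to try the checkpoint is the `transformers` ASR pipeline. The snippet below is a minimal sketch: the repo id `user/whisper-small-eu` and the audio path are placeholders, so substitute the actual model id and file.

```python
# Minimal inference sketch using the transformers ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="user/whisper-small-eu",  # hypothetical repo id: replace with the real one
    device=0,                       # set to -1 (or omit) for CPU
)

# Pin the language and task so Whisper does not fall back to language detection.
result = asr(
    "audio.wav",  # placeholder path to a local audio file
    generate_kwargs={"language": "basque", "task": "transcribe"},
)
print(result["text"])
```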
**Key improvements and results compared to the base model:**
* **Significant WER reduction:** The fine-tuned model achieves a Word Error Rate (WER) of 9.5479% on the evaluation set (the `test` split of `asierhv/composite_corpus_eu_v2.1`), demonstrating improved accuracy over the base `whisper-small` model for Basque.
* **Performance on Common Voice:** When evaluated on the Mozilla Common Voice 18.0 dataset, the model achieved a WER of 7.63%. This shows that the model generalizes to other Basque speech datasets, and reflects the accuracy gained from the larger model size relative to the `tiny` and `base` variants (see the metric sketch below for how WER is computed).
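Both figures use the standard word error rate. A hedged sketch of how such a number can be computed with the `evaluate` library (the strings here are illustrative, not drawn from the actual evaluation):

```python
# Illustrative WER computation; not the exact evaluation pipeline for this card.
import evaluate

wer_metric = evaluate.load("wer")
predictions = ["kaixo mundua zer moduz"]  # hypothetical model transcriptions
references = ["kaixo mundua zer modu"]    # hypothetical ground-truth transcripts
print(100 * wer_metric.compute(predictions=predictions, references=references))
```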
## Model description
This model leverages the `whisper-small` architecture, which offers a balance between accuracy and computational efficiency. By fine-tuning it on a dedicated Basque speech corpus, the model specializes in accurately transcribing Basque speech. This model has a larger capacity than `whisper-base`, improving accuracy at the cost of increased computational resources.
## Intended uses & limitations
**Intended uses:**
* High-accuracy automatic transcription of Basque speech for professional applications and transcription services.
* Development of advanced Basque speech-based applications that require high precision.
* Research in Basque speech processing where the highest possible accuracy is needed.
* Scenarios where the higher computational cost is justified by the improvement in accuracy.
**Limitations:**
* Performance is still influenced by audio quality, with challenges arising from background noise and poor recording conditions.
* Accuracy may be affected by highly dialectal or informal Basque speech.
* Despite improved performance, the model may still produce errors, particularly with complex linguistic structures or rare words.
* Because `whisper-small` is larger than the `base` and `tiny` variants, inference is slower and requires more memory and compute.
## Training and evaluation data
* **Training dataset:** [asierhv/composite_corpus_eu_v2.1](https://huggingface.co./datasets/asierhv/composite_corpus_eu_v2.1). This dataset is a comprehensive collection of Basque speech data, tailored to enhance the performance of Basque ASR systems.
* **Evaluation dataset:** The `test` split of `asierhv/composite_corpus_eu_v2.1` (see the evaluation sketch below).
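The following is a hedged evaluation sketch under a few assumptions: the repo id is a placeholder, the dataset is streamed, and the transcript column is assumed to be named `text`.

```python
# Hedged evaluation sketch; not the original evaluation script for this card.
import evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="user/whisper-small-eu",  # hypothetical repo id: replace with the real one
)
wer_metric = evaluate.load("wer")

# Stream the test split and resample audio to Whisper's expected 16 kHz.
ds = load_dataset("asierhv/composite_corpus_eu_v2.1", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for sample in ds.take(100):  # small subset for a quick check
    out = asr(
        sample["audio"],
        generate_kwargs={"language": "basque", "task": "transcribe"},
    )
    predictions.append(out["text"].lower())
    references.append(sample["text"].lower())  # transcript column name assumed

print("WER (%):", 100 * wer_metric.compute(predictions=predictions, references=references))
```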
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch in code follows the list):
* **learning_rate:** 1.25e-05
* **train_batch_size:** 32
* **eval_batch_size:** 16
* **seed:** 42
* **optimizer:** AdamW with betas=(0.9, 0.999) and epsilon=1e-08
* **lr_scheduler_type:** linear
* **lr_scheduler_warmup_steps:** 500
* **training_steps:** 10000
* **mixed_precision_training:** Native AMP
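As a reading aid, these settings map onto `Seq2SeqTrainingArguments` roughly as below. This is a hedged reconstruction, not the original training script; `output_dir` is a placeholder.

```python
# Hedged reconstruction of the configuration above; not the original script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-eu",  # placeholder
    learning_rate=1.25e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",              # AdamW; betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10000,
    fp16=True,                        # native AMP mixed precision
)
```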
### Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|-------|-----------------|----------|
| 0.3863 | 0.1 | 1000 | 0.4090 | 21.2189 |
| 0.1897 | 0.2 | 2000 | 0.3457 | 15.4490 |
| 0.1379 | 0.3 | 3000 | 0.3283 | 13.5756 |
| 0.1825 | 0.4 | 4000 | 0.3024 | 12.3954 |
| 0.0775 | 0.5 | 5000 | 0.3198 | 11.8771 |
| 0.0975 | 0.6 | 6000 | 0.2924 | 11.2589 |
| 0.1132 | 0.7 | 7000 | 0.2969 | 10.8468 |
| 0.0852 | 0.8 | 8000 | 0.2237 | 9.7727 |
| 0.0585 | 0.9 | 9000 | 0.2317 | 9.6291 |
| 0.0654 | 1.0 | 10000 | 0.2353 | 9.5479 |
### Framework versions
* Transformers 4.49.0.dev0
* Pytorch 2.6.0+cu124
* Datasets 3.3.1.dev0
* Tokenizers 0.21.0 |