---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
- whisper-event
- generated_from_trainer
datasets:
- asierhv/composite_corpus_eu_v2.1
metrics:
- wer
model-index:
- name: Whisper Small Basque
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 18.0
      type: mozilla-foundation/common_voice_18_0
    metrics:
    - name: Wer
      type: wer
      value: 7.63
language:
- eu
---

# Whisper Small Basque

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co./openai/whisper-small) specifically for Basque (eu) language Automatic Speech Recognition (ASR). It was trained on the [asierhv/composite_corpus_eu_v2.1](https://huggingface.co./datasets/asierhv/composite_corpus_eu_v2.1) dataset, which is a composite corpus designed to improve Basque ASR performance.

**Key improvements and results compared to the base model:**

* **Significant WER reduction:** The fine-tuned model achieves a Word Error Rate (WER) of 9.5479 on the `test` split of the `asierhv/composite_corpus_eu_v2.1` dataset (used as the validation set during training), a clear accuracy improvement over the base `whisper-small` model for Basque.
* **Performance on Common Voice:** Evaluated on Mozilla Common Voice 18.0, the model reaches a WER of 7.63, showing that it generalizes to other Basque speech datasets and that the `small` model's extra capacity over the `base` and `tiny` variants pays off in accuracy. A short sketch of how WER is computed follows this list.
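
WER counts word substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. A minimal sketch of computing it with the `evaluate` library; the exact evaluation script behind the figures above is not published here, and the Basque example strings are purely illustrative:

```python
import evaluate

# Load the standard word-error-rate metric.
wer_metric = evaluate.load("wer")

# Illustrative model outputs and ground-truth transcripts (not real data).
predictions = ["kaixo mundua", "eguraldi ona dago gaur"]
references = ["kaixo mundu guztia", "eguraldi ona dago gaur"]

# compute() returns a fraction; the tables in this card report percentages.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```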

## Model description

This model leverages the `whisper-small` architecture, which offers a balance between accuracy and computational efficiency. Fine-tuned on a dedicated Basque speech corpus, it specializes in accurately transcribing Basque speech. It has a larger capacity than `whisper-base`, improving accuracy at the cost of increased computational resources. A minimal inference sketch is shown below.
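
A minimal transcription sketch using the `transformers` ASR pipeline; the model id placeholder and the audio file name are assumptions, not part of this card:

```python
from transformers import pipeline

# Placeholder: replace with this repository's Hugging Face model id.
MODEL_ID = "<this-repo-id>"

asr = pipeline(
    "automatic-speech-recognition",
    model=MODEL_ID,
    chunk_length_s=30,  # Whisper operates on 30-second windows
)

# Pin the language and task so Whisper does not try to auto-detect them.
result = asr(
    "audio.wav",  # assumed: any mono audio file (resampled to 16 kHz)
    generate_kwargs={"language": "basque", "task": "transcribe"},
)
print(result["text"])
```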

## Intended uses & limitations

**Intended uses:**

* High-accuracy automatic transcription of Basque speech, including professional transcription services.
* Development of advanced Basque speech-based applications that require high precision.
* Research in Basque speech processing where the highest possible accuracy is needed.
* Scenarios where the higher computational cost is justified by the improvement in accuracy.

**Limitations:**

* Performance is still influenced by audio quality, with challenges arising from background noise and poor recording conditions.
* Accuracy may be affected by highly dialectal or informal Basque speech.
* Despite improved performance, the model may still produce errors, particularly with complex linguistic structures or rare words.
* The `small` model is larger than the `base` and `tiny` variants, so inference is slower and requires more memory and compute.

## Training and evaluation data

* **Training dataset:** [asierhv/composite_corpus_eu_v2.1](https://huggingface.co./datasets/asierhv/composite_corpus_eu_v2.1). This dataset is a comprehensive collection of Basque speech data, tailored to enhance the performance of Basque ASR systems.
* **Evaluation dataset:** The `test` split of `asierhv/composite_corpus_eu_v2.1` (see the loading sketch below).
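
A sketch of loading the corpus with the `datasets` library; the `audio` and `sentence` column names are assumptions based on common ASR dataset layouts, not verified against the dataset card:

```python
from datasets import Audio, load_dataset

# Load the evaluation split of the Basque composite corpus.
dataset = load_dataset("asierhv/composite_corpus_eu_v2.1", split="test")

# Whisper feature extractors expect 16 kHz audio.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

sample = dataset[0]
print(sample["audio"]["array"].shape)
print(sample.get("sentence"))  # assumed transcript column name
```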

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

* **learning_rate:** 1.25e-05
* **train_batch_size:** 32
* **eval_batch_size:** 16
* **seed:** 42
* **optimizer:** AdamW with betas=(0.9, 0.999) and epsilon=1e-08
* **lr_scheduler_type:** linear
* **lr_scheduler_warmup_steps:** 500
* **training_steps:** 10000
* **mixed_precision_training:** Native AMP
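
A hedged reconstruction of this configuration as `Seq2SeqTrainingArguments`; the `output_dir` is a placeholder, the evaluation cadence is inferred from the results table below, and the AdamW betas/epsilon listed above are the `transformers` defaults, so they are not set explicitly:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-eu",  # placeholder output path
    learning_rate=1.25e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10_000,
    fp16=True,                   # Native AMP mixed-precision training
    predict_with_generate=True,  # generate transcripts to compute WER
    eval_strategy="steps",
    eval_steps=1_000,            # matches the 1000-step cadence in the table
)
```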

### Training results

| Training Loss | Epoch | Step  | Validation Loss | WER      |
|---------------|-------|-------|-----------------|----------|
| 0.3863        | 0.1   | 1000  | 0.4090          | 21.2189  |
| 0.1897        | 0.2   | 2000  | 0.3457          | 15.4490  |
| 0.1379        | 0.3   | 3000  | 0.3283          | 13.5756  |
| 0.1825        | 0.4   | 4000  | 0.3024          | 12.3954  |
| 0.0775        | 0.5   | 5000  | 0.3198          | 11.8771  |
| 0.0975        | 0.6   | 6000  | 0.2924          | 11.2589  |
| 0.1132        | 0.7   | 7000  | 0.2969          | 10.8468  |
| 0.0852        | 0.8   | 8000  | 0.2237          | 9.7727   |
| 0.0585        | 0.9   | 9000  | 0.2317          | 9.6291   |
| 0.0654        | 1.0   | 10000 | 0.2353          | 9.5479   |

### Framework versions

* Transformers 4.49.0.dev0
* Pytorch 2.6.0+cu124
* Datasets 3.3.1.dev0
* Tokenizers 0.21.0