Whisper-large-et
This is a Whisper large-v2 model (openai/whisper-large-v2) finetuned on around 1200 hours of diverse Estonian data.
Model description
This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech.
Intended uses & limitations
This model is intended for general-purpose speech recognition, such as broadcast conversations, interviews, talks, etc.
How to use
Recommended: use faster-whisper.
For example:
Convert the HF model to CT2 format:
ct2-transformers-converter --model TalTechNLP/whisper-large-et --output_dir whisper-large-et.ct2 --copy_files tokenizer.json --quantization float16
Decode:
whisper-ctranslate2 --model_directory whisper-large-et.ct2 --task transcribe --language et --beam_size 5 some_file.mp3
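The same decoding can also be done from Python with the faster-whisper library. The following is a minimal sketch, not an official recipe: it assumes the converted model sits in the `whisper-large-et.ct2` directory produced above and that a CUDA GPU is available. The helper names `format_segment` and `transcribe` are illustrative only.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Render one recognized segment as '[start -> end] text'."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe(audio_path: str, model_dir: str = "whisper-large-et.ct2") -> list[str]:
    """Transcribe an Estonian recording with the converted CT2 model."""
    # Imported lazily so the formatting helper works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumption: a CUDA GPU is available. On CPU, device="cpu" with
    # compute_type="int8" is a common alternative.
    model = WhisperModel(model_dir, device="cuda", compute_type="float16")

    # Same settings as the command line above: Estonian, transcription, beam size 5.
    segments, _info = model.transcribe(
        audio_path, task="transcribe", language="et", beam_size=5
    )

    # `segments` is a generator; decoding happens while it is consumed.
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Calling `transcribe("some_file.mp3")` returns one formatted line per recognized segment.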
Limitations and bias
Since this model was trained mostly on broadcast speech and texts from the web, it might have problems correctly decoding the following:
- Speech containing technical and other domain-specific terms
- Children's speech
- Non-native speech
- Speech recorded under very noisy conditions or with a microphone far from the speaker
- Very spontaneous and overlapping speech
Training data
Acoustic training data:
Type | Amount (h) |
---|---|
Broadcast speech | 991 |
Spontaneous speech | 53 |
Elderly speech corpus | 53 |
Talks, lectures | 49 |
Parliament speeches | 31 |
Total | 1161 |
Training procedure
Finetuned using ESPnet, and then converted to transformers format using this script. The finetuning procedure is similar to this model. Finetuning was done for 3 epochs, with model averaging at the end of training.
Update: the 2023-10-03 version of the model is trained on long segments (like the original Whisper model) and is therefore especially well suited for use with, e.g., faster-whisper to transcribe long speech recordings "end-to-end" (i.e., without any prior segmentation).
Evaluation results
WER
WER results below are obtained using greedy decoding (i.e., beam size 1).
Dataset | WER |
---|---|
Common Voice 8.0 | 11.3 |
Common Voice 11.0 | 12.0 |
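For readers unfamiliar with the metric, WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words and expressed as a percentage. The sketch below shows that computation in plain Python. It is not the scoring script used for the numbers above, and it assumes whitespace tokenization with no text normalization, which may differ from the actual evaluation setup.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenization."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)
```

For example, `wer("tere tulemast eesti keelde", "tere tulemas eesti")` counts one substitution and one deletion against four reference words, giving 50.0.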
Self-reported results on the Common Voice test sets:
Dataset | WER | CER |
---|---|---|
Common Voice 11 | 12.030 | 3.180 |
Common Voice 8 | 11.350 | 2.750 |