This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the OPENSLR_SLR53 - bengali dataset. It achieves the following results on the evaluation set.
Without language model :
- WER: 0.21726385291857586
- CER: 0.04725010353701041
With 5 gram language model trained on 30M sentences randomly chosen from AI4Bharat IndicCorp dataset :
- WER: 0.15322879016421437
- CER: 0.03413696666806267
Note : 5% of a total 10935 samples have been used for evaluation. Evaluation set has 10935 examples which was not part of training training was done on first 95% and eval was done on last 5%. Training was stopped after 180k steps. Output predictions are available under files section.
Training hyperparameters
The following hyperparameters were used during training:
- dataset_name="openslr"
- model_name_or_path="facebook/wav2vec2-xls-r-300m"
- dataset_config_name="SLR53"
- output_dir="./wav2vec2-xls-r-300m-bengali"
- overwrite_output_dir
- num_train_epochs="50"
- per_device_train_batch_size="32"
- per_device_eval_batch_size="32"
- gradient_accumulation_steps="1"
- learning_rate="7.5e-5"
- warmup_steps="2000"
- length_column_name="input_length"
- evaluation_strategy="steps"
- text_column_name="sentence"
- chars_to_ignore , ? . ! - ; : " β % β β οΏ½ β β β¦ β
- save_steps="2000"
- eval_steps="3000"
- logging_steps="100"
- layerdrop="0.0"
- activation_dropout="0.1"
- save_total_limit="3"
- freeze_feature_encoder
- feat_proj_dropout="0.0"
- mask_time_prob="0.75"
- mask_time_length="10"
- mask_feature_prob="0.25"
- mask_feature_length="64"
- preprocessing_num_workers 32
Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0
Notes
- Training and eval code modified from : https://github.com/huggingface/transformers/tree/master/examples/research_projects/robust-speech-event.
- Bengali speech data was not available from common voice or librispeech multilingual datasets, so OpenSLR53 has been used.
- Minimum audio duration of 0.5s has been used to filter the training data which excluded may be 10-20 samples.
- OpenSLR53 transcripts are not part of LM training and LM used to evaluate.
- Downloads last month
- 217
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Dataset used to train arijitx/wav2vec2-xls-r-300m-bengali
Spaces using arijitx/wav2vec2-xls-r-300m-bengali 2
Evaluation results
- Test WER on Open SLRself-reported0.217
- Test CER on Open SLRself-reported0.047
- Test WER with lm on Open SLRself-reported0.153
- Test CER with lm on Open SLRself-reported0.034