Fine-Tuned Parakeet RNNT 0.6B (Urdu)
This repository contains a fine-tuned version of the Parakeet RNNT 0.6B model for Urdu Automatic Speech Recognition (ASR). The base model, developed jointly by NVIDIA NeMo and Suno.ai, was fine-tuned on the Urdu subset of Mozilla's Common Voice 12.0 so that it can transcribe Urdu speech to text.
Model Overview
Parakeet RNNT is an XL version of the FastConformer Transducer with about 600 million parameters, built for ASR. The fine-tuned model transcribes Urdu speech, which makes it a candidate for applications such as subtitling, speech analytics, and voice-assisted interfaces.
Base model details can be found on 🤗 Hugging Face.
Training Details
Dataset
The model was fine-tuned on the Urdu subset of Mozilla's Common Voice 12.0, a crowd-sourced corpus of read speech recorded by volunteer speakers.
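NeMo ASR training and evaluation typically read a JSON-lines manifest in which each line has an `audio_filepath`, a `duration` in seconds, and the transcript as `text`. The preprocessing used for this model is not described here, so the following is only a minimal sketch of writing such a manifest; the `write_manifest` helper, the file names, and the example duration are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(rows, manifest_path):
    """Write one JSON object per line in the layout NeMo ASR datasets read.

    `rows` is an iterable of (audio_filepath, duration_seconds, text) tuples.
    """
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_filepath, duration, text in rows:
            entry = {
                "audio_filepath": str(audio_filepath),
                "duration": float(duration),
                "text": text,
            }
            # ensure_ascii=False keeps Urdu text readable instead of \u escapes.
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


# Hypothetical example row; real paths and durations come from your own
# copy of the Common Voice clips.
rows = [
    ("clips/sample_0001.wav", 2.4, "کچھ بھی ہو سکتا ہے۔"),
]
manifest_path = Path(tempfile.gettempdir()) / "cv12_ur_test_manifest.json"
write_manifest(rows, manifest_path)
```

Common Voice distributes its clips as MP3, so a conversion step to WAV at the sample rate the model expects would normally come before this one.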
Hardware
- Google Colab Pro
- NVIDIA A100 GPU
Results
The model achieved a Word Error Rate (WER) of 25.513% on the test split of the Common Voice Urdu dataset. Many transcriptions are exact, and others differ from the reference only in word spacing or spelling, as the two examples below show:
- Reference: کچھ بھی ہو سکتا ہے۔
Predicted: کچھ بھی ہو سکتا ہے۔
- Reference: اورکوئی جمہوریت کو کوس رہا ہے۔
Predicted: اور کوئ جمہوریت کو کو س رہا ہے۔
This WER is slightly higher than the 23% reported for OpenAI's Whisper model without fine-tuning (reference), but it suggests that Parakeet RNNT could narrow the gap with further fine-tuning.
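For readers who want to see how such a figure is computed, WER is conventionally the word-level edit distance (substitutions, insertions, and deletions) summed over all utterances and divided by the total number of reference words. The sketch below is a plain-Python illustration of that definition applied to the two examples above; it assumes whitespace tokenization with no text normalization and is not the evaluation script used to obtain the reported number. The function names are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words, as a fraction."""
    errors = 0
    ref_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_edit_distance(ref_words, hyp_words)
        ref_len += len(ref_words)
    return errors / ref_len


references = ["کچھ بھی ہو سکتا ہے۔", "اورکوئی جمہوریت کو کوس رہا ہے۔"]
predictions = ["کچھ بھی ہو سکتا ہے۔", "اور کوئ جمہوریت کو کو س رہا ہے۔"]
print(f"WER on the two examples: {corpus_wer(references, predictions):.2%}")
```

Because the metric works on whole words, a prediction that only splits a word differently from the reference is still counted as several errors, which is one reason a sentence that reads almost correctly can contribute heavily to the overall WER.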
How to Use this Model
Loading the Model
You can load the fine-tuned model using NVIDIA NeMo:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="hash2004/parakeet-fine-tuned-urdu")
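Once loaded, the model can transcribe audio files through NeMo's `transcribe` method. The sketch below wraps loading and transcription in one function. It assumes NeMo is installed and that the input is 16 kHz mono WAV, as the base model card specifies. The return type of `transcribe` has varied between NeMo releases (plain strings, `Hypothesis` objects, or a tuple for RNNT models), so the hypothetical `hypothesis_texts` helper normalizes those cases; the audio file name is also hypothetical.

```python
def hypothesis_texts(result):
    """Normalize the return value of `transcribe` to a list of strings."""
    # Some NeMo releases return (best_hypotheses, all_hypotheses) for RNNT models.
    if isinstance(result, tuple):
        result = result[0]
    # Entries may be plain strings or Hypothesis objects exposing `.text`.
    return [r if isinstance(r, str) else r.text for r in result]


def transcribe_files(audio_paths, model_name="hash2004/parakeet-fine-tuned-urdu"):
    """Load the fine-tuned checkpoint and transcribe a list of audio files."""
    # Imported lazily because NeMo is a heavy dependency.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name=model_name)
    return hypothesis_texts(asr_model.transcribe(list(audio_paths)))


# Usage (hypothetical file name):
# print(transcribe_files(["urdu_sample.wav"]))
```

Loading the checkpoint inside the function keeps the example short; for repeated calls it would be cheaper to load the model once and reuse it.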
How to Fine-Tune this Model
Resources for fine-tuning the Parakeet RNNT (0.6B) model are available in this GitHub repository.