Model Background
This model was trained on a dataset of parsed audio and text derived from recordings and transcriptions of the Bible in Wayuunaiki. Due to proprietary restrictions, the dataset cannot be shared publicly.
Wayuunaiki is the native language of the Wayuu people, spoken predominantly by communities in Colombia and Venezuela. It belongs to the Arawakan language family and remains one of the more widely spoken indigenous languages in the region.
This model is an initial step toward developing transcription models for indigenous languages. Such models have profound societal implications: they help preserve and promote indigenous languages, and they serve as valuable assets for linguistic research, supporting scholars and communities alike in documenting the rich cultural tapestry these languages carry.
Training Dataset Details
The dataset consists of 1,835 audio recordings, each accompanied by its respective transcription. The lexical corpus encompasses approximately 3,000 unique words.
- Total Audio Duration: 6241.65 seconds (approximately 1.7 hours)
- Average Audio Duration: 3.41 seconds
This collection of data serves as a foundational resource for understanding and processing the Wayuunaiki language.
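As a quick sanity check, the reported totals above are internally consistent. A minimal sketch using only the figures stated in this card:

```python
# Dataset statistics as reported above
num_recordings = 1835
total_seconds = 6241.65

avg_seconds = total_seconds / num_recordings  # average clip length
total_hours = total_seconds / 3600            # total duration in hours

print(f"average clip length: {avg_seconds:.2f} s")  # ~3.40 s (3.41 s above reflects rounding)
print(f"total duration: {total_hours:.2f} h")       # ~1.73 h, i.e. "approximately 1.7 hours"
```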
The test dataset may be used under the principles of 'fair use' copyright.
Model Accuracy Warning
While this model has shown promising results, it's essential to be aware of its limitations:
Based on the training and validation data, the model has a Word Error Rate (WER) of around 36%. This indicates that while it can capture the essence of most spoken content, errors can still occur.
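Word Error Rate is the word-level Levenshtein (edit) distance between the reference transcription and the model's hypothesis, normalized by the number of reference words. A minimal sketch of the metric (an illustration, not the exact evaluation code used for this model):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance normalized by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution out of three words -> WER of 1/3
print(word_error_rate("wayuu süpüshua sainküin", "wayuu süpüshua mmakat"))
```

At 36% WER, roughly one in three words is substituted, inserted, or deleted relative to the reference.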
The model particularly struggles with long vowels, leading to occasional transcription inaccuracies in such instances.
This iteration serves as a starting point and can be instrumental in refining future models. It captures the bulk of spoken words, but like any machine learning model, it is not infallible.
Recommendation: Any transcription produced by this model should undergo subsequent validation and correction to ensure accuracy. This model is an excellent tool for initial drafts but must be used judiciously.
Test it yourself
| Transcription | Audio Link |
|---|---|
| iseeichi chi wayuu aneekünakai nütüma Maleiwa süpüla nuꞌutünajachin aaꞌin süpüla nülaꞌajaainjatüin saainjala wayuu süpüshua sainküin mmakat | Listen here |
| maa akaapüꞌü tü anneerü oꞌutünapüꞌükat aaꞌin watüma wayakana judíokana shiiꞌiree sülaꞌajaanüin waainjala | Listen here |
The table provides sample transcriptions alongside their corresponding audio links. These examples let users listen to the audio and evaluate the model's transcription performance firsthand. Exploring these samples helps users understand the model's strengths and areas for refinement, especially concerning specific nuances of the Wayuunaiki language.
Model Description
This model is a speech recognition system trained to transcribe audio into text. It was trained for 4,000 steps, over which the training loss decreased substantially.
Training Statistics
- Initial Training Loss (Step 1000): 0.016
- Final Training Loss (Step 4000): 0.000200
- Average Training Loss: 0.161
Validation Statistics (at the end of training)
- Validation Loss: 0.567
- Word Error Rate (WER): 36.3%
Performance Metrics
- Training Runtime: 13,696.0441 seconds (approximately 3.8 hours)
- Samples Processed Per Second: 4.673
- Steps Processed Per Second: 0.292
The model showed promising potential, with a consistent reduction in training loss and a competitive WER on the validation set.
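The throughput figures above are mutually consistent, which is a useful sanity check. A quick sketch (the effective batch size is inferred from these numbers, not reported in the training logs):

```python
# Performance metrics as reported above
runtime_s = 13696.0441
samples_per_s = 4.673
steps_per_s = 0.292

total_steps = steps_per_s * runtime_s          # should recover the 4,000-step run
effective_batch = samples_per_s / steps_per_s  # samples per optimizer step (inferred)

print(round(total_steps))      # 3999, matching the reported 4,000 steps
print(round(effective_batch))  # 16
```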