The luxembourgish part of my multilingual automatic speech recognition (ASR) model is the second Machine Learning (ML) STT model for Luxembourgish. The very first model has been published in May 2022 by Pr Peter Gilles of the University of Luxembourg.

My model has been trained from scratch with my customized dataset mbarnig/lb-2880-STT_CORPUS and the deep-learning-toolkit ๐Ÿธ Coqui-STT (version 1.3.0). The model was trained without punctuations with the following alphabet:

# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric index.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' in the Alphabet.
 
'abcdefghijklmnopqrstuvwxyz ร รกรขรครงรจรฉรซรฎรดรถรปรผ

# The last (non-comment) line needs to end with a newline.

A live inference-demo of the ASR system is available in my HuggingFace space โŒจ๏ธ ๐Ÿ‡ฑ๐Ÿ‡บ ๐Ÿ”ˆ mbarnig/lb-de-fr-en-pt-COQUI-STT.

Click the tab training metrics above to view the live Tensorboard of the model training with the small (2880 samples), with the expanded (27072 samples) dataset, each with and without data augmentation.

tensorboard

The speech recognition models for the other languages have been released by Coqui.ai in the model zoo. I use the following versions in my ASR system:

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference API
Unable to determine this model's library. Check the docs .

Space using mbarnig/lb-de-fr-en-pt-coqui-stt-models 1