ASR Arabic transcription does not provide diacritics
Although the model description claims "punctuation and diacritical marks support", this is not the case in practice. I tried a sample Arabic audio file (see attached audio) with both of the command lines below, and the results were identical: neither included diacritics.
Command lines:
python3 NeMo-main/examples/asr/transcribe_speech.py pretrained_name="nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0" audio_dir="./audios"
python3 NeMo-main/examples/asr/transcribe_speech.py pretrained_name="nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0" audio_dir="./audios" decoder_type="ctc"
Result (the output was written with Unicode escapes, so I had to convert it to readable text, btw):
{"audio_filepath": "./audios_short/ar-AR_sample.wav", "pred_text": "\u0647\u0644 \u0628\u0625\u0645\u0643\u0627\u0646\u0643 \u0623\u0646 \u062a\u0639\u0637\u064a\u0646\u064a \u0627\u0644\u0645\u0632\u064a\u062f \u0645\u0646 \u0627\u0644\u0642\u0647\u0648\u0629 \u0645\u0646 \u0641\u0636\u0644\u0643\u061f"}
According to Perplexity, this Unicode decodes to: هل بإمكانك أن تعطيني المزيد من القهوة من فضلك؟
which does not include diacritics.
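For anyone who wants to check this without an external tool, here is a minimal sketch in Python. The \u sequences are just JSON's ASCII-safe escaping, so json.loads returns the real Arabic text. The helper name count_diacritics and the exact set of code points treated as diacritics (U+064B through U+0652, plus U+0670) are my own choices for illustration, not anything taken from NeMo.

```python
import json

# Arabic tashkeel code points: fathatan (U+064B) through sukun (U+0652),
# plus the superscript alef (U+0670). This set is an assumption made for
# this check, not a definition used by the model or by NeMo.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def count_diacritics(text: str) -> int:
    """Return how many characters in text are Arabic diacritical marks."""
    return sum(ch in DIACRITICS for ch in text)


# The manifest line from the post above, kept as a raw string so the
# \u escapes are decoded by the JSON parser rather than by Python.
line = r'{"audio_filepath": "./audios_short/ar-AR_sample.wav", "pred_text": "\u0647\u0644 \u0628\u0625\u0645\u0643\u0627\u0646\u0643 \u0623\u0646 \u062a\u0639\u0637\u064a\u0646\u064a \u0627\u0644\u0645\u0632\u064a\u062f \u0645\u0646 \u0627\u0644\u0642\u0647\u0648\u0629 \u0645\u0646 \u0641\u0636\u0644\u0643\u061f"}'

record = json.loads(line)
print(record["pred_text"])                    # readable Arabic text
print(count_diacritics(record["pred_text"]))  # number of diacritics found
```

On the line above this reports zero diacritics, which matches what I see by eye.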
P.S. I had to comment out the line timestamps=cfg.timestamps in order for transcribe_speech.py to work.
Input audio:
Can you please provide an example where you did get a transcription with diacritics?
Hey @johnkk! Just wanted to confirm that I'm experiencing the same issue as you. I've been comparing the performance of nvidia/stt_ar_fastconformer_hybrid_large_pc_v1.0 and nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0, and somehow both models got 100% identical WER and CER scores on the Fleurs test split. I also saved the raw generated transcripts before cleaning them for the evaluation, and they look identical too (and certainly contain no diacritics):
# PC model outputs
بصفة عامة، يمكن أن يظهر سلوكان عندما يبدأ المديرون في قيادة أقرانهم السابقين أحدهما هو محاولة البقاء مثل الآخرين أو الأخريات.
يتصاعد من المصنع دخانا أبيض وهذا ما أظهرته تقارير تلفزيونية.
# PCD model outputs
بصفة عامة، يمكن أن يظهر سلوكان عندما يبدأ المديرون في قيادة أقرانهم السابقين أحدهما هو محاولة البقاء مثل الآخرين أو الأخريات.
يتصاعد من المصنع دخانا أبيض وهذا ما أظهرته تقارير تلفزيونية.
Seeing that both released model weights have the same size of 424MB, I think that maybe the PC checkpoint has been uploaded twice by mistake instead of the PCD checkpoint.
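One way to test that guess is to hash the two downloaded checkpoint files and compare the digests. This is only a sketch: the function names and the file paths in the usage comment are hypothetical. Matching digests would mean the two files are byte-for-byte identical. Different digests would not prove the weights differ, because the packaged files could carry different metadata around the same weights.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checkpoint(path_a, path_b):
    """True only if both files have exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (hypothetical local paths to the two downloaded checkpoints):
# print(same_checkpoint("pc_model.nemo", "pcd_model.nemo"))
```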