nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0 · ASR Arabic transcription does not provide diacritics


by johnkk - opened 4 days ago

4 days ago

•

Although the model description claims "punctuation and diacritical marks support", this is not the case in practice. I tried a sample Arabic audio file (see attached audio) with both command lines provided, and the results were identical (i.e., no diacritics).

Command lines:

```bash
python3 NeMo-main/examples/asr/transcribe_speech.py pretrained_name="nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0" audio_dir="./audios"
python3 NeMo-main/examples/asr/transcribe_speech.py pretrained_name="nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0" audio_dir="./audios" decoder_type="ctc"
```

Result (the output contained Unicode escape sequences, which I had to convert to text):

```json
{"audio_filepath": "./audios_short/ar-AR_sample.wav", "pred_text": "\u0647\u0644 \u0628\u0625\u0645\u0643\u0627\u0646\u0643 \u0623\u0646 \u062a\u0639\u0637\u064a\u0646\u064a \u0627\u0644\u0645\u0632\u064a\u062f \u0645\u0646 \u0627\u0644\u0642\u0647\u0648\u0629 \u0645\u0646 \u0641\u0636\u0644\u0643\u061f"}
```

According to Perplexity, the escaped text decodes to هل بإمكانك أن تعطيني المزيد من القهوة من فضلك؟ ("Can you give me more coffee, please?"), which contains no diacritics.
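The `\uXXXX` sequences are ordinary JSON string escapes (Python's `json.dumps` emits them by default because `ensure_ascii=True`), so the conversion and the diacritic check can both be done locally. A minimal sketch, assuming one JSON record per line with a `pred_text` field as in the result above; `decode_pred_text`, `count_diacritics` and `ARABIC_DIACRITICS` are hypothetical names, and the diacritic set deliberately covers only the common harakat (U+064B to U+0652) plus superscript alef (U+0670):

```python
import json

# Common Arabic harakat: fathatan (U+064B) through sukun (U+0652),
# plus superscript alef (U+0670). Hamza and maddah marks are not counted.
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}

# The manifest line from the result above, kept with its JSON escapes (raw string).
SAMPLE_LINE = r'{"audio_filepath": "./audios_short/ar-AR_sample.wav", "pred_text": "\u0647\u0644 \u0628\u0625\u0645\u0643\u0627\u0646\u0643 \u0623\u0646 \u062a\u0639\u0637\u064a\u0646\u064a \u0627\u0644\u0645\u0632\u064a\u062f \u0645\u0646 \u0627\u0644\u0642\u0647\u0648\u0629 \u0645\u0646 \u0641\u0636\u0644\u0643\u061f"}'


def decode_pred_text(line: str) -> str:
    """Parse one JSON-lines record; json.loads decodes the escape sequences."""
    return json.loads(line)["pred_text"]


def count_diacritics(text: str) -> int:
    """Count code points in text that belong to ARABIC_DIACRITICS."""
    return sum(ch in ARABIC_DIACRITICS for ch in text)


if __name__ == "__main__":
    text = decode_pred_text(SAMPLE_LINE)
    print(text)
    print("diacritics found:", count_diacritics(text))
```

A count of zero on the decoded sample agrees with the observation that no diacritics were produced.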

P.S. I had to comment out the line `timestamps=cfg.timestamps` to get transcribe_speech.py to work.

Input audio:

Could you please provide an example where you did get a transcription with diacritics?

oovword

3 days ago

•

edited 3 days ago

> Although the model description claims to have "punctuation and diacritical marks support" this is not the case in practice. I've tried a sample audio in Arabic (see attached audio) using both command lines provided and the result was identical (i.e. not including diacritics).

Hey @johnkk! Just wanted to confirm that I'm experiencing the same issue. I've been comparing the performance of `nvidia/stt_ar_fastconformer_hybrid_large_pc_v1.0` and `nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0`, and somehow both models got exactly the same WER and CER scores on the FLEURS test split. I also saved the raw transcripts before cleaning them for evaluation, and they are identical too (and certainly contain no diacritics):

```
# PC model outputs
بصفة عامة، يمكن أن يظهر سلوكان عندما يبدأ المديرون في قيادة أقرانهم السابقين أحدهما هو محاولة البقاء مثل الآخرين أو الأخريات.
يتصاعد من المصنع دخانا أبيض وهذا ما أظهرته تقارير تلفزيونية.

# PCD model outputs
بصفة عامة، يمكن أن يظهر سلوكان عندما يبدأ المديرون في قيادة أقرانهم السابقين أحدهما هو محاولة البقاء مثل الآخرين أو الأخريات.
يتصاعد من المصنع دخانا أبيض وهذا ما أظهرته تقارير تلفزيونية.
```
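To separate "the same checkpoint was used twice" from "PCD only adds diacritics on top of the same base text", the two sets of outputs can be compared both as-is and with combining marks removed. A rough sketch, assuming aligned lists of transcripts from the two models; `strip_diacritics` and `compare_outputs` are hypothetical helpers, and dropping every Unicode `Mn` (nonspacing mark) character is broader than Arabic harakat alone:

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """NFC-normalize first, so hamza marks written as alef + combining hamza
    compose into precomposed letters and survive; then drop every nonspacing
    mark (category Mn), which includes the Arabic harakat."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def compare_outputs(pc: list[str], pcd: list[str]) -> str:
    """Classify two aligned lists of transcripts from the PC and PCD models."""
    if len(pc) != len(pcd):
        raise ValueError("transcript lists must be aligned one-to-one")
    if pc == pcd:
        # Exactly the same strings: consistent with (not proof of) one checkpoint used twice.
        return "identical"
    if [strip_diacritics(t) for t in pc] == [strip_diacritics(t) for t in pcd]:
        # Same base text; the only differences are diacritics, as expected from a PCD model.
        return "diacritics_only"
    return "different"


if __name__ == "__main__":
    pc_outputs = ["\u0645\u0631\u062d\u0628\u0627"]  # unvocalized word
    pcd_outputs = ["\u0645\u064e\u0631\u0652\u062d\u064e\u0628\u064b\u0627"]  # same word with harakat
    print(compare_outputs(pc_outputs, pc_outputs))   # identical
    print(compare_outputs(pc_outputs, pcd_outputs))  # diacritics_only
```

A working PCD checkpoint would be expected to land in "diacritics_only" (or "different", if it also changes words), whereas "identical" across a whole test split points at the same weights.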

Since both released checkpoints have the same size (424 MB), I suspect the PC checkpoint was uploaded twice by mistake instead of the PCD checkpoint.
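Equal file sizes are suggestive but not conclusive, since two checkpoints of the same architecture can end up with very similar sizes; comparing content hashes settles whether two downloads are byte-for-byte the same file. A sketch using Python's standard `hashlib`, assuming both checkpoint files have already been downloaded locally; `sha256_of` and `same_checkpoint` are hypothetical helper names:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large checkpoints are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checkpoint(a: Path, b: Path) -> bool:
    """True only if both files contain identical bytes."""
    # Different sizes rule out identical files without hashing anything.
    if a.stat().st_size != b.stat().st_size:
        return False
    # Equal sizes are not enough on their own: compare content hashes.
    return sha256_of(a) == sha256_of(b)
```

For example, `same_checkpoint(Path("pc.nemo"), Path("pcd.nemo"))`, where both filenames are placeholders for wherever the two downloads were saved.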

lilgrigs

NVIDIA org 2 days ago

Hi @johnkk and @oovword,

Thank you for your comments! You’re absolutely right—due to a mistake, the PC checkpoint was released instead of the PCD checkpoint. Apologies for the oversight. The correct checkpoint has now been published.

oovword

about 24 hours ago

@lilgrigs Happy to help!

lilgrigs changed discussion status to closed about 11 hours ago
