Model Card for Wav2Vec2 Large with Common Phone

This is a multilingual phone recognition model fine-tuned on the Common Phone dataset. It was created as part of the PhD thesis "Phonetic Transfer Learning from Healthy References for the Analysis of Pathological Speech" by Philipp Klumpp, with the aim of analyzing pathological speech signals.

The source code for using this model is available on GitHub.

To cite this work, please use the following BibTeX snippet:

@phdthesis{klumpp2024phdthesis,
  author  = "Philipp Klumpp",
  title   = "Phonetic Transfer Learning from Healthy References for the Analysis of Pathological Speech",
  school  = "Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg",
  address = "Erlangen, Germany",
  year    = 2024,
  month   = may
}

Model Details

A Wav2Vec2 model with a linear projection onto the CTC blank token plus 101 phone symbols from the International Phonetic Alphabet (IPA), i.e. 102 output classes in total. The model takes 16 kHz audio as input and predicts the most probable sequence of uttered IPA phones.
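To illustrate how per-frame CTC outputs turn into a phone sequence, here is a minimal sketch of greedy CTC decoding in plain Python. It is not taken from the model's source code: the decoding strategy, the blank index of 0, and the five-symbol toy vocabulary are all assumptions made for illustration. The real model has 102 output classes, and its actual symbol ordering should be taken from the repository.

```python
from typing import List, Sequence

BLANK_ID = 0  # assumption: the CTC blank token sits at index 0


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = BLANK_ID,
) -> List[str]:
    """Pick the best class per frame, merge consecutive repeats, drop blanks."""
    decoded: List[str] = []
    previous = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank token.
        if best != previous and best != blank_id:
            decoded.append(vocab[best])
        previous = best
    return decoded


# Hypothetical toy vocabulary standing in for the blank + 101 IPA phones.
toy_vocab = ["<blank>", "h", "ə", "l", "oʊ"]

# One row of scores per audio frame (made-up numbers).
toy_scores = [
    [0.1, 0.8, 0.0, 0.1, 0.0],  # h
    [0.1, 0.7, 0.1, 0.1, 0.0],  # h again: merged with the previous frame
    [0.9, 0.0, 0.1, 0.0, 0.0],  # blank
    [0.0, 0.1, 0.8, 0.1, 0.0],  # ə
    [0.1, 0.0, 0.1, 0.8, 0.0],  # l
    [0.8, 0.0, 0.0, 0.2, 0.0],  # blank
    [0.1, 0.0, 0.0, 0.8, 0.1],  # l again: kept, because a blank separates it
    [0.0, 0.0, 0.1, 0.1, 0.8],  # oʊ
]

decoded_phones = ctc_greedy_decode(toy_scores, toy_vocab)
print(decoded_phones)
```

Greedy decoding returns the symbols along the single most probable frame-level path, which is a common approximation of the most probable phone sequence. A beam search over CTC paths would be the alternative if that approximation is not good enough.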

Model Description

This model was created to analyze pathological speech signals. It was optimized with Common Phone, a multilingual corpus for robust acoustic modelling. The corpus comprises more than 11,000 speakers who were carefully selected from Mozilla's Common Voice dataset. Phone error rates (PER, in percent) on the test set for each language:

Language	Test PER
English	11.0
French	9.9
German	9.8
Italian	9.1
Russian	6.6
Spanish	8.8
Average	9.2
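For readers unfamiliar with the metric, the sketch below computes a phone error rate under the usual definition: the Levenshtein distance between predicted and reference phone sequences, summed over all utterances and divided by the total number of reference phones. This card does not state how the reported PER was computed, so the definition and the toy sequences are assumptions. The last lines show that the "Average" row is consistent with an unweighted mean of the six per-language values.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, deletions and insertions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """PER in percent over a set of utterances."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


# Made-up reference and predicted phone sequences for two utterances.
references = [["h", "ə", "l", "oʊ"], ["k", "æ", "t"]]
hypotheses = [["h", "e", "l", "oʊ"], ["k", "æ"]]  # one substitution, one deletion

per = phone_error_rate(references, hypotheses)
print(f"PER: {per:.1f}%")

# Per-language test PER values copied from the table above.
table_per = {
    "English": 11.0, "French": 9.9, "German": 9.8,
    "Italian": 9.1, "Russian": 6.6, "Spanish": 8.8,
}
mean_per = sum(table_per.values()) / len(table_per)
print(f"Unweighted mean: {mean_per:.1f}")
```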

Developed by: Philipp Klumpp
Model type: Wav2Vec2
Languages: Multilingual (English, French, German, Italian, Russian, Spanish)
License: Creative Commons Zero 1.0 (CC0)
Finetuned from model: Wav2Vec2 XLSR-53
Finetuning dataset: Common Phone, as published in "Common Phone: A Multilingual Dataset for Robust Acoustic Modelling"

Model Sources

Repository: GitHub
Paper: The final print of the thesis will be linked here.

Contact

Philipp Klumpp