Massively Multilingual Speech (MMS) - Finetuned ASR - ALL

This is a checkpoint from the MMS Zero-shot project, a model that transcribes speech in almost any language using only a small amount of unlabeled text in the new language. The approach is based on a multilingual acoustic model trained on data in 1,150 languages (leveraging the MMS data), which outputs transcriptions in an intermediate representation (uroman tokens). A small amount of text in the new, unseen language is then also mapped to this intermediate representation, and at inference time this mapping, together with an optional language model, enables transcription of the new language.
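The role of the intermediate representation can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `toy_romanize` is a hypothetical stand-in for uroman that covers only a handful of Cyrillic letters, and the exact dictionary lookup in `map_back` is a simplification of the real decoding step, which is not described on this page.

```python
from collections import defaultdict

# Hypothetical, hand-written table covering only the letters used below.
# The real project relies on uroman, which handles many scripts.
TOY_TABLE = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t", "м": "m"}


def toy_romanize(word: str) -> str:
    """Map a word to a Latin-script form, character by character."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in word.lower())


def build_lexicon(words):
    """Map each romanized form to the original-script words that produce it."""
    lexicon = defaultdict(list)
    for word in words:
        key = toy_romanize(word)
        if key and word not in lexicon[key]:
            lexicon[key].append(word)
    return dict(lexicon)


def map_back(romanized_tokens, lexicon):
    """Replace each romanized token with its first lexicon entry, if any."""
    return [lexicon.get(token, [token])[0] for token in romanized_tokens]


# A small amount of unlabeled text in the "new" language.
text_in_new_language = ["привет", "мир"]
lexicon = build_lexicon(text_in_new_language)

# Pretend the acoustic model emitted these romanized tokens for an utterance.
acoustic_output = "privet mir".split()
print(" ".join(map_back(acoustic_output, lexicon)))
```

The acoustic model never needs to have seen the target script: only the text-side mapping is language-specific, which is why a small amount of unlabeled text is enough to set it up.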

Example

Please have a look at the official space for an example of how to use the model.
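As a rough starting point, the checkpoint can probably be loaded with the Transformers library like other wav2vec2 CTC models. The sketch below is not taken from the official space. It assumes that the repository `mms-meta/mms-zeroshot-300m` ships processor files loadable through `AutoProcessor`, and that input audio is mono at 16 kHz. The function name `transcribe_to_uroman` is hypothetical. It performs only greedy CTC decoding, so the result would be romanized tokens rather than text in the target script.

```python
MODEL_ID = "mms-meta/mms-zeroshot-300m"
SAMPLING_RATE = 16_000  # assumption: 16 kHz mono input


def transcribe_to_uroman(waveform):
    """Greedy CTC decoding of a 1-D float waveform sampled at SAMPLING_RATE."""
    # Imported lazily so that defining the function does not require
    # the heavy dependencies to be installed.
    import torch
    from transformers import AutoProcessor, Wav2Vec2ForCTC

    # Assumption: both classes can load directly from the model repository.
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    inputs = processor(waveform, sampling_rate=SAMPLING_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # Pick the most likely token per frame; the processor's decode step
    # collapses repeated tokens and removes CTC blanks.
    predicted_ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(predicted_ids)
```

Turning the romanized output into text in the new language still requires the text-side mapping and, optionally, a language model, for which the official space remains the reference.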

Model details

  • Developed by: Jinming Zhao et al.

  • Model type: Scaling A Simple Approach to Zero-Shot Speech Recognition

  • License: CC-BY-NC 4.0

  • Num parameters: 300 million

  • Cite as:

    @article{zhao2024scaling,
      title={Scaling A Simple Approach to Zero-Shot Speech Recognition},
      author={Zhao, Jinming and Pratap, Vineel and Auli, Michael},
      journal={arXiv preprint arXiv:2407.17852},
      year={2024}
    }
    

Additional Links

Safetensors model size: 315M params, tensor type: F32