Hibiki ASR Phonemizer

This model is a phoneme-level speech recognition network, originally fine-tuned from openai/whisper-large-v3 on a mixture of different Japanese datasets.

Besides transcribing speech into phonemes, it can:

  • detect and transcribe non-speech sounds such as gasps, erotic moans, laughter, etc.
  • add punctuation more faithfully.

Don't use this model without the post-processing functions I wrote, or you'll get less than ideal performance. Check the notebook for them.

To reverse the process and recover the graphemes, use this model.


How to use

Check here -> Notebook
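
As a rough orientation before opening the notebook, here is a minimal sketch of loading the checkpoint. It rests on assumptions that are not confirmed by this card: that the checkpoint loads through the generic Transformers speech recognition pipeline like other Whisper fine-tunes, and that the repo name is the one shown on the model page. The helper names `build_transcriber` and `transcribe` are hypothetical. The raw output still needs the post-processing functions from the notebook, which remains the authoritative reference.

```python
# Assumption: repo name as shown on the model page.
MODEL_ID = "Respair/Hibiki_ASR_Phonemizer_v0.2"


def build_transcriber(model_id: str = MODEL_ID, device: str = "cpu"):
    """Build an ASR pipeline for the checkpoint.

    Assumption: the checkpoint is compatible with the generic
    "automatic-speech-recognition" pipeline, as Whisper fine-tunes usually are.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id, device=device)


def transcribe(path: str, transcriber=None) -> str:
    """Return the raw phoneme string for one audio file.

    The result is NOT post-processed; apply the notebook's functions afterwards.
    """
    if transcriber is None:
        transcriber = build_transcriber()
    result = transcriber(path)  # the pipeline returns a dict with a "text" key
    return result["text"]
```

Calling `transcribe("sample.wav")` downloads the weights on first use and needs ffmpeg available to decode the file; `sample.wav` is a placeholder path.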

Intended uses & limitations

No restrictions are imposed by me, but proceed at your own risk. The user (you) is entirely responsible for their own actions.

Training and evaluation data

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 5000
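
To make the schedule concrete, the sketch below computes the learning rate at a given step from the values listed above. It assumes that "linear" refers to the usual Transformers behaviour (linear warmup from 0 to the peak rate, then linear decay to 0 at the final step); the function name `linear_lr` is hypothetical and is not taken from the training code.

```python
def linear_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=5000):
    """Learning rate at `step` under linear warmup followed by linear decay.

    Assumption: same shape as the linear-with-warmup schedule in Transformers.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 250, 500, 2750, 5000):
        print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 1e-05 at step 500, is back to half of that at step 2750, and reaches 0 at step 5000.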

Compute and Duration

  • 1x A100 (40 GB)
  • 64 GB RAM
  • BF16
  • 14 hrs

Framework versions

  • Transformers 4.41.1
  • PyTorch 2.4.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1