Whisper Large V3 Japanese Phone Accent
This is a Whisper model that transcribes Japanese speech into Katakana with pitch accent annotations. It is built on whisper-large-v3-turbo and was fine-tuned on a 1/20 subset of the Galgame-Speech dataset and on the JSUT-5000 dataset.
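To make "Katakana with pitch accent annotations" concrete, here is a minimal Python sketch of one common convention: marking the accent nucleus, the mora after which pitch falls. The `]` marker and the helper names `split_morae` and `annotate_accent` are hypothetical; this card does not document the exact symbols the model emits, and this is not the pyopenjtalk-based labeling used to build the training data. The example accents (Kyoto = 1, Tokyo = 0) follow standard Tokyo Japanese.

```python
# Hypothetical sketch: mark a pitch-accent nucleus in a Katakana reading.
# ASSUMPTION: "]" means "pitch drops after this mora". The model's real
# annotation symbols are not specified in this card.

# Small kana that merge with the preceding kana into a single mora.
SMALL_KANA = set("ァィゥェォャュョヮ")


def split_morae(katakana: str) -> list[str]:
    """Split Katakana into morae (ッ, ン and ー each count as one mora)."""
    morae: list[str] = []
    for ch in katakana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # e.g. キ + ョ -> キョ
        else:
            morae.append(ch)
    return morae


def annotate_accent(katakana: str, nucleus: int, marker: str = "]") -> str:
    """Insert `marker` after mora number `nucleus` (1-based).

    nucleus == 0 means a flat (heiban) word with no pitch drop.
    """
    morae = split_morae(katakana)
    if not 0 <= nucleus <= len(morae):
        raise ValueError(f"nucleus {nucleus} out of range for {len(morae)} morae")
    if nucleus == 0:
        return "".join(morae)
    return "".join(morae[:nucleus]) + marker + "".join(morae[nucleus:])


if __name__ == "__main__":
    print(annotate_accent("キョート", 1))    # キョ]ート
    print(annotate_accent("トーキョー", 0))  # トーキョー
```

Treating small kana as part of the previous mora matters here: a naive per-character index would put the marker inside キョ instead of after it.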
Training Data:
- Stage 1: Audio from the Galgame-Speech dataset was used. The text was converted into Katakana sequences with pitch accent annotations using pyopenjtalk.
- Stage 2: JSUT-5000 dataset, using its original training set with pitch accent annotations. The data was split into 90% for training and 10% for evaluation.
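The data preparation described above (a 1/20 subset of Galgame-Speech, and a 90%/10% train/evaluation split of JSUT-5000) can be sketched as follows. This is a minimal illustration under assumptions: the card does not say whether the subset was random or how the split was seeded, and `subsample`, `train_eval_split`, and the `utt_XXXX` IDs are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def subsample(items: Sequence[T], denominator: int = 20, seed: int = 0) -> list[T]:
    """Randomly keep len(items) // denominator items (hypothetical helper)."""
    rng = random.Random(seed)
    k = len(items) // denominator
    return rng.sample(list(items), k)


def train_eval_split(
    items: Sequence[T], eval_fraction: float = 0.1, seed: int = 0
) -> tuple[list[T], list[T]]:
    """Shuffle with a fixed seed, then hold out `eval_fraction` for evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


if __name__ == "__main__":
    # Placeholder utterance IDs standing in for the 5,000 JSUT-5000 clips.
    utterances = [f"utt_{i:04d}" for i in range(1, 5001)]
    train, evaluation = train_eval_split(utterances)
    print(len(train), len(evaluation))  # 4500 500
```

A fixed seed keeps the evaluation split identical across runs, so CER numbers from different checkpoints stay comparable.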
Evaluation Results:
- The model achieved a CER (Character Error Rate) of approximately 4% on the JSUT-5000 test set, which is an improvement over the 7% CER of pyopenjtalk.
- Training with Stage 1 alone resulted in a CER of 13%; errors included misreadings of specific words and confusion between on'yomi (音読) and kun'yomi (訓読) readings. Stage 2 fine-tuning reduced these errors.
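For reference, CER is the character-level edit distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the number of reference characters. Below is a minimal Python sketch of a corpus-level CER. The card does not state whether accent markers count as characters or whether CER was averaged per utterance or over the corpus, so both choices here are assumptions, and the `]` marker in the example is hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (one row of the DP table at a time)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / total


if __name__ == "__main__":
    # A missing accent marker counts as one deletion out of 5 reference characters.
    print(cer(["キョ]ート"], ["キョート"]))  # 0.2
```

Counting markers as characters means a correct reading with a misplaced accent still raises CER, which is one plausible way to score accent errors.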
We are currently seeking Japanese pitch accent annotated datasets. If you have such data, please reach out!
Model tree for AkitoP/whisper-large-v3-japense-phone_accent:
- Base model: openai/whisper-large-v3
- Finetuned: openai/whisper-large-v3-turbo