Automatic Speech Recognition
ESPnet
multilingual
audio
speech-translation
language-identification
pyf98 committed · verified
Commit 920e145 · 1 Parent(s): 3719821

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -14,7 +14,8 @@ license: cc-by-4.0
 ---
 
 [OWSM-CTC](https://aclanthology.org/2024.acl-long.549/) (Peng et al., ACL 2024) is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC.
-It is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, which follows the design of the project, [Open Whisper-style Speech Model (OWSM)](https://www.wavlab.org/activities/2024/owsm/).
+
+This model is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, which follows the design of the project, [Open Whisper-style Speech Model (OWSM)](https://www.wavlab.org/activities/2024/owsm/).
 
 This model is initialized with [OWSM-CTC v3.1](https://huggingface.co/pyf98/owsm_ctc_v3.1_1B) and then fine-tuned on [v3.2 data](https://arxiv.org/abs/2406.09282) for 225k steps.
 
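
For context (not part of this commit), a minimal inference sketch for an OWSM-CTC checkpoint, assuming ESPnet's `Speech2TextGreedySearch` interface from `espnet2.bin.s2t_inference_ctc`; the model tag reuses the `pyf98/owsm_ctc_v3.1_1B` repo referenced above as a stand-in for the fine-tuned model, and the language/task symbols, audio path, and padding length are illustrative assumptions rather than values taken from this README.

```python
# Minimal sketch (assumptions): ESPnet's Speech2TextGreedySearch interface for
# OWSM-CTC, the "pyf98/owsm_ctc_v3.1_1B" tag standing in for the fine-tuned
# checkpoint, and illustrative language/task symbols.
import librosa
import soundfile as sf
from espnet2.bin.s2t_inference_ctc import Speech2TextGreedySearch

s2t = Speech2TextGreedySearch.from_pretrained(
    "pyf98/owsm_ctc_v3.1_1B",  # substitute the fine-tuned model's tag
    device="cpu",
    lang_sym="<eng>",  # language token (assumption; choose the target language)
    task_sym="<asr>",  # task token (assumption; ASR here, ST/LID use other symbols)
)

# OWSM models operate on 16 kHz audio in fixed 30-second windows,
# so pad or truncate the waveform before decoding.
speech, rate = sf.read("example.wav")  # hypothetical input file
speech = librosa.util.fix_length(speech, size=16000 * 30)

result = s2t(speech)[0]  # greedy CTC decoding; first entry of the n-best list
print(result)
```

Because the model is encoder-only and CTC-based, decoding is a single non-autoregressive greedy pass, which is what allows the same checkpoint to handle ASR, speech translation, and language identification by switching the task prompt.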