This model whas trained with two A100 40 GB, 128 GB RAM and 2 x Xeon 48 Core 2.4 GHz
- Time spent ~ 7 hours
- Count of train dataset - 118k of audio samples from Mozilla Common Voice 17
Example of usage
from transformers import pipeline
import gradio as gr
import time
pipe = pipeline(
model="dvislobokov/whisper-large-v3-turbo-russian",
tokenizer="dvislobokov/whisper-large-v3-turbo-russian",
task='automatic-speech-recognition',
device='cpu'
)
def transcribe(audio):
start = time.time()
text = pipe(audio, return_timestamps=True)['text']
print(time.time() - start)
return text
iface = gr.Interface(
fn=transcribe,
inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'),
outputs='text'
)
iface.launch(share=True)
- Downloads last month
- 82
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for dvislobokov/whisper-large-v3-turbo-russian
Base model
openai/whisper-large-v3
Finetuned
openai/whisper-large-v3-turbo