openai/whisper-large-v3-turbo · How do I return timestamps on inference serverless api???

tshroveOntelio

Dec 10, 2024

How do I return timestamps on inference serverless api???

trystoh

Dec 14, 2024

***Beware, the timestamps reset after about 30 seconds, so they are somewhat useless as is.

"from huggingface_hub.inference._common import _b64_encode"

^^^ (this converts the audio into a base64 encoding first, which this method needs in order to work)

using requests: "import requests"
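
For anyone following along, here is a minimal sketch of what the base64 approach might look like end to end. It is a sketch under assumptions, not a confirmed recipe: the endpoint URL, the `build_payload` and `transcribe` helper names, and the timeout are illustrative, and it uses the standard-library `base64` module instead of the private `_b64_encode` helper. The `inputs` / `parameters.return_timestamps` body shape follows the automatic-speech-recognition task docs as I understand them.

```python
import base64

# Assumed endpoint for the serverless Inference API; adjust if yours differs.
API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3-turbo"


def build_payload(audio_bytes: bytes) -> dict:
    """Wrap raw audio bytes in a JSON body that can also carry parameters."""
    return {
        # base64 text, so the audio can travel inside a JSON document
        "inputs": base64.b64encode(audio_bytes).decode("ascii"),
        "parameters": {"return_timestamps": True},
    }


def transcribe(audio_path: str, token: str) -> dict:
    """Send one audio file and return the decoded JSON response."""
    import requests  # imported here so build_payload works without it

    with open(audio_path, "rb") as f:
        payload = build_payload(f.read())
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(API_URL, headers=headers, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()
```

If the request goes through, the response should hold the full `text` plus a list of `chunks`, each with its own `text` and a `timestamp` pair, but check the actual response rather than relying on this description.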

YetAnotherAIEnthusiast101

Dec 15, 2024

edited Dec 15, 2024

Unfortunately, I have not had much luck with this either. I wanted to enable verbose responses to get the detected language but so far, no luck.

First, I tried:

#################################################
files = {
    "file": open(audio_file_path, "rb"),
    "model": "openai/whisper-large-v3-turbo",
    "response_format": "verbose_json",
    "timestamp_granularities[]": "word",
}

response = requests.post(url, headers=headers, files=files)
#################################################

Results: Transcription was a success but I got only the transcribed text.

Next, I tried:

#################################################
options = {
    "parameters": {
        "return_timestamps": True,
        "response_format": "verbose_json",
    }
}

files = {
    "file": ("audio.mp3", output_audio_io, "audio/mpeg")
}
# Note: when files= is given, requests builds a multipart body and ignores json=
response = requests.post(api_url, headers=headers, files=files, json=options)
#################################################

Results: Transcription was a success but I got only the transcribed text.

*** Update: using the helpful comment from @trystoh, I was able to get timestamps, though I am still trying to enable verbose responses.

*** Update2: Oops, according to https://huggingface.co/docs/api-inference/tasks/automatic-speech-recognition, it looks like I cannot get verbose responses... If anyone can point me in the right direction I would greatly appreciate it.

trystoh

Dec 15, 2024

Understood, have you tried using a base64-encoded audio file?

The docs use tricky wording like, “If not using parameters you can also just use an audio file directly”, so you may have to convert the audio to base64 encoding first.

Let me know if I misunderstood, I spent all day on this 😂

tshroveOntelio

Dec 16, 2024

Yeah, when I use the base64-encoded audio file, it tells me the payload is too big.
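
One possible workaround for both the payload limit and the timestamps that reset after about 30 seconds is to cut the audio into short segments on the client, send each segment as its own request, and shift the returned timestamps by the segment's start time. The sketch below is only an illustration under assumptions: it handles uncompressed WAV input only (an MP3 would need decoding first), `split_wav` and `shift_chunks` are hypothetical helper names, and it assumes each response chunk looks like `{"text": ..., "timestamp": (start, end)}`.

```python
import io
import wave


def split_wav(wav_bytes: bytes, segment_seconds: float = 30.0):
    """Yield (start_offset_seconds, segment_wav_bytes) pairs."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        frames_per_segment = int(segment_seconds * rate)
        position = 0  # index of the first frame in the current segment
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            buffer = io.BytesIO()
            with wave.open(buffer, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected automatically on close.
                dst.setparams(params)
                dst.writeframes(frames)
            yield position / rate, buffer.getvalue()
            position += frames_per_segment


def shift_chunks(chunks, offset):
    """Return copies of chunks with timestamps moved by offset seconds."""
    shifted = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        shifted.append({
            "text": chunk["text"],
            # Defensive: keep a missing end time as None instead of failing.
            "timestamp": (start + offset, None if end is None else end + offset),
        })
    return shifted
```

Each segment can then be base64-encoded and posted separately, and the shifted chunks concatenated in order. Fixed-length cuts can split a word across two segments, so cutting at silences would likely give cleaner results.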