Automatic Speech Recognition
ESPnet
English
audio
audio_captioning