How to handle input audio files containing only white noise or general noise and no speech
It seems this model performs extremely well when there's actual discernible language/conversation, but if I test with an audio clip that contains only non-discernible noise, it produces a bunch of gibberish. Is there any way to prevent it from generating gibberish?
Here's an example of the gibberish produced from white-noise audio:
2023-12-14 01:35:51,945 [INFO] Takk for watching! 1 tbsps of butter 1 tbsps of flour 1 tbsps of baking powder 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1.5 kg of pork belly [0:00:03.359228s]
Use voice activity detection (VAD) and cut out the no-speech chunks before passing the audio to the model, for example with:
https://huggingface.co/pyannote/voice-activity-detection
https://github.com/snakers4/silero-vad
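Here is a minimal Python sketch of that approach. It assumes silero-vad is loaded through torch.hub as its README shows, and that the audio is 16 kHz mono. The helper names (keep_speech, transcribe_speech_only, silero_speech_timestamps) are hypothetical. The transcribe argument stands in for whichever ASR model you are using, since the question doesn't name it. The key idea is that when VAD finds no speech, the model is never called, so it has nothing to hallucinate on.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

SAMPLING_RATE = 16000  # silero-vad accepts 8 kHz or 16 kHz; 16 kHz is assumed here


def keep_speech(wav: np.ndarray, timestamps: List[Dict[str, int]]) -> np.ndarray:
    """Concatenate only the regions VAD marked as speech.

    timestamps is a list of {"start": ..., "end": ...} dicts in sample
    indices, the shape silero-vad's get_speech_timestamps returns by default.
    """
    if not timestamps:
        return np.zeros(0, dtype=wav.dtype)
    return np.concatenate([wav[ts["start"]:ts["end"]] for ts in timestamps])


def transcribe_speech_only(
    wav: np.ndarray,
    timestamps: List[Dict[str, int]],
    transcribe: Callable[[np.ndarray], str],
) -> str:
    """Run the ASR model only on speech; return "" when VAD found none."""
    speech = keep_speech(wav, timestamps)
    if speech.size == 0:
        # No speech detected (e.g. pure white noise): skip the model entirely
        # so it has nothing to hallucinate on.
        return ""
    return transcribe(speech)


def silero_speech_timestamps(path: str) -> Tuple[np.ndarray, List[Dict[str, int]]]:
    """Load audio and detect speech with silero-vad via torch.hub."""
    import torch  # imported lazily so the helpers above work without torch

    model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
    get_speech_timestamps, _, read_audio, _, _ = utils
    wav = read_audio(path, sampling_rate=SAMPLING_RATE)
    timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
    return wav.numpy(), timestamps
```

Example usage, assuming the openai-whisper package (swap in your own model): model = whisper.load_model("base"); wav, ts = silero_speech_timestamps("noise.wav"); text = transcribe_speech_only(wav, ts, lambda a: model.transcribe(a)["text"]). For a noise-only clip, ts will typically be empty and text will be "". Concatenating the chunks discards the original timing. If you need timestamps that match the source file, transcribe each chunk separately instead.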
Thanks, this worked perfectly for me!