Is there an empirical explanation for the speed parameter of the generation function?
Hi! As I said in a previous discussion I posted, thanks for all the amazing work. Kokoro is great, and I think it is bringing great things to the world of TTS synthesizers.
What I mean by the question in the title is: does the speed adjustment controlled by the "speed" parameter have a reliable, mathematical explanation, or is it more or less random? By "mathematical explanation" I mean whether it behaves like, e.g., time-stretching algorithms, where the speed parameter normally refers to the ratio of time lengths and a given value always produces the same audio length.
In this case, the audio is produced by generative AI and the speed parameter is of course consumed by the generative process itself, but can this parameter still be explained in some way? For example, if I generate two speech audios from the same text, one with speed=1 and another with speed=0.8, is the speed of the second audio 0.8 times that of the first? Or is the length of the first audio 0.8 times the length of the second?
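To make the question concrete, here is a minimal sketch of the check I have in mind. It does not call Kokoro at all: it only assumes I already have the two generated waveforms as sequences of samples at the same sample rate. The function names and the 5% tolerance are my own choices for illustration, not anything taken from the library.

```python
from typing import Sequence


def length_ratio(samples_a: Sequence[float], samples_b: Sequence[float]) -> float:
    """Ratio of the length of audio A to the length of audio B.

    Both audios are assumed to share the same sample rate, so the ratio of
    sample counts equals the ratio of durations.
    """
    return len(samples_a) / len(samples_b)


def behaves_like_time_stretch(
    samples_base: Sequence[float],    # audio generated with speed=1
    samples_scaled: Sequence[float],  # audio generated with the given speed
    speed: float,
    tolerance: float = 0.05,          # hypothetical 5% relative tolerance
) -> bool:
    """Check whether len(scaled) is close to len(base) / speed.

    This is the relation a classic time-stretching algorithm would satisfy:
    speed=0.8 should give an audio 1 / 0.8 = 1.25 times longer.
    """
    expected = len(samples_base) / speed
    return abs(len(samples_scaled) - expected) <= tolerance * expected
```

If the parameter worked like a time-stretch ratio, `behaves_like_time_stretch(audio_speed_1, audio_speed_08, 0.8)` would be true for any input text, and `length_ratio(audio_speed_1, audio_speed_08)` would stay close to 0.8.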
Thanks in advance!