
Adding defined period pauses to the input text file

#61
by vijay120 - opened

Is there a way to add pauses to the text file so that the TTS pauses for "x" seconds before continuing with the next sentence? Currently I need to create separate text files and manually merge the TTS output with an "x"-second pause in between.

I second this. It would be awesome:

  1. to be able to add pauses with a tag like [BREAK=3] (in seconds) or something like that
  2. to control the delay between sentences; right now it's too fast
  3. to add other tags for filler words, etc.

I had the same issue, and my approach was to handle it manually. It works with either torch tensors or numpy arrays, though the syntax differs slightly:

  1. I generate the speech segment. Kokoro returns segments as numpy arrays; I convert them to torch tensors, but that conversion isn't necessary.
  2. I manually create a silence. Since Kokoro's audio sample rate is 24000 Hz, a 3-second silence can be generated with
    silence = torch.zeros(1, 3*24000). With numpy arrays it's the same, but with np.zeros instead of torch.zeros.
  3. Then I keep generating all the segments I need, and all of these segments, both speech and silence, are appended to a Python list.
  4. When generation is finished, I concatenate all the segments with torch.cat() or np.concatenate().
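
The steps above can be sketched with numpy roughly as follows. This is only an illustration, not Kokoro's API: the helper names make_silence and join_segments are made up here, the speech arrays are random stand-ins for whatever the model returns, and it assumes mono audio at the 24000 Hz rate mentioned above.

```python
import numpy as np

SAMPLE_RATE = 24000  # Kokoro's output sample rate, as stated in the post


def make_silence(seconds, sample_rate=SAMPLE_RATE):
    """Return a 1-D float32 array of zeros lasting `seconds` (hypothetical helper)."""
    return np.zeros(int(round(seconds * sample_rate)), dtype=np.float32)


def join_segments(segments):
    """Concatenate audio segments (speech and silence) in order.

    Each segment is flattened to 1-D first, so arrays shaped (N,) and (1, N)
    can be mixed. This assumes mono audio.
    """
    flat = [np.asarray(s, dtype=np.float32).reshape(-1) for s in segments]
    return np.concatenate(flat)


# Stand-ins for generated speech. In practice these would be the arrays
# returned by the model for each sentence.
speech_a = np.random.uniform(-0.1, 0.1, SAMPLE_RATE).astype(np.float32)      # 1 s
speech_b = np.random.uniform(-0.1, 0.1, 2 * SAMPLE_RATE).astype(np.float32)  # 2 s

# Speech, then a 3-second pause, then more speech.
audio = join_segments([speech_a, make_silence(3), speech_b])
print(audio.shape[0] / SAMPLE_RATE)  # total duration in seconds
```

Flattening each segment before concatenating avoids shape mismatches between a 1-D speech array and a silence created with a leading batch dimension.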

However, it would be amazing to have a list of special tokens so that the model itself could do this kind of thing: not only pauses but also laughs, emotion, etc.

Anyway, I hope this is useful for the task you're describing :)

I found that a series of punctuation marks inserts a pause. However, if you use too many, there is a weird noise like a breath, which sounded very unnatural for the voice I was using. Insert the following in your text:
, . , . , . , .

I implemented defined-period pause functionality here: https://github.com/vijay120/kokoro-tts?tab=readme-ov-file#input-file-formatting

You just have to provide an input file like this:

Welcome to the presentation
PAUSE_2.5
This text comes after a 2.5 second pause
PAUSE_10
And this comes after a 10 second pause

And it will output an audio file with the appropriate pauses between the audio for the defined number of seconds. Let me know if this works for y'all.
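
For anyone curious how an input file like that could be processed, here is a minimal sketch. It is not the code from the linked repository: the function name parse_script and the regular expression are assumptions, based only on the example above where each pause sits on its own line as PAUSE_ followed by a number of seconds.

```python
import re

# Assumed marker format, taken from the example above: a line that is
# exactly "PAUSE_" followed by a number of seconds (integer or decimal).
PAUSE_RE = re.compile(r"^PAUSE_(\d+(?:\.\d+)?)$")


def parse_script(text):
    """Split text into ("speech", str) and ("pause", float) items, in order."""
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = PAUSE_RE.match(line)
        if match:
            items.append(("pause", float(match.group(1))))
        else:
            items.append(("speech", line))
    return items


example = """Welcome to the presentation
PAUSE_2.5
This text comes after a 2.5 second pause
PAUSE_10
And this comes after a 10 second pause"""

for kind, value in parse_script(example):
    print(kind, value)
```

Each "speech" item would then go to the TTS model, and each "pause" item would become a block of zeros of length seconds * 24000 samples before everything is concatenated.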

@vijay120, thanks for implementing this feature and providing the link. Using Windows CMD, I've installed everything per your instructions without any issues, but whenever I run a command from the terminal I get the error 'kokoro-tts' is not recognized as an internal or external command. Would you happen to know what might cause this? Much appreciated :)
