ESPnet VITS Text-to-Speech (TTS) Model for ONNX

This is an ONNX export of espnet/kan-bayashi_vctk_vits, created with the espnet_onnx library.

Usage with txtai

txtai has a built-in Text to Speech (TTS) pipeline that makes using this model easy.

Note: the following example requires txtai >= 7.5.

import soundfile as sf

from txtai.pipeline import TextToSpeech

# Build pipeline
tts = TextToSpeech("NeuML/vctk-vits-onnx")

# Generate speech with speaker id
speech, rate = tts("Say something here", speaker=15)

# Write to file
sf.write("out.wav", speech, rate)

Usage with ONNX

This model can also be run directly with ONNX provided the input text is tokenized. Tokenization can be done with ttstokenizer.

Note that the txtai pipeline provides additional functionality, such as batching large inputs together, that would need to be reimplemented with this method.

import numpy as np
import onnxruntime
import soundfile as sf
import yaml

from ttstokenizer import TTSTokenizer

# This example assumes the files have been downloaded locally
with open("vctk-vits-onnx/config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

# Create model
model = onnxruntime.InferenceSession(
    "vctk-vits-onnx/model.onnx",
    providers=["CPUExecutionProvider"]
)

# Create tokenizer
tokenizer = TTSTokenizer(config["token"]["list"])

# Tokenize inputs
inputs = tokenizer("Say something here")

# Generate speech
outputs = model.run(None, {"text": inputs, "sids": np.array([15])})

# Write to file
sf.write("out.wav", outputs[0], 22050)
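
When running the model directly, long inputs have to be split up by hand. The following is a minimal sketch of one way to do that, not the logic txtai itself uses: the `split_text` helper and its `max_chars` default are hypothetical names and values chosen for illustration. It groups sentences into chunks using only the standard library.

```python
import re


def split_text(text, max_chars=200):
    """Group sentences into chunks of roughly max_chars characters.

    Hypothetical helper: a single sentence longer than max_chars
    is kept whole rather than being cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit, start a new chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate

    if current:
        chunks.append(current)

    return chunks
```

Each chunk could then be tokenized and passed to `model.run` as in the example above, with the resulting audio arrays joined using `np.concatenate` before writing the file.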

How to export

More information on how to export ESPnet models to ONNX can be found here.

Speaker reference

The CSTR VCTK Corpus includes speech data uttered by native speakers of English with various accents.

When using this model, set a speaker id from the reference table below. The ref column corresponds to the id in the VCTK dataset.

| SPEAKER | REF | AGE | GENDER | ACCENTS | REGION |
|---|---|---|---|---|---|
| 1 | 225 | 23 | F | English | Southern England |
| 2 | 226 | 22 | M | English | Surrey |
| 3 | 227 | 38 | M | English | Cumbria |
| 4 | 228 | 22 | F | English | Southern England |
| 5 | 229 | 23 | F | English | Southern England |
| 6 | 230 | 22 | F | English | Stockton-on-tees |
| 7 | 231 | 23 | F | English | Southern England |
| 8 | 232 | 23 | M | English | Southern England |
| 9 | 233 | 23 | F | English | Staffordshire |
| 10 | 234 | 22 | F | Scottish | West Dumfries |
| 11 | 236 | 23 | F | English | Manchester |
| 12 | 237 | 22 | M | Scottish | Fife |
| 13 | 238 | 22 | F | Northern Irish | Belfast |
| 14 | 239 | 22 | F | English | SW England |
| 15 | 240 | 21 | F | English | Southern England |
| 16 | 241 | 21 | M | Scottish | Perth |
| 17 | 243 | 22 | M | English | London |
| 18 | 244 | 22 | F | English | Manchester |
| 19 | 245 | 25 | M | Irish | Dublin |
| 20 | 246 | 22 | M | Scottish | Selkirk |
| 21 | 247 | 22 | M | Scottish | Argyll |
| 22 | 248 | 23 | F | Indian | |
| 23 | 249 | 22 | F | Scottish | Aberdeen |
| 24 | 250 | 22 | F | English | SE England |
| 25 | 251 | 26 | M | Indian | |
| 26 | 252 | 22 | M | Scottish | Edinburgh |
| 27 | 253 | 22 | F | Welsh | Cardiff |
| 28 | 254 | 21 | M | English | Surrey |
| 29 | 255 | 19 | M | Scottish | Galloway |
| 30 | 256 | 24 | M | English | Birmingham |
| 31 | 257 | 24 | F | English | Southern England |
| 32 | 258 | 22 | M | English | Southern England |
| 33 | 259 | 23 | M | English | Nottingham |
| 34 | 260 | 21 | M | Scottish | Orkney |
| 35 | 261 | 26 | F | Northern Irish | Belfast |
| 36 | 262 | 23 | F | Scottish | Edinburgh |
| 37 | 263 | 22 | M | Scottish | Aberdeen |
| 38 | 264 | 23 | F | Scottish | West Lothian |
| 39 | 265 | 23 | F | Scottish | Ross |
| 40 | 266 | 22 | F | Irish | Athlone |
| 41 | 267 | 23 | F | English | Yorkshire |
| 42 | 268 | 23 | F | English | Southern England |
| 43 | 269 | 20 | F | English | Newcastle |
| 44 | 270 | 21 | M | English | Yorkshire |
| 45 | 271 | 19 | M | Scottish | Fife |
| 46 | 272 | 23 | M | Scottish | Edinburgh |
| 47 | 273 | 23 | M | English | Suffolk |
| 48 | 274 | 22 | M | English | Essex |
| 49 | 275 | 23 | M | Scottish | Midlothian |
| 50 | 276 | 24 | F | English | Oxford |
| 51 | 277 | 23 | F | English | NE England |
| 52 | 278 | 22 | M | English | Cheshire |
| 53 | 279 | 23 | M | English | Leicester |
| 54 | 280 | Unknown | | | |
| 55 | 281 | 29 | M | Scottish | Edinburgh |
| 56 | 282 | 23 | F | English | Newcastle |
| 57 | 283 | 24 | F | Irish | Cork |
| 58 | 284 | 20 | M | Scottish | Fife |
| 59 | 285 | 21 | M | Scottish | Edinburgh |
| 60 | 286 | 23 | M | English | Newcastle |
| 61 | 287 | 23 | M | English | York |
| 62 | 288 | 22 | F | Irish | Dublin |
| 63 | 292 | 23 | M | Northern Irish | Belfast |
| 64 | 293 | 22 | F | Northern Irish | Belfast |
| 65 | 294 | 33 | F | American | San Francisco |
| 66 | 295 | 23 | F | Irish | Dublin |
| 67 | 297 | 20 | F | American | New York |
| 68 | 298 | 19 | M | Irish | Tipperary |
| 69 | 299 | 25 | F | American | California |
| 70 | 300 | 23 | F | American | California |
| 71 | 301 | 23 | F | American | North Carolina |
| 72 | 302 | 20 | M | Canadian | Montreal |
| 73 | 303 | 24 | F | Canadian | Toronto |
| 74 | 304 | 22 | M | Northern Irish | Belfast |
| 75 | 305 | 19 | F | American | Philadelphia |
| 76 | 306 | 21 | F | American | New York |
| 77 | 307 | 23 | F | Canadian | Ontario |
| 78 | 308 | 18 | F | American | Alabama |
| 79 | 310 | 21 | F | American | Tennessee |
| 80 | 311 | 21 | M | American | Iowa |
| 81 | 312 | 19 | F | Canadian | Hamilton |
| 82 | 313 | 24 | F | Irish | County Down |
| 83 | 314 | 26 | F | South African | Cape Town |
| 84 | 316 | 20 | M | Canadian | Alberta |
| 85 | 317 | 23 | F | Canadian | Hamilton |
| 86 | 318 | 32 | F | American | Napa |
| 87 | 323 | 19 | F | South African | Pretoria |
| 88 | 326 | 26 | M | Australian | Sydney |
| 89 | 329 | 23 | F | American | |
| 90 | 330 | 26 | F | American | |
| 91 | 333 | 19 | F | American | Indiana |
| 92 | 334 | 18 | M | American | Chicago |
| 93 | 335 | 25 | F | New Zealand | English |
| 94 | 336 | 18 | F | South African | Johannesburg |
| 95 | 339 | 21 | F | American | Pennsylvania |
| 96 | 340 | 18 | F | Irish | Dublin |
| 97 | 341 | 26 | F | American | Ohio |
| 98 | 343 | 27 | F | Canadian | Alberta |
| 99 | 345 | 22 | M | American | Florida |
| 100 | 347 | 26 | M | South African | Johannesburg |
| 101 | 351 | 21 | F | Northern Irish | Derry |
| 102 | 360 | 19 | M | American | New Jersey |
| 103 | 361 | 19 | F | American | New Jersey |
| 104 | 362 | 29 | F | American | |
| 105 | 363 | 22 | M | Canadian | Toronto |
| 106 | 364 | 23 | M | Irish | Donegal |
| 107 | 374 | 28 | M | Australian | English |
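
The ref values in the table are not contiguous (for example, 234 is followed by 236), so a speaker id cannot be derived from a VCTK id by a fixed offset. A small lookup is one way to go from a VCTK ref to the speaker id the model expects. The sketch below is illustrative only: `REF_TO_SPEAKER` and `speaker_for_ref` are hypothetical names, and the dictionary holds just a few rows copied from the table above.

```python
# Subset of the reference table: VCTK ref -> speaker id
# (hypothetical lookup, extend with the remaining rows as needed)
REF_TO_SPEAKER = {
    225: 1,
    226: 2,
    240: 15,
    294: 65,
    374: 107,
}


def speaker_for_ref(ref):
    """Return the speaker id for a VCTK ref such as 240 or "p240"."""
    # VCTK speaker ids are commonly written with a leading "p"
    key = int(str(ref).lower().lstrip("p"))
    if key not in REF_TO_SPEAKER:
        raise KeyError(f"VCTK ref {ref} is not in the lookup table")
    return REF_TO_SPEAKER[key]
```

The returned value can be passed as the `speaker` argument in the txtai example or placed in the `sids` array in the ONNX example.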