chaanks committed on
Commit
af96964
1 Parent(s): eb26af5

Create README.md

  1. README.md +73 -0
README.md ADDED
@@ -0,0 +1,73 @@
1
+ ---
2
+ language: "en"
3
+ inference: false
4
+ tags:
5
+ - Vocoder
6
+ - HiFIGAN
7
+ - speech-synthesis
8
+ - speechbrain
9
+ license: "apache-2.0"
10
+ datasets:
11
+ - LJSpeech
12
+ ---
13
+
14
+
15
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
16
+ <br/><br/>
17
+
18
+ # Vocoder with HiFiGAN Unit trained on LJSpeech
19
+
20
+ This repository provides all the necessary tools for using a [HiFiGAN Unit](https://arxiv.org/abs/2104.00355) vocoder trained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
21
+
22
+ The pre-trained model takes discrete self-supervised representations as input and produces a waveform as output. Typically, this model is used on top of a speech-to-unit translation model that converts an input utterance from a source language into a sequence of discrete speech units in a target language.
23
+ To generate the discrete self-supervised representations, we employ a K-means clustering model trained on the 6th layer of HuBERT, with `k=100`.
24
+
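As a rough illustration of what "discrete units" means here, the sketch below assigns each feature frame to the index of its nearest K-means centroid. It is a toy under stated assumptions, not the code used to train this model: the function name `assign_units`, the 2-D features, and the three centroids are hypothetical stand-ins for HuBERT layer-6 features and the `k=100` codebook.

```python
def assign_units(frames, centroids):
    """Map each feature frame to the index of its nearest centroid
    (squared Euclidean distance). Hypothetical helper for illustration."""
    units = []
    for frame in frames:
        distances = [
            sum((f - c) ** 2 for f, c in zip(frame, centroid))
            for centroid in centroids
        ]
        units.append(distances.index(min(distances)))
    return units


# Toy stand-ins: 3 centroids and 4 frames in 2-D.
# The model described above would use 100 centroids over HuBERT layer-6 features.
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0], [0.6, 0.6]]

units = assign_units(frames, centroids)
print(units)  # [0, 1, 2, 1]
```

The resulting list of integer indices plays the role of the unit sequence that the vocoder consumes.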
25
+ ## Install SpeechBrain
26
+
27
+ First, install transformers and SpeechBrain with the following command:
28
+
29
+ ```
30
+ pip install speechbrain transformers==4.28.0
31
+ ```
32
+
33
+ We encourage you to read our tutorials and learn more about
34
+ [SpeechBrain](https://speechbrain.github.io).
35
+
36
+
37
+ ### Using the vocoder
38
+
39
+ ```python
40
+ import torch
41
+ from speechbrain.pretrained import UnitHIFIGAN
42
+ hifi_gan_unit = UnitHIFIGAN.from_hparams(source="speechbrain/hifigan-unit-hubert-l6-k100-ljspeech")
43
+ codes = torch.randint(0, 100, (100,))  # 100 random unit IDs from the k=100 codebook (upper bound is exclusive)
44
+ waveform = hifi_gan_unit.decode_unit(codes)
45
+
46
+ ```
47
+
48
+ ### Inference on GPU
49
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
50
+
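One possible way to build the `run_opts` argument is to fall back to the CPU when no GPU is visible. This is a minimal sketch; the helper name `make_run_opts` and its `prefer_gpu` flag are hypothetical and not part of SpeechBrain.

```python
def make_run_opts(prefer_gpu=True):
    """Return a run_opts dict selecting "cuda" when available, else "cpu".
    Hypothetical helper, not a SpeechBrain function."""
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        # Without PyTorch there is no GPU backend to use.
        has_cuda = False
    device = "cuda" if (prefer_gpu and has_cuda) else "cpu"
    return {"device": device}


run_opts = make_run_opts()
print(run_opts)

# Usage with the model from this card (downloads the checkpoint, so it is left commented out):
# from speechbrain.pretrained import UnitHIFIGAN
# hifi_gan_unit = UnitHIFIGAN.from_hparams(
#     source="speechbrain/hifigan-unit-hubert-l6-k100-ljspeech",
#     run_opts=run_opts,
# )
```

Passing the dict explicitly keeps the same script usable on machines with and without a GPU.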
51
+
52
+ ### Limitations
53
+ The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
54
+
55
+ #### Referencing SpeechBrain
56
+
57
+ ```
58
+ @misc{SB2021,
59
+ author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
60
+ title = {SpeechBrain},
61
+ year = {2021},
62
+ publisher = {GitHub},
63
+ journal = {GitHub repository},
64
+ howpublished = {\url{https://github.com/speechbrain/speechbrain}},
65
+ }
66
+ ```
67
+
68
+ #### About SpeechBrain
69
+ SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
70
+
71
+ Website: https://speechbrain.github.io/
72
+
73
+ GitHub: https://github.com/speechbrain/speechbrain