lucasnewman committed on
Commit 663e962 · verified
1 Parent(s): 866facc

Update README.md

Files changed (1)
  1. README.md +49 -2
README.md CHANGED
@@ -4,6 +4,53 @@ tags:
4
  - mlx
5
  ---
6
 
7
- This model has been converted from Pytorch to .safetensors for MLX.
8
 
9
- See [F5-TTS](https://huggingface.co/SWivid/F5-TTS) for the original checkpoint.
 
4
  - mlx
5
  ---
6
 
7
+ # F5 TTS MLX
8
 
9
+ [F5 TTS](https://arxiv.org/abs/2410.06885) for the [MLX](https://github.com/ml-explore/mlx) framework.
10
+
11
+ This model is reshaped for MLX from the original weights and is designed for use with [f5-tts-mlx](https://github.com/lucasnewman/f5-tts-mlx).
12
+
13
+ F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).
14
+
15
+ You can listen to a [sample here](https://s3.amazonaws.com/lucasnewman.datasets/f5tts/sample.wav) that was generated in ~11 seconds on an M3 Max MacBook Pro.
16
+
17
+ See [F5-TTS](https://huggingface.co/SWivid/F5-TTS) for the original checkpoint.
18
+
19
+ ## Installation
20
+
21
+ ```bash
22
+ pip install f5-tts-mlx
23
+ ```
24
+
25
+ ## Usage
26
+
27
+ ```bash
28
+ python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."
29
+ ```
30
+
31
+ If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file of around 5-10 seconds:
32
+
33
+ ```bash
34
+ python -m f5_tts_mlx.generate \
35
+ --text "The quick brown fox jumped over the lazy dog." \
36
+ --ref-audio /path/to/audio.wav \
37
+ --ref-text "This is the caption for the reference audio."
38
+ ```
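If you drive the command line from a script, one option is to assemble the argument list in Python rather than building a shell string. The following is a minimal sketch, assuming only the three flags shown above (`--text`, `--ref-audio`, `--ref-text`). The helper name `build_generate_command` is hypothetical and is not part of f5-tts-mlx, and the rule that the reference audio and its caption must be given together is an assumption of this sketch.

```python
import sys
from typing import List, Optional


def build_generate_command(
    text: str,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `python -m f5_tts_mlx.generate`.

    Hypothetical helper: it only uses the flags shown in this README.
    """
    cmd = [sys.executable, "-m", "f5_tts_mlx.generate", "--text", text]

    # Assumption of this sketch: the reference audio and its caption
    # are supplied together, as in the example command above.
    if (ref_audio is None) != (ref_text is None):
        raise ValueError("pass both ref_audio and ref_text, or neither")

    if ref_audio is not None and ref_text is not None:
        cmd += ["--ref-audio", ref_audio, "--ref-text", ref_text]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once f5-tts-mlx is installed. Passing a list avoids shell quoting problems with text that contains quotes or spaces.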
39
+
40
+ You can convert an audio file to the correct format with ffmpeg like this:
41
+
42
+ ```bash
43
+ ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
44
+ ```
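Before passing a file as reference audio, it may help to confirm that it matches the format described above. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only. The function name `check_reference_wav` is hypothetical, and the strict 5 to 10 second bounds are an assumption drawn from the "around 5-10 seconds" guideline rather than a hard requirement of the tool.

```python
import wave
from typing import List


def check_reference_wav(path: str) -> List[str]:
    """Return a list of format problems; an empty list means none were found."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != 24000:
        problems.append(f"expected 24000 Hz, found {rate} Hz")
    # Assumption: treat the 5-10 second guideline as strict bounds.
    if not 5.0 <= duration <= 10.0:
        problems.append(f"duration {duration:.1f} s is outside 5-10 s")
    return problems
```

A file that fails any of these checks can be converted with the ffmpeg command shown above.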
45
+
46
+ See [here](https://github.com/lucasnewman/f5-tts-mlx/tree/main/f5_tts_mlx) for more options to customize generation.
47
+
48
+
49
+
50
+ You can load a pretrained model from Python like this:
51
+
52
+ ```python
53
+ from f5_tts_mlx.generate import generate
54
+
55
+ audio = generate(text="Hello world.", ...)
56
+ ```