---
license: cc-by-nc-4.0
base_model:
- SWivid/F5-TTS
---

This is a pruned and reorganized version of [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS), made for use with the `fairytaler` Python library, an unofficial reimplementation of F5-TTS built for fast, lightweight inference.

# Installation

Fairytaler assumes you have a working CUDA environment to install into.

```sh
pip install fairytaler
```

This will install [the reimplementation library](https://github.com/painebenjamin/fairytaler/).
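
Since installation assumes a working CUDA environment, it can help to check that first. The sketch below assumes the environment is PyTorch-based, which this card does not state explicitly; `cuda_status` is a hypothetical helper name and is not part of `fairytaler`.

```python
import importlib.util


def cuda_status() -> dict:
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    status = {"torch_installed": False, "cuda_available": False, "device_count": 0}
    # Assumption: the CUDA environment is provided through PyTorch.
    if importlib.util.find_spec("torch") is None:
        return status

    import torch  # imported lazily so the check works without PyTorch installed

    status["torch_installed"] = True
    status["cuda_available"] = bool(torch.cuda.is_available())
    status["device_count"] = int(torch.cuda.device_count())
    return status


print(cuda_status())
```

If `cuda_available` is reported as `False`, the GPU driver or the PyTorch build is the likely place to look before installing.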

# How to Use

You do not need to pre-download anything; the necessary data is downloaded at runtime.

## Command Line

Use the `fairytaler` binary from the command line like so:

```sh
fairytaler examples/reference.wav examples/reference.txt "Fairytaler is an unofficial minimal re-implementation of F5 TTS."
```

| Reference Audio | Generated Audio |
| --------------- | --------------- |
| <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/SBSzkafZSdjIQERVpDcqf.wav"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/5VGepj6y7wb4qd0-p-IQq.wav"></audio> |

*Reference audio sourced from [DiPCo](https://huggingface.co/datasets/benjamin-paine/dinner-party-corpus).*

Many options are available. For complete documentation, run `fairytaler --help`.

## Python

```py
from fairytaler import F5TTSPipeline

pipeline = F5TTSPipeline.from_pretrained("benjamin-paine/fairytaler", device="auto")
output_wav_file = pipeline(
  text="Hello, this is some test audio!",
  reference_audio="examples/reference.wav",
  reference_text="examples/reference.txt",
  output_save=True
)
print(f"Output saved to {output_wav_file}")
```

The full execution signature is:

```py
def __call__(
    self,
    text: Union[str, List[str]],
    reference_audio: AudioType,
    reference_text: str,
    reference_sample_rate: Optional[int]=None,
    seed: SeedType=None,
    speed: float=1.0,
    sway_sampling_coef: float=-1.0,
    target_rms: float=0.1,
    cross_fade_duration: float=0.15,
    punctuation_pause_duration: float=0.10,
    num_steps: int=32,
    cfg_strength: float=2.0,
    fix_duration: Optional[float]=None,
    use_tqdm: bool=False,
    output_format: AUDIO_OUTPUT_FORMAT_LITERAL="wav",
    output_save: bool=False,
    chunk_callback: Optional[Callable[[AudioResultType], None]]=None,
    chunk_callback_format: AUDIO_OUTPUT_FORMAT_LITERAL="float",
) -> AudioResultType
```

Supported format values are `wav`, `ogg`, `flac`, `mp3`, `float` and `int`. Passing `output_save=True` saves the output to a file; omitting it returns the data directly.
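
As a rough sketch of the in-memory path, the helper below calls a pipeline using only keyword names taken from the signature above and collects whatever is passed to `chunk_callback`. Several things here are assumptions rather than documented behavior: that an integer is an acceptable `seed`, that the callback is invoked once per generated chunk, and the exact type of the returned data. `synthesize_in_memory` is a hypothetical helper, not part of `fairytaler`.

```python
from typing import Any, Callable, List, Tuple


def synthesize_in_memory(
    pipeline: Callable[..., Any],
    text: str,
    reference_audio: str,
    reference_text: str,
    seed: int = 12345,
) -> Tuple[Any, List[Any]]:
    """Call a pipeline with the documented keywords and collect callback chunks."""
    chunks: List[Any] = []
    result = pipeline(
        text=text,
        reference_audio=reference_audio,
        reference_text=reference_text,
        seed=seed,                     # assumption: an int is a valid SeedType
        output_format="float",         # ask for sample data, not an encoded file
        output_save=False,             # return the data instead of saving it
        chunk_callback=chunks.append,  # assumption: called once per chunk
        chunk_callback_format="float",
    )
    return result, chunks


# Intended usage with the real pipeline (needs the model and a GPU):
#
#   from fairytaler import F5TTSPipeline
#   pipeline = F5TTSPipeline.from_pretrained("benjamin-paine/fairytaler", device="auto")
#   audio, chunks = synthesize_in_memory(
#       pipeline,
#       "Hello, this is some test audio!",
#       "examples/reference.wav",
#       "examples/reference.txt",
#   )
```

Accepting the pipeline as an argument keeps the helper easy to exercise with a stand-in callable when no GPU is available.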

# Citations

```bibtex
@misc{chen2024f5ttsfairytalerfakesfluent,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching}, 
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      year={2024},
      eprint={2410.06885},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2410.06885}, 
}

@misc{vansegbroeck2019dipcodinnerparty,
      title={DiPCo -- Dinner Party Corpus}, 
      author={Maarten Van Segbroeck and Ahmed Zaid and Ksenia Kutsenko and Cirenia Huerta and Tinh Nguyen and Xuewen Luo and Björn Hoffmeister and Jan Trmal and Maurizio Omologo and Roland Maas},
      year={2019},
      eprint={1909.13447},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/1909.13447}, 
}
```