German Voice Cloning TTS Model using F5-TTS Architecture

A German Text-to-Speech system capable of cloning voices from a few seconds of reference audio, built on the F5-TTS architecture.

Model Details

Developed by: Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
Base Model: SWivid/F5-TTS
Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Key Features & Capabilities

Generates natural-sounding German speech from text
Clones voices from only a few seconds of reference audio
Suitable for audiobooks, voice assistants, and accessibility applications

Technical Specifications

Checkpoints are provided in two directories, one per vocoder: F5TTS_Base (Vocos) and F5TTS_Base_bigvgan (BigVGAN). Download the one that matches the vocoder you plan to use.
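As a rough illustration, one directory can be fetched with the `huggingface_hub` library. This is a minimal sketch, not an official download script: the repository id `aihpi/F5-TTS-German` is assumed from the hosting page, the helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical, and the file names inside each directory are not listed here, so the pattern simply selects everything in the chosen directory.

```python
from pathlib import Path

# Assumption: repository id as shown on the hosting page.
REPO_ID = "aihpi/F5-TTS-German"

# Directory names as given in this model card, keyed by vocoder.
VOCODER_DIRS = {"vocos": "F5TTS_Base", "bigvgan": "F5TTS_Base_bigvgan"}


def checkpoint_patterns(vocoder: str) -> list[str]:
    """Return glob patterns that select only the directory for `vocoder`."""
    key = vocoder.lower()
    if key not in VOCODER_DIRS:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; expected one of {sorted(VOCODER_DIRS)}"
        )
    # The trailing "/*" keeps F5TTS_Base from also matching F5TTS_Base_bigvgan.
    return [f"{VOCODER_DIRS[key]}/*"]


def download_checkpoints(vocoder: str = "vocos", local_dir: str = "ckpts") -> Path:
    """Download one checkpoint directory and return the local root path."""
    # Imported lazily so checkpoint_patterns works without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=checkpoint_patterns(vocoder),
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_checkpoints("bigvgan")` would place the BigVGAN directory under `ckpts/`; restricting the download with `allow_patterns` avoids fetching both variants when only one is needed.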

Datasets: Common Voice (Mozilla) and Emilia_DE
Training process: Checkpoints fine-tuned from the base F5-TTS model
Training hardware: 8x NVIDIA H100

Contact

AI Service Center: [email protected]
Johanna Reiml: [email protected]
Enes Suermeli: [email protected]
Kajo Kratzenstein: [email protected]
Carlos Menke: [email protected]

Acknowledgements

The authors acknowledge the financial support by the German Federal Ministry for Education and Research (BMBF) through the project «KI-Servicezentrum Berlin Brandenburg» (01IS22092).
