---
license: other
license_name: coqui-public-model-license
license_link: https://coqui.ai/cpml
library_name: coqui
pipeline_tag: text-to-speech
widget:
- text: "Once when I was six years old I saw a magnificent picture"
---

# ⓍTTS_v2 - Peter Drury Fine-Tuned Model

This repository hosts a fine-tuned version of the ⓍTTS model, trained on 2.3 minutes of unique voice lines from Peter Drury. The voice lines were sourced from his podcast appearance with JOE on YouTube, which can be found here:
[Peter Drury RANKS His Best Commentary Moments & Reveals Commentary Secrets! MESSI WIN WORLD CUP!](https://www.youtube.com/watch?v=ibT6PINpyaw&t)

![Peter Drury](peterdrury.jpg)

Listen to a sample of the ⓍTTS_v2 - Peter Drury Fine-Tuned Model:

<audio controls>
<source src="https://huggingface.co./kodoqmc/XTTS-v2_PeterDrury/resolve/main/fromtts.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>

Here's a Peter Drury voice line clip from the training data:

<audio controls>
<source src="https://huggingface.co./kodoqmc/XTTS-v2_PeterDrury/resolve/main/reference.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>

## Features

- **Voice Cloning**: Realistic voice cloning with just a short audio clip.
- **Multi-Lingual Support**: Generates speech in 17 different languages while maintaining Peter Drury's voice.
- **Emotion & Style Transfer**: Captures the emotional tone and style of the original voice.
- **Cross-Language Cloning**: Maintains the unique voice characteristics across different languages.
- **High-Quality Audio**: Outputs at a 24 kHz sampling rate for clear, high-fidelity audio.

## Supported Languages

The model supports the following 17 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), and Hindi (hi).

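Language codes are passed to the model as plain strings (for example `language="en"`), so it can be handy to validate them before any synthesis is attempted. The following is a minimal sketch built only from the list above; `SUPPORTED_LANGUAGES` and `normalize_language` are hypothetical helper names for illustration and are not part of the Coqui TTS API.

```python
# Codes and names copied from the "Supported Languages" list in this card.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi",
}


def normalize_language(code: str) -> str:
    """Return the lowercase code if it is in the list above, else raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {known}")
    return normalized
```

The returned value can then be passed as the `language` argument in the usage snippets in this card.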
## Usage in Roll Cage

Boost your AI experience with this Ollama add-on! Enjoy real-time audio and text chats, LaTeX rendering, agent automations, workflows, and text-to-image, image-to-text, and image-to-video transformations. Fine-tune text, voice, and image generation. Includes Windows macro controls and DuckDuckGo search.

[ollama_agent_roll_cage (OARC)](https://github.com/Leoleojames1/ollama_agent_roll_cage) is a completely local Python & CMD toolset add-on for the Ollama command line interface. The OARC toolset automates the creation of agents, giving the user more control over the likely output. It provides SYSTEM prompt templates for each ./Modelfile, allowing users to design and deploy custom agents quickly. Users can select which local model file is used in agent construction with the desired system prompt.

## CoquiTTS and Resources

- 🐸💬 **CoquiTTS**: [Coqui TTS on GitHub](https://github.com/coqui-ai/TTS)
- 📚 **Documentation**: [ReadTheDocs](https://tts.readthedocs.io/en/latest/)
- 👩‍💻 **Questions**: [GitHub Discussions](https://github.com/coqui-ai/TTS/discussions)
- 🗯 **Community**: [Discord](https://discord.gg/5eXr5seRrv)

## License

This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). Read more about the origin story of CPML [here](https://coqui.ai/blog/tts/cpml).

## Contact

Join our 🐸Community on [Discord](https://discord.gg/fBC58unbKE) and follow us on [Twitter](https://twitter.com/coqui_ai). For inquiries, email us at [email protected].

## Usage

Using 🐸TTS API:

```python
import torch
from TTS.api import TTS

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned checkpoint from a local directory (adjust both paths to your setup)
tts = TTS(model_path="D:/AI/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_PeterDrury/",
          config_path="D:/AI/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_PeterDrury/config.json",
          progress_bar=False).to(device)

# generate speech by cloning a voice using default settings
tts.tts_to_file(text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
                file_path="output.wav",
                speaker_wav="/path/to/target/speaker.wav",
                language="en")
```

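To try the cross-language cloning described under Features, the same `tts_to_file` call can be repeated once per language. The sketch below is only an illustration: `plan_outputs` and `synthesize_all` are hypothetical helpers, the `drury_<lang>.wav` naming is an arbitrary choice, and `tts` is assumed to be an object exposing `tts_to_file` with the keyword arguments used in the API snippet in this card.

```python
from pathlib import Path


def plan_outputs(lines_by_language, out_dir="outputs"):
    """Build (language, text, output_path) jobs, one per language code."""
    return [
        (lang, text, str(Path(out_dir) / f"drury_{lang}.wav"))
        for lang, text in lines_by_language.items()
    ]


def synthesize_all(tts, jobs, speaker_wav):
    """Call tts.tts_to_file once per job and return the output paths."""
    for lang, text, path in jobs:
        # Create the target directory if it does not exist yet
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        tts.tts_to_file(text=text, file_path=path,
                        speaker_wav=speaker_wav, language=lang)
    return [path for _, _, path in jobs]


# Example usage, assuming `tts` was created as in the API snippet:
# jobs = plan_outputs({
#     "en": "Once when I was six years old I saw a magnificent picture",
#     "tr": "Bugün okula gitmek istemiyorum.",
# })
# synthesize_all(tts, jobs, speaker_wav="/path/to/target/speaker.wav")
```

Keeping the planning step separate from the synthesis step makes it easy to inspect the job list before spending GPU time on it.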
Using 🐸TTS Command line:

```console
 tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
     --text "Bugün okula gitmek istemiyorum." \
     --speaker_wav /path/to/target/speaker.wav \
     --language_idx tr \
     --use_cuda true
```

Using the model directly:

```python
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration and checkpoint from a local directory
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

# Synthesize speech conditioned on a reference clip of the target speaker
outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="en",
)
```

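The direct call returns the audio in memory rather than writing a file. As a rough sketch, and assuming that `outputs["wav"]` is a one-dimensional sequence of float samples in the range [-1.0, 1.0] (an assumption about the return value, not something this card documents), the samples could be saved at the 24 kHz rate listed under Features using only the Python standard library. `write_wav` and `duration_seconds` are hypothetical helper names.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # output rate stated in the Features list


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] to `path` as 16-bit mono PCM."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# Example usage, assuming `outputs` comes from model.synthesize(...):
# write_wav("drury.wav", outputs["wav"])
# print(duration_seconds("drury.wav"))
```

The per-sample loop favors readability over speed; for long clips an array library would be the more practical choice.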