@hexgrad on Hugging Face: "📣 Looking for labeled, high-quality synthetic audio/TTS data 📣 Have you been…"

hexgrad

posted an update 6 days ago

📣 Looking for labeled, high-quality synthetic audio/TTS data 📣 Have you been or are you currently calling API endpoints from OpenAI, ElevenLabs, etc? Do you have labeled audio data sitting around gathering dust? Let's talk! Join https://discord.gg/QuGxSWBfQy or comment down below.

If your data exceeds the quantity and quality thresholds, is approved into the next hexgrad/Kokoro-82M training mix, and you DM me the data under a permissive, effectively Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.

What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc
Then YOU can contribute to the training mix and get useful artifacts in return. ❤️

More details at hexgrad/Kokoro-82M#21

hexgrad

6 days ago (edited 6 days ago)

TLDR: 🚨 Trade Offer 🚨
I receive: Synthetic Audio w/ Text Labels
You receive: Trained Voicepacks for an 82M Apache TTS model
Join https://discord.gg/QuGxSWBfQy to discuss

to-be

2 days ago

In what kind of format do you want this?

Alibrown

5 days ago

Hi, I tested it today. Nice work. Will there be German too in the future?

hexgrad

5 days ago

It's simple: what you put in is what you get out. 😄 German support in the future depends mostly on how much German data (synthetic audio + text labels) is contributed.

wadmusa

4 days ago

Tell me about quantum mechanics.

kalmuraee

about 18 hours ago

If you are looking for Arabic data, there are Common Voice, SADA, MASC, MGB-2, MGB-3, and MGB-5.

zhengjian1996

about 17 hours ago

Hello, I'm 腾皇.

CcyberNinja

about 17 hours ago

Hello
