
Aayan Mishra

Spestly

https://aayan-mishra.vercel.app/

AI & ML interests

None yet



Spestly's activity

published a model 2 days ago

Spestly/Axis-1-14B

Updated 2 days ago

updated a model 5 days ago

Spestly/Everest-N1-360M

Text Generation • Updated 5 days ago • 18

published a model 5 days ago

Spestly/Everest-N1-360M

Text Generation • Updated 5 days ago • 18

published a model 9 days ago

Spestly/Carbon-1-14B-R1

Updated 9 days ago • 1

updated a collection 11 days ago

DeepOpus-1

Collection

A powerful hybrid model (Intuitive and CoT) designed for various tasks! - Developed by Spestly on behalf of Lambda Go Labs • 2 items • Updated 11 days ago

New activity in open-llm-leaderboard/open_llm_leaderboard 12 days ago

Error in benchmarking model - rubenroy/Gilgamesh-72B

#1115 opened 12 days ago by Spestly

liked a model 12 days ago

Wan-AI/Wan2.1-T2V-14B

Text-to-Video • Updated 11 days ago • 186k • 956

liked a Space 13 days ago

0x Mini

💬

Chat with a friendly AI assistant

updated a model 13 days ago

Spestly/Kyro-n1.1-3B

Updated 13 days ago • 10

published a model 15 days ago

Spestly/Kyro-n1.1-3B

Updated 13 days ago • 10

reacted to hexgrad's post with 🔥 15 days ago

I wrote an article about G2P (grapheme-to-phoneme conversion): https://hf.co/blog/hexgrad/g2p

G2P is an underrated component of small TTS models: like offensive linemen, it does a lot of the work and gets none of the credit.
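To make the idea concrete, here is a toy, dictionary-based G2P in Python. It is a minimal sketch, not Kokoro's actual G2P: the four-word `LEXICON`, the ARPAbet-style symbols, and the naive letter-by-letter fallback for unknown words are all illustrative assumptions. Real G2P front ends pair a large pronunciation dictionary with a trained model for out-of-vocabulary words.

```python
# Hypothetical mini-lexicon using ARPAbet-style phoneme symbols.
LEXICON: dict[str, list[str]] = {
    "kokoro": ["K", "OW", "K", "AO", "R", "OW"],
    "speech": ["S", "P", "IY", "CH"],
    "is": ["IH", "Z"],
    "fast": ["F", "AE", "S", "T"],
}

# Naive letter-to-phoneme fallback for words missing from the lexicon.
# Deliberately crude: English spelling is far too irregular for this to be accurate.
LETTER_FALLBACK: dict[str, list[str]] = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"],
    "y": ["Y"], "z": ["Z"],
}


def g2p(text: str) -> list[str]:
    """Convert text to a flat list of phoneme symbols."""
    phonemes: list[str] = []
    for raw_word in text.lower().split():
        # Strip punctuation so "fast!" is looked up as "fast".
        word = "".join(ch for ch in raw_word if ch.isalpha())
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            for ch in word:
                # Characters with no fallback entry are silently skipped.
                phonemes.extend(LETTER_FALLBACK.get(ch, []))
    return phonemes


print(g2p("Kokoro is fast!"))
```

Here `g2p("Kokoro is fast!")` yields `K OW K AO R OW IH Z F AE S T`. The downstream acoustic model then only has to map phonemes to sound, instead of also learning English spelling from audio.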

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.
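A "latent audio token over a learned codebook" is the index of the nearest codebook vector. The sketch below assumes plain vector quantization with squared Euclidean distance and a hand-picked 2-D codebook. A real audio codec learns its codebook during training, uses much higher-dimensional vectors, and often stacks several codebooks. `encode` turns continuous latent frames into integer tokens, which is what the front-end LLM would be trained to predict. `decode` maps tokens back to codebook vectors, which a real system would then pass to a neural decoder to produce a waveform.

```python
import math

Vector = list[float]


def nearest_code(frame: Vector, codebook: list[Vector]) -> int:
    """Index of the codebook entry closest to `frame` (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def encode(frames: list[Vector], codebook: list[Vector]) -> list[int]:
    """Quantize continuous latent frames into discrete token IDs."""
    return [nearest_code(f, codebook) for f in frames]


def decode(tokens: list[int], codebook: list[Vector]) -> list[Vector]:
    """Look token IDs back up in the codebook (a real system would then run a neural decoder)."""
    return [codebook[t] for t in tokens]


# Hand-picked 2-D codebook standing in for a learned one.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]
tokens = encode(frames, codebook)
print(tokens)  # [0, 3, 2]
print(decode(tokens, codebook))
```

Because each frame collapses to a small integer, audio generation becomes next-token prediction, which is why these larger systems can reuse an LLM as their front end.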

Kokoro instead relies on G2P preprocessing, has 82M parameters, and therefore needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, this excellent audio quality and lack of background noise help explain why Kokoro is very competitive in single-voice TTS Arenas.
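The post does not say how the training audio is selected. As one hedged illustration, the sketch below filters clips by estimated signal-to-noise ratio, `SNR_dB = 10 * log10(P_signal / P_noise)`. The `Clip` fields, the assumption that per-clip speech and noise power have already been measured, the file names, and the 30 dB cutoff are all hypothetical, not Kokoro's actual curation pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical metadata record for one training clip."""
    path: str
    signal_power: float  # assumed: mean power measured over speech regions
    noise_power: float   # assumed: mean power measured over silent regions


def snr_db(clip: Clip) -> float:
    """Estimated signal-to-noise ratio in decibels."""
    if clip.noise_power <= 0:
        return math.inf  # no measurable noise floor
    return 10 * math.log10(clip.signal_power / clip.noise_power)


def select_high_fidelity(clips: list[Clip], min_snr_db: float = 30.0) -> list[Clip]:
    """Keep only clips whose estimated SNR meets the (hypothetical) threshold."""
    return [c for c in clips if snr_db(c) >= min_snr_db]


clips = [
    Clip("studio_take.wav", signal_power=1.0, noise_power=1e-4),  # about 40 dB
    Clip("noisy_room.wav", signal_power=1.0, noise_power=1e-2),   # about 20 dB
]
print([c.path for c in select_high_fidelity(clips)])
```

A small model trained only on clips that pass such a filter never has to learn to reproduce room noise, which is one plausible reading of why clean data pays off for single-voice quality.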