text-to-speech - a MerlinLi Collection

text-to-speech

Updated Sep 22

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 29
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Paper • 2306.15687 • Published Jun 23, 2023
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5 • 34
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15 • 11
Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Paper • 2307.07218 • Published Jul 14, 2023 • 26
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Paper • 2306.03509 • Published Jun 6, 2023 • 4
parler-tts/dac_44khZ_8kbps

Updated Apr 10 • 1.18k • 15
parler-tts/parler_tts_mini_v0.1

Text-to-Speech • Updated Apr 30 • 24.8k • 346
Wenetspeech4TTS/WenetSpeech4TTS

Updated Jul 25 • 2.16k • 65
liuhuadai/AudioLCM

Text-to-Audio • Updated Jun 6 • 6 • 5
kyutai/mimi

Feature Extraction • Updated Sep 18 • 1.41M • 83