Vietnamese speech dataset Collection for speech-related tasks: speech-to-text & text-to-speech • 26 items • Updated 1 day ago • 21
VoxPopuli v2 Collection A collection of checkpoints from the second VoxPopuli release. • 35 items • Updated Jan 16, 2024 • 6
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages Paper • 2503.23542 • Published 19 days ago • 10
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 3 items • Updated 23 days ago • 89
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 236
WhisperLah Collection A collection of Whisper-variants for Singapore languages, e.g. English, Mandarin, Bahasa Malaysia, Tamil • 3 items • Updated Nov 27, 2024 • 1
Whisper pruned Collection Pruned / trimmed versions of whisper models with unnecessary languages removed. • 5 items • Updated Jan 30 • 1
Speech-to-Text dataset Collection Malay and Singlish Speech-to-Text dataset, semisupervised from different models and services. • 19 items • Updated Dec 23, 2024 • 1
distil-large-v3 Collection This collection contains the model repositories for distil-large-v3, which provides support for the most popular Whisper libraries. • 4 items • Updated Mar 21, 2024 • 6
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated Nov 28, 2024 • 361
MaLLaM Collection Pretrain from scratch 4096 context length on 90B tokens Malaysian text, https://huggingface.co./papers/2401.14680 • 10 items • Updated Dec 23, 2024 • 14
VinaLLaMA Collection Second Generation, Most Powerful Open-Source Vietnamese LLMs. • 8 items • Updated Feb 9, 2024 • 12