Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled and 1 other • Oct 14, 2024 • 77
Article Atlaset Dataset for Moroccan Darija: From Data Collection, Analysis, to Model Trainings By atlasia and 1 other • 4 days ago • 18
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 192
Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning Paper • 2407.21139 • Published Jul 30, 2024 • 5
Article Arabic RAG Leaderboard: A Comprehensive Framework for Evaluating Arabic Language Retrieval Systems By Navid-AI and 1 other • 28 days ago • 11
Article Darija Chatbot Arena: Making LLMs Compete in the Moroccan Dialect By atlasia and 2 others • 28 days ago • 13
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 199
Arabic Matryoshka Embedding Models Collection • A collection of advanced Arabic Matryoshka Embedding Models designed for efficient and high-performance Arabic NLP, available publicly on Hugging Face • 11 items • Updated 26 days ago • 13
Article Train 400x faster Static Embedding Models with Sentence Transformers Jan 15 • 156
Article TerjamaBench: A Cultural Benchmark for English-Darija Machine Translation By imomayiz and 4 others • Jan 10 • 30
Article Finding Moroccan Arabic (Darija) in Fineweb 2 By omarkamali and 3 others • Dec 8, 2024 • 22
Article Memory-efficient Diffusion Transformers with Quanto and Diffusers Jul 30, 2024 • 64