- Collection: Multilingual LLM Evaluation (6 items). Multilingual evaluation benchmarks.
- Collection: SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite (3 items). A community-driven project to centralize and standardize AI resources for Southeast Asian languages, cultures, and regions.
- Paper: TÜLU 3: Pushing Frontiers in Open Language Model Post-Training (arXiv:2411.15124, published Nov 22, 2024).
- Collection: Tulu 3 Datasets (32 items). All datasets released with Tulu 3, a set of state-of-the-art open post-training recipes.
- Paper: Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback (arXiv:2410.19133, published Oct 24, 2024).
- Collection: Multilingual RewardBench (2 items). Multilingual reward model evaluation dataset and results.
- Paper: M-RewardBench: Evaluating Reward Models in Multilingual Settings (arXiv:2410.15522, published Oct 20, 2024).
- Paper: SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages (arXiv:2407.19672, published Jul 29, 2024).
- Paper: Consent in Crisis: The Rapid Decline of the AI Data Commons (arXiv:2407.14933, published Jul 20, 2024).
- Collection: Reward Bench (5 items). Datasets, Spaces, and models for the reward model benchmark.
- Paper: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing (arXiv:2406.08464, published Jun 12, 2024).
- Paper: SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages (arXiv:2406.10118, published Jun 14, 2024).
- Paper: Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark (arXiv:2311.09122, published Nov 15, 2023).
- Paper: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100, published Nov 9, 2022).