BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 12
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models Paper • 2406.16783 • Published Jun 24, 2024 • 4
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content Paper • 2406.11811 • Published Jun 17, 2024 • 16