Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 56
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types Paper • 2409.09269 • Published Sep 14, 2024 • 8
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13, 2024 • 51
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models Paper • 2405.13974 • Published May 22, 2024 • 9
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 124
Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task By danaaubakirova • May 16, 2024 • 17