Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities • Paper • 2503.03983 • Published 4 days ago • 18
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval • Paper • 2503.04644 • Published 3 days ago • 19
Token-Efficient Long Video Understanding for Multimodal LLMs • Paper • 2503.04130 • Published 4 days ago • 65
ABC: Achieving Better Control of Multimodal Embeddings using VLMs • Paper • 2503.00329 • Published 9 days ago • 18
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers • Paper • 2502.20545 • Published 10 days ago • 20
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment • Paper • 2502.18965 • Published 12 days ago • 21
Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions • Paper • 2503.00501 • Published 9 days ago • 11
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs • Paper • 2503.01743 • Published 6 days ago • 65
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents • Paper • 2502.18017 • Published 13 days ago • 18
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think • Paper • 2502.20172 • Published 10 days ago • 26
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts • Paper • 2502.20395 • Published 10 days ago • 43