Unified Reward Model for Multimodal Understanding and Generation • arXiv:2503.05236
Token-Efficient Long Video Understanding for Multimodal LLMs • arXiv:2503.04130
Predictive Data Selection: The Data That Predicts Is the Data That Teaches • arXiv:2503.00808
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents • arXiv:2503.01935
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs • arXiv:2503.01743
How far can we go with ImageNet for Text-to-Image generation? • arXiv:2502.21318
Language Models' Factuality Depends on the Language of Inquiry • arXiv:2502.17955
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation • arXiv:2502.19414
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing • arXiv:2502.17258
MLGym: A New Framework and Benchmark for Advancing AI Research Agents • arXiv:2502.14499
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • arXiv:2502.11089
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models • arXiv:2502.09696
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation • arXiv:2502.08047
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models • arXiv:2502.09604
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation • arXiv:2502.07870