Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 4 days ago • 64
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published 8 days ago • 51
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published 7 days ago • 20
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 6 days ago • 65
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published 9 days ago • 25
Language Models' Factuality Depends on the Language of Inquiry Paper • 2502.17955 • Published 13 days ago • 29
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published 11 days ago • 18
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published 13 days ago • 72
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 17 days ago • 177
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 21 days ago • 141
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published 24 days ago • 38
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation Paper • 2502.08047 • Published 26 days ago • 26
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published 24 days ago • 32
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published 26 days ago • 43
Goku: Flow Based Video Generative Foundation Models Paper • 2502.04896 • Published about 1 month ago • 95
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models Paper • 2402.14207 • Published Feb 22, 2024 • 8