Submitted by ChocoWu 43 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation · 11 authors 3
Submitted by tarsur909 28 CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis · 9 authors 1
Submitted by ChenYi99 26 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 · 7 authors 2
Submitted by weizhiwang 18 Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources · 5 authors 6
Submitted by wbhu-tc 15 GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors · 6 authors 1
Submitted by xw-eric 15 Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents · 6 authors 1
Submitted by pabloruizponce 13 MixerMDM: Learnable Composition of Human Motion Diffusion Models · 5 authors 1
Submitted by akhaliq 12 Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? · 7 authors 3
Submitted by Ray121381 11 Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models · 10 authors 1
Submitted by ColorfulAI 11 OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts · 6 authors 1
Submitted by hbXNov 10 When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning · 7 authors
Submitted by deepkyu 9 Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features · 9 authors 1
Submitted by carboncoo 9 AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization · 12 authors 2
Submitted by AndrewZhou924 9 Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models · 8 authors 1
Submitted by akhaliq 6 Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead · 11 authors 1
Submitted by xk-huang 5 m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models · 5 authors 1
Submitted by akhaliq 5 Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs · 4 authors 1
Submitted by MaksimSTW 4 Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base · 9 authors 1
Submitted by MrezaPRZ 3 Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL · 8 authors 1
Submitted by onandon 2 DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting · 2 authors 1
Submitted by akhaliq 2 ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning · 5 authors 1
Submitted by rdkarim 1 MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing · 3 authors 1