ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper • 2501.05452 • Published 4 days ago • 4
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 3 days ago • 21
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published 4 days ago • 21
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Paper • 2501.03841 • Published 6 days ago • 33
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published 3 days ago • 33
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains Paper • 2501.05707 • Published 3 days ago • 3
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published 5 days ago • 5
Multi-subject Open-set Personalization in Video Generation Paper • 2501.06187 • Published 3 days ago • 4
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities Paper • 2308.12966 • Published Aug 24, 2023 • 8
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution Paper • 2501.05040 • Published 4 days ago • 11