Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions Paper • 2503.03278 • Published 4 days ago • 12
ABC: Achieving Better Control of Multimodal Embeddings using VLMs Paper • 2503.00329 • Published 8 days ago • 18
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content Paper • 2503.02357 • Published 5 days ago • 7
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published 6 days ago • 7
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression Paper • 2503.02812 • Published 5 days ago • 7
SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models Paper • 2503.02876 • Published 5 days ago • 4
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published 9 days ago • 25
Tell me why: Visual foundation models as self-explainable classifiers Paper • 2502.19577 • Published 11 days ago • 10
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation Paper • 2502.20388 • Published 10 days ago • 14
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published 10 days ago • 26
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published 10 days ago • 27
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published 11 days ago • 56
LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models Paper • 2502.15612 • Published 16 days ago • 4
An Overview of Large Language Models for Statisticians Paper • 2502.17814 • Published 12 days ago • 4
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published 12 days ago • 69