Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published 15 days ago • 35
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published 13 days ago • 19
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion Paper • 2412.09593 • Published 13 days ago • 17
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Paper • 2412.00174 • Published 26 days ago • 22
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22 • 42
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22 • 16
Multimodal-SAE Collection A collection of SAEs hooked onto LLaVA • 4 items • Updated about 1 month ago • 4
Insight-V Collection Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models • 5 items • Updated Nov 22 • 9
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21 • 21
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20 • 30
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D Paper • 2411.02336 • Published Nov 4 • 23
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published Oct 25 • 23
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image Paper • 2409.17280 • Published Sep 25 • 9
LMMs-Eval-Lite Collection A Lite version of the dataset, made to accelerate holistic evaluation during model development • 20 items • Updated Oct 4 • 2
LLaVA-OneVision Collection A model that handles arbitrary types of visual input • 15 items • Updated Oct 5 • 20
Oryx Collection Oryx: One Multi-Modal LLM for On-Demand Spatial-Temporal Understanding • 6 items • Updated 15 days ago • 15
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19 • 24
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published Sep 19 • 18
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published Sep 17 • 25