High-Quality Image Restoration Following Human Instructions
Paper • 2401.16468 • Published • 12
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
Paper • 2401.15708 • Published • 11
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Paper • 2401.14688 • Published • 13
TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Paper • 2401.14828 • Published • 7

Collections
Collections including paper arxiv:2401.09865

YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 33
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 11
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 16
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 26

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 9
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 16
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 59
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 73

DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 14
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 47
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Paper • 2312.04655 • Published • 20
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
Paper • 2312.07536 • Published • 16
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Paper • 2312.08128 • Published • 12
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Paper • 2312.07661 • Published • 16

De-Diffusion Makes Text a Strong Cross-Modal Interface
Paper • 2311.00618 • Published • 21
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 57
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Paper • 2311.13231 • Published • 26
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 47