Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets Paper • 2311.15127 • Published Nov 25, 2023 • 12
Learning Transferable Visual Models From Natural Language Supervision Paper • 2103.00020 • Published Feb 26, 2021 • 11
U-Net: Convolutional Networks for Biomedical Image Segmentation Paper • 1505.04597 • Published May 18, 2015 • 8
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Paper • 2112.10741 • Published Dec 20, 2021 • 3
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22, 2024 • 21
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Paper • 2307.01952 • Published Jul 4, 2023 • 82
Photorealistic Video Generation with Diffusion Models Paper • 2312.06662 • Published Dec 11, 2023 • 23
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers Paper • 2106.10270 • Published Jun 18, 2021 • 2
Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation Paper • 2403.07500 • Published Mar 12, 2024
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing Paper • 2305.14720 • Published May 24, 2023 • 2
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published May 27, 2024 • 10
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture Paper • 2301.08243 • Published Jan 19, 2023 • 6
Revisiting Feature Prediction for Learning Visual Representations from Video Paper • 2404.08471 • Published Feb 15, 2024 • 1
Guiding Instruction-based Image Editing via Multimodal Large Language Models Paper • 2309.17102 • Published Sep 29, 2023 • 3
SDXL-Lightning: Progressive Adversarial Diffusion Distillation Paper • 2402.13929 • Published Feb 21, 2024 • 27
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published Jun 4, 2024 • 16
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11, 2024 • 30
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild Paper • 2211.14758 • Published Nov 27, 2022 • 1
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion Paper • 2403.12008 • Published Mar 18, 2024 • 19
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment Paper • 2310.11513 • Published Oct 17, 2023 • 1
InstructVideo: Instructing Video Diffusion Models with Human Feedback Paper • 2312.12490 • Published Dec 19, 2023 • 17
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2, 2024 • 52
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation Paper • 2310.19512 • Published Oct 30, 2023 • 15
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors Paper • 2310.12190 • Published Oct 18, 2023 • 10
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11, 2024 • 47
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23, 2024 • 21
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts Paper • 2408.03209 • Published Aug 6, 2024 • 21
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos Paper • 2312.10300 • Published Dec 16, 2023 • 1
Colorful Diffuse Intrinsic Image Decomposition in the Wild Paper • 2409.13690 • Published Sep 20, 2024 • 12
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14, 2024 • 8
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published Oct 25, 2024 • 23
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 33
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7, 2024 • 37
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024 • 28
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 12 days ago • 131