Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.04324

M-A-P Full Paper List

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Paper • 2306.00107 • Published May 31, 2023 • 3
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

Paper • 2309.08730 • Published Sep 15, 2023 • 1
ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25 • 56
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

Paper • 2401.11944 • Published Jan 22 • 27

Training-Free Consistent Text-to-Image Generation

Paper • 2402.03286 • Published Feb 5 • 65
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Paper • 2402.04324 • Published Feb 6 • 23
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

Paper • 2402.05195 • Published Feb 7 • 18
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 48

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Paper • 2402.05054 • Published Feb 7 • 25
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Paper • 2402.04324 • Published Feb 6 • 23
Training-Free Consistent Text-to-Image Generation

Paper • 2402.03286 • Published Feb 5 • 65

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Paper • 2401.15977 • Published Jan 29 • 37
Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Paper • 2307.04725 • Published Jul 10, 2023 • 64
Boximator: Generating Rich and Controllable Motions for Video Synthesis

Paper • 2402.01566 • Published Feb 2 • 26

Learning Universal Predictors

Paper • 2401.14953 • Published Jan 26 • 19
Anything in Any Scene: Photorealistic Video Object Insertion

Paper • 2401.17509 • Published Jan 30 • 16
SymbolicAI: A framework for logic-based approaches combining generative models and solvers

Paper • 2402.00854 • Published Feb 1 • 19
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

Paper • 2401.17093 • Published Jan 30 • 19

One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

Paper • 2306.07967 • Published Jun 13, 2023 • 24
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Paper • 2306.07954 • Published Jun 13, 2023 • 113
TryOnDiffusion: A Tale of Two UNets

Paper • 2306.08276 • Published Jun 14, 2023 • 73
Seeing the World through Your Eyes

Paper • 2306.09348 • Published Jun 15, 2023 • 33

Text to Video Papers

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

Paper • 2312.00079 • Published Nov 30, 2023 • 14
VideoBooth: Diffusion-based Video Generation with Image Prompts

Paper • 2312.00777 • Published Dec 1, 2023 • 21
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation

Paper • 2311.18775 • Published Nov 30, 2023 • 6
Generative Powers of Ten

Paper • 2312.02149 • Published Dec 4, 2023 • 4

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs