- DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation (arXiv:2412.18597, Dec 2024)
- Training-free Regional Prompting for Diffusion Transformers (arXiv:2411.02395, Nov 4, 2024)
- SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation (arXiv:2411.04989, Nov 7, 2024)
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation (arXiv:2411.04709, Nov 5, 2024)
- FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality (arXiv:2410.19355, Oct 25, 2024)
- CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models (arXiv:2410.18505, Oct 24, 2024)
- T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design (arXiv:2410.05677, Oct 8, 2024)
- Story-Adapter: A Training-free Iterative Framework for Long Story Visualization (arXiv:2410.06244, Oct 8, 2024)
- GLEE: A Unified Framework and Benchmark for Language-based Economic Environments (arXiv:2410.05254, Oct 7, 2024)
- Agent S: An Open Agentic Framework that Uses Computers Like a Human (arXiv:2410.08164, Oct 10, 2024)
- EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models (arXiv:2410.07133, Oct 9, 2024)
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis (arXiv:2410.08261, Oct 10, 2024)
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention (arXiv:2410.10774, Oct 14, 2024)
- Animate-X: Universal Character Image Animation with Enhanced Motion Representation (arXiv:2410.10306, Oct 14, 2024)