Bridging the Data Provenance Gap Across Text, Speech and Video • Paper • 2412.17847 • Published 7 days ago • 3
MotiF: Making Text Count in Image Animation with Motion Focal Loss • Paper • 2412.16153 • Published 5 days ago • 3
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation • Paper • 2412.18597 • Published 1 day ago • 10
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval • Paper • 2412.15443 • Published 6 days ago • 6
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models • Paper • 2412.18608 • Published 1 day ago • 5
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response • Paper • 2412.14922 • Published 7 days ago • 73
Diving into Self-Evolving Training for Multimodal Reasoning • Paper • 2412.17451 • Published 3 days ago • 32
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching • Paper • 2412.17153 • Published 3 days ago • 28
Large Motion Video Autoencoding with Cross-modal Video VAE • Paper • 2412.17805 • Published 2 days ago • 20
Deliberation in Latent Space via Differentiable Cache Augmentation • Paper • 2412.17747 • Published 2 days ago • 23
Revisiting In-Context Learning with Long Context Language Models • Paper • 2412.16926 • Published 4 days ago • 19
Outcome-Refining Process Supervision for Code Generation • Paper • 2412.15118 • Published 6 days ago • 14
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought • Paper • 2412.17498 • Published 3 days ago • 15
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners • Paper • 2412.17256 • Published 3 days ago • 34
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation • Paper • 2412.13649 • Published 8 days ago • 18