CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs Paper • 2503.01378 • Published 7 days ago • 3
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published 4 days ago • 18
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification Paper • 2503.02537 • Published 6 days ago • 10
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published 7 days ago • 26
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published 6 days ago • 37
Mobius: Text to Seamless Looping Video Generation via Latent Shift Paper • 2502.20307 • Published 10 days ago • 16
Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System Paper • 2502.16750 • Published 14 days ago • 10
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published 11 days ago • 26
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published 10 days ago • 28
KV-Edit: Training-Free Image Editing for Precise Background Preservation Paper • 2502.17363 • Published 13 days ago • 32
Language Models' Factuality Depends on the Language of Inquiry Paper • 2502.17955 • Published 13 days ago • 29
SIFT: Grounding LLM Reasoning in Contexts via Stickers Paper • 2502.14922 • Published 18 days ago • 29
MoM: Linear Sequence Modeling with Mixture-of-Memories Paper • 2502.13685 • Published 19 days ago • 33
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published 19 days ago • 37
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 20 days ago • 76