Collections
Collections including paper arxiv:2411.17698
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
  Paper • 2410.17243 • Published • 89
- AnimateAnything: Consistent and Controllable Animation for Video Generation
  Paper • 2411.10836 • Published • 23
- LLaVA-o1: Let Vision Language Models Reason Step-by-Step
  Paper • 2411.10440 • Published • 111
- MagicQuill: An Intelligent Interactive Image Editing System
  Paper • 2411.09703 • Published • 57

- OmniGen: Unified Image Generation
  Paper • 2409.11340 • Published • 108
- Video-Guided Foley Sound Generation with Multimodal Controls
  Paper • 2411.17698 • Published • 7
- FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
  Paper • 2412.01064 • Published • 25
- OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
  Paper • 2412.01169 • Published • 11

- SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
  Paper • 2405.18503 • Published • 9
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
  Paper • 2405.20289 • Published • 10
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 13
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 18