Collections

Collections including paper arxiv:2411.17698

Video-Guided Foley Sound Generation with Multimodal Controls

Paper • 2411.17698 • Published 29 days ago • 7

Video-Guided Foley Sound Generation with Multimodal Controls

Paper • 2411.17698 • Published 29 days ago • 7

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22 • 89

AnimateAnything: Consistent and Controllable Animation for Video Generation

Paper • 2411.10836 • Published Nov 16 • 23

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15 • 111

MagicQuill: An Intelligent Interactive Image Editing System

Paper • 2411.09703 • Published Nov 14 • 57

Omni-Generation

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 108

Video-Guided Foley Sound Generation with Multimodal Controls

Paper • 2411.17698 • Published 29 days ago • 7

FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Paper • 2412.01064 • Published 24 days ago • 25

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published 24 days ago • 11

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28 • 9

DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30 • 10

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5 • 13

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5 • 18
