Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 44
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis Paper • 2312.13314 • Published Dec 20, 2023 • 7
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 257
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Paper • 2312.09911 • Published Dec 15, 2023 • 53
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models Paper • 2312.09608 • Published Dec 15, 2023 • 13
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing Paper • 2312.11392 • Published Dec 18, 2023 • 19
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models Paper • 2312.14091 • Published Dec 21, 2023 • 15
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models Paper • 2410.02416 • Published Oct 3 • 26
FashionComposer: Compositional Fashion Image Generation Paper • 2412.14168 • Published 7 days ago • 16
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 10 days ago • 41
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 12 days ago • 131
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published 13 days ago • 19
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs Paper • 2412.08347 • Published 15 days ago • 4
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published 8 days ago • 13