-
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 42 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 60 -
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
Paper • 2404.05674 • Published • 13 -
Agentless: Demystifying LLM-based Software Engineering Agents
Paper • 2407.01489 • Published • 42
Collections
Discover the best community collections!
Collections including paper arxiv:2404.05674
-
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Paper • 2403.17804 • Published • 16 -
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Paper • 2403.16990 • Published • 25 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 30 -
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 11
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 25 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 12 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 38 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 19
-
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Paper • 2312.13964 • Published • 18 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 258 -
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper • 2312.12491 • Published • 69 -
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
Paper • 2401.02330 • Published • 14
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 45 -
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper • 2312.14125 • Published • 44 -
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 30 -
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Paper • 2401.01256 • Published • 19
-
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
Paper • 2309.05793 • Published • 50 -
FreeU: Free Lunch in Diffusion U-Net
Paper • 2309.11497 • Published • 64 -
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Paper • 2310.00426 • Published • 61 -
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper • 2311.05556 • Published • 81