-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47
Collections
Discover the best community collections!
Collections including paper arxiv:2312.16886
-
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 44 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 12 -
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices
Paper • 2312.16886 • Published • 19 -
Lenna: Language Enhanced Reasoning Detection Assistant
Paper • 2312.02433 • Published • 2
-
Textbooks Are All You Need
Paper • 2306.11644 • Published • 142 -
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
Paper • 2401.02330 • Published • 14 -
Textbooks Are All You Need II: phi-1.5 technical report
Paper • 2309.05463 • Published • 87 -
Visual Instruction Tuning
Paper • 2304.08485 • Published • 13
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 144 -
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 28 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 21 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 65
-
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 258 -
AppAgent: Multimodal Agents as Smartphone Users
Paper • 2312.13771 • Published • 51 -
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Paper • 2312.05208 • Published • 8 -
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices
Paper • 2312.16886 • Published • 19