On Speculative Decoding for Multimodal Large Language Models Paper • 2404.08856 • Published Apr 13 • 13
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published Apr 18 • 24
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19 • 30