LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models • Paper • 2312.02949 • Published Dec 5, 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents • Paper • 2311.05437 • Published Nov 9, 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V • Paper • 2310.11441 • Published Oct 17, 2023
Semantic-SAM: Segment and Recognize Anything at Any Granularity • Paper • 2307.04767 • Published Jul 10, 2023