Llama 3.2V 11B Cot • 38
💬 Generate descriptions and answers by combining text and images
Xkev/Llama-3.2V-11B-cot
Image-Text-to-Text • Updated • 4.29k • 150
Xkev/LLaVA-CoT-100k
Viewer • Updated • 98.6k • 2.51k • 84
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 124
Guowei Xu PRO
Xkev
AI & ML interests
None yet
Recent Activity
upvoted a paper about 6 hours ago
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
upvoted a paper about 6 hours ago
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
upvoted a paper 13 days ago
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Organizations
None yet