Deping Zhang

Deping

AI & ML interests

Deep Reinforcement Learning, Computer Vision, Large Language Models (especially their "emergent" capabilities), Theoretical Condensed Matter Physics (superconductivity, ferromagnetism)

Recent Activity

updated a collection 7 days ago
LLM_VLM_R1

Organizations

None yet

Deping's activity

reacted to vladbogo's post with ❤️ 12 months ago
A new paper introduces Visual CoT, an approach that adds visual chain-of-thought reasoning to multi-modal large language models. The model dynamically identifies and focuses on the image regions most relevant to a question, mimicking efficient, human-like visual reasoning.

Key points:
* Introduces the 373k Visual CoT dataset with bounding box annotations highlighting essential image regions
* Proposes a multi-turn pipeline for focusing on relevant visual inputs
* Achieves strong results on multi-modal benchmarks
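
To make the "focus on a region, then answer" idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the toy image type, the `crop` helper, and the `predict_box` and `answer` callables are all hypothetical stand-ins for a real model, and the two-turn structure is only one plausible reading of the multi-turn pipeline mentioned above.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image inside `box`, clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = box
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


def two_turn_answer(
    image: Image,
    question: str,
    predict_box: Callable[[Image, str], Box],       # hypothetical: turn 1
    answer: Callable[[Image, Image, str], object],  # hypothetical: turn 2
) -> object:
    """Assumed two-turn flow: locate the relevant region, then answer."""
    # Turn 1: ask the model which region matters for this question.
    box = predict_box(image, question)
    # Turn 2: answer using both the full image and the cropped region.
    region = crop(image, box)
    return answer(image, region, question)
```

In a real system both callables would be calls to the same multi-modal model; passing them in as parameters keeps the sketch runnable with simple stubs.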

Paper: Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models (2403.16999)
Code, data and other resources: https://github.com/deepcs233/Visual-CoT

Congrats to the authors on their work!