Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

replied to their post "9 Multimodal Chain-of-Thought methods" 2 days ago
posted an update "9 Multimodal Chain-of-Thought methods" 2 days ago

Organizations

Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community, Sandbox

Posts 16

9 Multimodal Chain-of-Thought methods

How can Chain-of-Thought (CoT) prompting unlock models' full potential across images, video, audio, and more? The answer lies in specialized multimodal CoT techniques.

Here are 9 Multimodal Chain-of-Thought (MCoT) methods. Most of them are open source:

1. KAM-CoT -> KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (2401.12863)
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy on the ScienceQA benchmark

2. Multimodal Visualization-of-Thought (MVoT) -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality

3. Compositional CoT (CCoT) -> Compositional Chain-of-Thought Prompting for Large Multimodal Models (2311.17076)
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks

4. URSA -> URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification

5. MM-Verify -> MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification (2502.13383)
Introduces a verification mechanism with two components, MM-Verifier and MM-Reasoner, trained on synthesized high-quality CoT data for multimodal reasoning

6. Duty-Distinct CoT (DDCoT) -> DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (2310.16436)
Divides reasoning responsibilities between LMs and visual models, integrating visual recognition capabilities into the joint reasoning process

7. Multimodal-CoT from Amazon Web Services -> Multimodal Chain-of-Thought Reasoning in Language Models (2302.00923)
A two-stage framework that separates rationale generation from answer prediction, letting the model reason more effectively over multimodal inputs (a minimal sketch of this pattern follows the list)

8. Graph-of-Thought (GoT) -> Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models (2305.16582)
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks (see the small graph sketch at the end of this post)
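
To make the shared recipe concrete, here is a minimal sketch of the two-stage pattern used by Multimodal-CoT (method 7): generate a rationale grounded in the image first, then predict the answer conditioned on that rationale. The `vlm_generate` helper is a hypothetical stand-in for whatever vision-language model API you use; it is not taken from any of the papers above.

```python
# Minimal sketch of two-stage multimodal CoT: rationale first, then answer.
# `vlm_generate(prompt, image_path)` is a hypothetical placeholder for any
# vision-language model call; wire it to your own backend.

def vlm_generate(prompt: str, image_path: str) -> str:
    """Hypothetical VLM call; replace with your model of choice."""
    raise NotImplementedError("Connect this to a vision-language model.")

def multimodal_cot(question: str, image_path: str) -> str:
    # Stage 1: produce a rationale that cites the relevant visual evidence.
    rationale = vlm_generate(
        f"Question: {question}\nDescribe the relevant visual evidence and reason step by step.",
        image_path,
    )
    # Stage 2: answer conditioned on the generated rationale.
    return vlm_generate(
        f"Question: {question}\nRationale: {rationale}\nGive only the final answer.",
        image_path,
    )
```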

More in the comments👇
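
To illustrate the Graph-of-Thought idea from method 8, here is a tiny sketch that stores a reasoning trace as a graph of interconnected thought nodes rather than a linear chain. The class and field names are illustrative, not the paper's implementation.

```python
# Illustrative sketch: reasoning steps as nodes in a small DAG (Python 3.10+).
# Names are hypothetical and not taken from the GoT paper's code.
from dataclasses import dataclass, field

@dataclass
class ThoughtNode:
    text: str                                           # one intermediate reasoning step
    supports: list[int] = field(default_factory=list)   # indices of nodes this step builds on

@dataclass
class ThoughtGraph:
    nodes: list[ThoughtNode] = field(default_factory=list)

    def add(self, text: str, supports: list[int] | None = None) -> int:
        self.nodes.append(ThoughtNode(text, supports or []))
        return len(self.nodes) - 1

# Usage: two observations jointly support one conclusion, forming a small graph.
g = ThoughtGraph()
a = g.add("The image shows a 3x3 grid with two filled cells.")
b = g.add("The question asks how many cells remain empty.")
c = g.add("9 total cells minus 2 filled cells leaves 7 empty.", supports=[a, b])
```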

Articles 38


FOD#93: When AI meant Ambient Intelligence

Models

None public yet

Datasets

None public yet