
5CD-AI/Vintern-1B-v3_5
Additionally, we feed the generated structured-prediction JSON data, together with the text, into DeepSeek-R1 Llama 70B to generate a chain of thought that explains the extraction process.
Why don't you use the original R1 (>600B) to get the best results?
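
For context, the quoted step amounts to building a prompt from the document text and the predicted JSON, then asking a reasoning model to explain the extraction. A minimal sketch of the prompt-building part might look like the following. The function name `build_cot_prompt`, the prompt wording, and the sample invoice are all hypothetical; the actual prompt the authors used is not given, and the call to the model itself is left out.

```python
import json


def build_cot_prompt(document_text: str, extracted: dict) -> str:
    """Assemble a prompt asking a reasoning model to explain an extraction.

    Hypothetical helper: the wording below is an assumption, not the
    prompt used by the model authors.
    """
    # ensure_ascii=False keeps non-ASCII text (e.g. Vietnamese) readable.
    extracted_json = json.dumps(extracted, ensure_ascii=False, indent=2)
    return (
        "Below is the text of a document and the structured JSON extracted from it.\n"
        "Explain step by step how each JSON field can be derived from the text.\n\n"
        f"Document text:\n{document_text}\n\n"
        f"Extracted JSON:\n{extracted_json}\n"
    )


# Made-up sample data, for illustration only.
sample_text = "Invoice No. 0012 - Total: 150,000 VND"
sample_json = {"invoice_number": "0012", "total": "150,000 VND"}

prompt = build_cot_prompt(sample_text, sample_json)
print(prompt)
```

The resulting string would then be sent to whichever reasoning model is chosen (the 70B distilled model or the full R1), so the choice of model does not change this part of the pipeline.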