VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 10 days ago • 35
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption Paper • 2412.09283 • Published Dec 12, 2024 • 19