Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar
orrzohar
AI & ML interests
Large Multi-Modal Models, Foundation Models, Video Understanding
Recent Activity
upvoted a paper about 20 hours ago: OpenAI o1 System Card
upvoted a paper about 20 hours ago: DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
upvoted a paper about 23 hours ago: RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Collections (2)
interesting Video-LLMs
- VoCo-LLaMA: Towards Vision Compression with Large Language Models
  Paper • 2406.12275 • Published • 29
- VILA: On Pre-training for Visual Language Models
  Paper • 2312.07533 • Published • 20
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 51
- Long Context Transfer from Language to Vision
  Paper • 2406.16852 • Published • 32