DINOv2: Learning Robust Visual Features without Supervision Paper • 2304.07193 • Published Apr 14, 2023 • 6
Intuitive physics understanding emerges from self-supervised pretraining on natural videos Paper • 2502.11831 • Published 20 days ago • 18
Cluster and Predict Latent Patches for Improved Masked Image Modeling Paper • 2502.08769 • Published 25 days ago • 4
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 61