Scaling test-time compute (Running, 531): Enhance math problem solving by scaling test-time compute
Vision Arena (Testing VLMs side-by-side) (Running, 544): Analyze images to detect and label objects