Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement Paper • 2411.06558 • Published Nov 10 • 34
SlimLM: An Efficient Small Language Model for On-Device Document Assistance Paper • 2411.09944 • Published Nov 15 • 12
Look Every Frame All at Once: Video-Ma²mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing Paper • 2411.19460 • Published 27 days ago • 10
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 19 days ago • 46
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment Paper • 2412.04814 • Published 20 days ago • 45
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Paper • 2412.04445 • Published 20 days ago • 21
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies Paper • 2412.10345 • Published 12 days ago • 2
Learning Universal Policies via Text-Guided Video Generation Paper • 2302.00111 • Published Jan 31, 2023
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 13 days ago • 75
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes Paper • 2412.11100 • Published 11 days ago • 5
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning Paper • 2412.09858 • Published 13 days ago • 1
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling Paper • 2412.15084 • Published 6 days ago • 12