RepVideo: Rethinking Cross-Layer Representation for Video Generation Paper • 2501.08994 • Published 12 days ago
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published 17 days ago
Multi-task retriever fine-tuning for domain-specific and efficient RAG Paper • 2501.04652 • Published 19 days ago
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 21 days ago
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving Paper • 2412.20735 • Published 29 days ago
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published 28 days ago
Slow Perception: Let's Perceive Geometric Figures Step-by-step Paper • 2412.20631 • Published 29 days ago
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 28 days ago
IDOL: Instant Photorealistic 3D Human Creation from a Single Image Paper • 2412.14963 • Published Dec 19, 2024
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers Paper • 2412.12571 • Published Dec 17, 2024
Learning from Massive Human Videos for Universal Humanoid Pose Control Paper • 2412.14172 • Published Dec 18, 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published Dec 18, 2024
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes Paper • 2412.11100 • Published Dec 15, 2024