OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 81
How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? Paper • 2412.18495 • Published Dec 24, 2024 • 8
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition Paper • 2412.19712 • Published Dec 27, 2024 • 14
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 54
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 97
Slow Perception: Let's Perceive Geometric Figures Step-by-step Paper • 2412.20631 • Published 29 days ago • 14
Facilitating large language model Russian adaptation with Learned Embedding Propagation Paper • 2412.21140 • Published 28 days ago • 16
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published 29 days ago • 35
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 28 days ago • 23
Bringing Objects to Life: 4D generation from 3D objects Paper • 2412.20422 • Published 30 days ago • 34