Learning Getting-Up Policies for Real-World Humanoid Robots Paper • 2502.12152 • Published 20 days ago • 37
Diverse Inference and Verification for Advanced Reasoning Paper • 2502.09955 • Published 24 days ago • 17
Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios Article • By pratikbhavsar and 1 other • 26 days ago • 16
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC Paper • 2502.14282 • Published 18 days ago • 18
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 17 days ago • 177
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published 24 days ago • 38
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published 26 days ago • 54
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation Paper • 2502.09411 • Published 24 days ago • 18
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges Paper • 2502.08680 • Published 26 days ago • 11
CoT-Valve: Length-Compressible Chain-of-Thought Tuning Paper • 2502.09601 • Published 24 days ago • 14
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data Paper • 2502.08468 • Published 25 days ago • 13
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Paper • 2502.09390 • Published 24 days ago • 16
Exploring the Potential of Encoder-free Architectures in 3D LMMs Paper • 2502.09620 • Published 24 days ago • 25
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published 24 days ago • 27