SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 17 days ago • 94
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 53
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 92
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Paper • 2307.16789 • Published Jul 31, 2023 • 100