Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models Paper • 2503.01763 • Published 6 days ago • 4 • 2
FLAME: A Federated Learning Benchmark for Robotic Manipulation Paper • 2503.01729 • Published 6 days ago • 4 • 2
Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection Paper • 2503.01449 • Published 7 days ago • 4 • 2
CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs Paper • 2503.01378 • Published 7 days ago • 3 • 2
SwiLTra-Bench: The Swiss Legal Translation Benchmark Paper • 2503.01372 • Published 7 days ago • 2 • 2
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published 17 days ago • 12 • 2
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published 28 days ago • 126 • 4
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published 27 days ago • 60 • 6
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 136 • 5
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29, 2024 • 42 • 4