PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World Paper • 2412.17589 • Published 3 days ago • 8 • 2
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Paper • 2412.15204 • Published 6 days ago • 31
Measuring Mathematical Problem Solving With the MATH Dataset Paper • 2103.03874 • Published Mar 5, 2021 • 3
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25 • 60
Data Contamination Report from the 2024 CONDA Shared Task Paper • 2407.21530 • Published Jul 31 • 10 • 3
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published Jun 18 • 14