On the Acquisition of Shared Grammatical Representations in Bilingual Language Models Paper • 2503.03962 • Published 4 days ago • 3
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation Paper • 2503.02972 • Published 5 days ago • 23
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users Paper • 2503.02268 • Published 6 days ago • 8
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published 8 days ago • 51
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Paper • 2502.17422 • Published 13 days ago • 7
Introducing Visual Perception Token into Multimodal Large Language Model Paper • 2502.17425 • Published 13 days ago • 14
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? Paper • 2502.17535 • Published 13 days ago • 8
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model Paper • 2502.18906 • Published 12 days ago • 11
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published 11 days ago • 26
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models Paper • 2502.14302 • Published 18 days ago • 9
VLM^2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues Paper • 2502.12084 • Published 20 days ago • 29
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published 26 days ago • 54
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 18 days ago • 177
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking Paper • 2502.09083 • Published 25 days ago • 4
Intuitive physics understanding emerges from self-supervised pretraining on natural videos Paper • 2502.11831 • Published 21 days ago • 18
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation Paper • 2502.08826 • Published 25 days ago • 17
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training Paper • 2502.11196 • Published 21 days ago • 22
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper • 2502.12215 • Published 21 days ago • 16