The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Abstract
Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs: a phenomenon in which models favor extended internal reasoning chains over interaction with the environment. Through experiments on software engineering tasks using SWE-bench Verified, we observe three recurring patterns: Analysis Paralysis, Rogue Actions, and Premature Disengagement. We propose a framework for studying these behaviors that correlates with human expert assessments, and use it to analyze 4018 trajectories. We find that higher overthinking scores correlate with decreased performance, and that reasoning models exhibit stronger tendencies toward overthinking than non-reasoning models. Our analysis reveals that simple efforts to mitigate overthinking in agentic environments, such as selecting the solution with the lower overthinking score, can improve model performance by almost 30% while reducing computational costs by 43%. These results suggest that mitigating overthinking has strong practical implications. We further suggest that overthinking tendencies could be mitigated by leveraging native function-calling capabilities and selective reinforcement learning. We open-source our evaluation framework and dataset to facilitate research in this direction at https://github.com/AlexCuadron/Overthinking.
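The selection strategy the abstract describes is simple to sketch. Below is a minimal, hedged illustration: the paper's actual scorer is an LLM-based judge validated against human expert assessments, so the `Trajectory` fields, the token-ratio heuristic in `overthinking_score`, and the `select_solution` helper here are all assumptions for illustration, not the released framework's API.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One candidate agent rollout for a SWE-bench Verified issue (hypothetical schema)."""
    patch: str            # proposed code change
    reasoning_log: str    # full reasoning / interaction trace
    num_env_actions: int  # how many times the agent acted on the environment
    cost: float           # dollars spent producing this rollout


def overthinking_score(traj: Trajectory) -> float:
    """Toy stand-in heuristic, NOT the paper's scorer (an LLM judge):
    penalize long internal reasoning relative to environment interaction."""
    n_actions = max(1, traj.num_env_actions)
    return len(traj.reasoning_log.split()) / n_actions


def select_solution(candidates: list[Trajectory]) -> Trajectory:
    """Best-of-n selection: keep the candidate with the LOWEST
    overthinking score, the mitigation studied in the paper."""
    return min(candidates, key=overthinking_score)


# Usage: sample a few rollouts, submit only the least-overthinking one.
candidates = [
    Trajectory(patch="...", reasoning_log="...", num_env_actions=12, cost=0.40),
    Trajectory(patch="...", reasoning_log="...", num_env_actions=3, cost=0.35),
]
best = select_solution(candidates)
```

Because only one rollout is submitted and sampling can be done at a cheaper reasoning-effort setting, this kind of selection is where the reported ~30% performance gain at 43% lower cost comes from.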
Community
New discovery! LLMs are just like humans!
Overthinking GREATLY HURTS their performance
If we select the solution with the lower overthinking score, we improve model performance by almost 30% while reducing costs by 43% (o1_low).
Is reasoning really the future of LLMs?
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs (2025)
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models (2025)
- Vintix: Action Model via In-Context Reinforcement Learning (2025)
- Logical Reasoning in Large Language Models: A Survey (2025)
- Training Language Models to Reason Efficiently (2025)
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (2025)
- Demystifying Long Chain-of-Thought Reasoning in LLMs (2025)