Outcome-Refining Process Supervision for Code Generation
Abstract
Large Language Models have demonstrated remarkable capabilities in code generation, yet they often struggle with complex programming tasks that require deep algorithmic reasoning. While process supervision through learned reward models shows promise in guiding reasoning steps, it requires expensive training data and suffers from unreliable evaluation. We propose Outcome-Refining Process Supervision, a novel paradigm that treats outcome refinement itself as the process to be supervised. Our framework leverages concrete execution signals to ground the supervision of reasoning steps, while using tree-structured exploration to maintain multiple solution trajectories simultaneously. Experiments demonstrate that our approach enables even smaller models to achieve high success rates on competitive programming tasks and provides more reliable verification than traditional reward models, without requiring trained PRMs. Our approach achieves significant improvements across 5 models and 3 datasets: an average of 26.9% increase in correctness and 42.2% in efficiency. The results suggest that providing structured reasoning space with concrete verification signals is crucial for solving complex programming tasks. We open-source all our code and data at: https://github.com/zhuohaoyu/ORPS
Community
Building Better Reasoning Code LLMs: A Process Supervision Approach to Complex Code Generation
The recent release of OpenAI's o1 model has demonstrated unprecedented performance in complex reasoning tasks by incorporating extensive chain-of-thought (CoT) reasoning at inference time. While several recent studies have attempted to replicate o1's success in mathematical reasoning, developing similar capabilities for more complex domains like code generation remains a significant challenge. We introduce Outcome-Refining Process Supervision (ORPS), a novel framework that enhances LLMs' code generation abilities by treating the refinement of execution outcomes as the process to be supervised. Through concrete execution signals and tree-structured exploration at inference time, ORPS enables models to perform deep reasoning with step-by-step verification and refinement. Our methodology achieves substantial improvements across multiple benchmark datasets, with an average increase of 26.9% in correctness and 42.2% in code generation efficiency. Notably, we achieve these gains without requiring expensive reward model training, demonstrating that even smaller models can achieve remarkable performance improvements on competitive programming tasks through structured reasoning. This work provides insights into how outcome-guided process supervision at inference time can enhance complex code generation capabilities, advancing our understanding of building more effective reasoning systems.
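To make the core idea concrete, here is a minimal sketch (not the official ORPS implementation) of tree-structured exploration grounded by an execution signal: a beam of candidate programs is maintained, each candidate is scored by the fraction of test cases it passes when executed, and the top-scoring trajectories are kept for further refinement. The `generate_candidates` function is a hand-coded placeholder for what would be LLM-driven refinement in the actual framework, and all names here are illustrative assumptions.

```python
# Illustrative sketch of outcome-grounded tree search (NOT the official ORPS code).
# Assumption: generate_candidates() stands in for LLM sampling/refinement, and
# the execution signal is simply the fraction of test cases a candidate passes.
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Node:
    score: float                              # execution-grounded reward
    code: str = field(compare=False)          # candidate program text
    depth: int = field(compare=False, default=0)


def run_tests(code: str, tests: list[tuple[int, int]]) -> float:
    """Execution signal: fraction of (input, expected) pairs the code passes."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        solve = namespace["solve"]
        return sum(1 for x, y in tests if solve(x) == y) / len(tests)
    except Exception:
        return 0.0  # crashes or missing `solve` get zero reward


def generate_candidates(parent: str, depth: int) -> list[str]:
    """Placeholder for model-generated refinements of a parent solution."""
    if depth == 0:  # initial proposals from scratch
        return ["def solve(x):\n    return x", "def solve(x):\n    return x + 1"]
    # refinements: keep the parent, plus one mutated variant
    return [parent, parent.replace("x + 1", "x * 2")]


def tree_search(tests: list[tuple[int, int]],
                beam_width: int = 2, max_depth: int = 2) -> tuple[str, float]:
    """Expand a beam of solution trajectories, keeping the top-k by execution score."""
    beam = [Node(0.0, "", 0)]
    best = beam[0]
    for depth in range(max_depth):
        children = []
        for node in beam:
            for cand in generate_candidates(node.code, depth):
                child = Node(run_tests(cand, tests), cand, depth + 1)
                children.append(child)
                if child.score > best.score:
                    best = child
        beam = heapq.nlargest(beam_width, children)  # keep top-k trajectories
    return best.code, best.score


if __name__ == "__main__":
    # Toy task: learn that solve(x) should return 2 * x.
    tests = [(1, 2), (2, 4), (3, 6)]
    code, score = tree_search(tests)
    print(score)  # the variant `return x * 2` passes all tests
```

The key design point this mirrors is that supervision comes from a concrete, verifiable signal (test execution) rather than a learned reward model, so even a weak proposal distribution can be steered toward correct programs by keeping multiple trajectories alive.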
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation (2024)
- CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models (2024)
- Planning-Driven Programming: A Large Language Model Programming Workflow (2024)
- Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS (2024)
- o1-Coder: an o1 Replication for Coding (2024)
- Tree-of-Code: A Tree-Structured Exploring Framework for End-to-End Code Generation and Execution in Complex Task Handling (2024)
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering (2024)