KodCode-V1 is the largest fully-synthetic open-source dataset providing verifiable solutions and tests for coding tasks.

KodCode (community)
AI & ML interests: Better coding data for all 🧡
🐱 KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
KodCode is the largest fully synthetic open-source dataset providing verifiable solutions and tests for coding tasks. It contains 12 distinct subsets spanning various domains (from algorithmic to package-specific knowledge) and difficulty levels (from basic coding exercises to interview and competitive programming challenges). KodCode is designed for both supervised fine-tuning (SFT) and reinforcement learning (RL) tuning.
🕸️ Project Website | 📄 Technical Report | 💾 GitHub Repo | 🤗 KodCode-V1 (for RL) | 🤗 KodCode-V1-SFT-R1 (for SFT)