AI & ML interests

Better coding data for all 🧑

Recent Activity

🐱 KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

KodCode is the largest fully synthetic open-source dataset providing verifiable solutions and tests for coding tasks. It contains 12 distinct subsets spanning various domains (from algorithmic to package-specific knowledge) and difficulty levels (from basic coding exercises to interview and competitive programming challenges). KodCode is designed for both supervised fine-tuning (SFT) and reinforcement learning (RL) tuning.
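To make "verifiable" concrete, here is a minimal sketch of how a solution paired with unit tests can be checked automatically, and how the pass/fail result could serve as a reward signal for RL. The record layout, the field names (`question`, `solution`, `test`), and the helpers `passes_tests` and `reward` are hypothetical and chosen for illustration; they are not KodCode's actual schema or pipeline.

```python
# Hypothetical record: field names are illustrative, not KodCode's real schema.
record = {
    "question": "Write a function add(a, b) that returns the sum of a and b.",
    "solution": "def add(a, b):\n    return a + b\n",
    "test": (
        "def test_add_positive():\n"
        "    assert add(2, 3) == 5\n"
        "\n"
        "def test_add_negative():\n"
        "    assert add(-1, -1) == -2\n"
    ),
}


def passes_tests(solution_src: str, test_src: str) -> bool:
    """Return True if every test_* function runs without raising."""
    namespace: dict = {}
    try:
        # Define the solution first, then the tests, in one shared namespace.
        exec(solution_src, namespace)
        exec(test_src, namespace)
        tests = [
            obj
            for name, obj in namespace.items()
            if name.startswith("test_") and callable(obj)
        ]
        if not tests:
            return False  # Nothing to verify counts as a failure.
        for test in tests:
            test()
    except Exception:
        # Covers failed assertions, syntax errors, and runtime errors.
        return False
    return True


def reward(candidate_src: str, test_src: str) -> float:
    """Binary reward: 1.0 if the candidate passes all tests, else 0.0."""
    return 1.0 if passes_tests(candidate_src, test_src) else 0.0


print(passes_tests(record["solution"], record["test"]))
print(reward("def add(a, b):\n    return a - b\n", record["test"]))
```

This sketch runs code in-process with `exec`, which is only acceptable for trusted toy inputs. Checking model-generated code would call for an isolated sandbox with time and memory limits.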

πŸ•ΈοΈ Project Website | πŸ“„ Technical Report | πŸ’Ύ Github Repo | πŸ€— KodCode-V1 (For RL) | πŸ€— KodCode-V1-SFT-R1 (for SFT)

models

None public yet