
Bhadresh Savani

bhadresh-savani

AI & ML interests

NLP, Deep Learning, ML


Organizations

Flax Community, ONNXConfig for all, HugGAN Community, Keras Dreambooth Event

bhadresh-savani's activity

reacted to lin-tan's post with 🔥 about 1 month ago
Can language models replace developers? #RepoCod says "Not Yet": GPT-4o and other LLMs score below 30% accuracy (pass@1) on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks are:
- General code generation tasks, whereas SWE-Bench tasks involve resolving GitHub issues via pull requests.
- Backed by 2.6× more tests per task on average (313.5 vs. SWE-Bench's 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: the longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: the 200 toughest tasks from RepoCod, with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs scoring below 10% accuracy (pass@1) on RepoCod-Lite tasks
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
upvoted an article 4 months ago
upvoted an article 5 months ago
upvoted an article 5 months ago

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

upvoted 2 articles 5 months ago

Serverless Inference with Hugging Face and NVIDIA NIMs


Google Cloud TPUs made available to Hugging Face users

New activity in bhadresh-savani/photo-to-cartoon 5 months ago

Create README.md

#3 opened 5 months ago by Rikz
upvoted 2 articles 5 months ago

Our Transformers Code Agent beats the GAIA benchmark!


Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

liked a Space 5 months ago