Bhadresh Savani

bhadresh-savani

https://www.linkedin.com/in/bhadreshsavani/

AI & ML interests

NLP, Deep Learning, ML

bhadresh-savani's activity

upvoted an article 3 days ago

Article

Hugging Face and JFrog partner to make AI Security more transparent

6 days ago

• 18

upvoted an article 6 days ago

Article

Trace & Evaluate your Agent with Arize Phoenix

10 days ago

• 29

updated a Space 7 days ago

AlfredAgent

📊

Generate answers by searching and analyzing the web

published a Space 7 days ago

AlfredAgent

📊

Generate answers by searching and analyzing the web

updated a model 16 days ago

bhadresh-savani/gemma-2-2B-it-thinking-function_calling-V0

Updated 16 days ago

published a model 16 days ago

bhadresh-savani/gemma-2-2B-it-thinking-function_calling-V0

Updated 16 days ago

liked a Space 23 days ago

233

Agent Leaderboard

💬

Ranking of LLMs for agentic tasks

liked a Space 27 days ago

Unit 1 Certification - AI Agent Fundamentals

🎓

Display a message with certification information

upvoted an article about 1 month ago

Article

How to deploy and fine-tune DeepSeek models on AWS

Jan 30

• 51

liked a model 3 months ago

Datou1111/shou_xin

Text-to-Image • Updated Dec 9, 2024 • 2.2k • 865

reacted to lin-tan's post with 🔥 4 months ago

Post

1441

Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues.
- Have 2.6X more tests per task (313.5 compared to SWE-Bench’s 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security