Ashvini Kumar Jindal

akjindal53244

AI & ML interests

NLP

Recent Activity

reacted to albertvillanova's post with 👍 3 months ago

Articles

Organizations

Blog-explorers, MetaMath, Upaya, 2A2I, MLX Community, Agent Collab, LinkedIn, Hugging Face Party @ PyTorch Conference, rg-preview

akjindal53244's activity

New activity in akjindal53244/Llama-3.1-Storm-8B about 1 month ago

Adding Evaluation Results

#9 opened about 1 month ago by T145
reacted to albertvillanova's post with 👍 3 months ago
🚨 We’ve just released a new tool to compare the performance of models in the 🤗 Open LLM Leaderboard: the Comparator 🎉
open-llm-leaderboard/comparator

Want to see how two different versions of LLaMA stack up? Let’s walk through a step-by-step comparison of LLaMA-3.1 and LLaMA-3.2. 🦙🧵👇

1/ Load the Models' Results
- Go to the 🤗 Open LLM Leaderboard Comparator: open-llm-leaderboard/comparator
- Search for "LLaMA-3.1" and "LLaMA-3.2" in the model dropdowns.
- Press the Load button. Ready to dive into the results!
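
If you prefer to pull the same numbers outside the UI, here is a minimal sketch. It assumes the aggregated leaderboard table is published as the open-llm-leaderboard/contents dataset and that rows carry a "fullname" column with the full model id; the model ids below are illustrative, so adjust repo id, column, and model names to what the leaderboard actually exposes.

```python
# Minimal sketch: load the aggregated leaderboard table and keep two models.
# Assumptions: the table is published as the "open-llm-leaderboard/contents"
# dataset and has a "fullname" column with the full model id; adjust if the
# actual repo id or column names differ.
from datasets import load_dataset

table = load_dataset("open-llm-leaderboard/contents", split="train").to_pandas()

models = [
    "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model ids
    "meta-llama/Llama-3.2-3B-Instruct",
]
subset = table[table["fullname"].isin(models)]
print(subset.T)  # transpose so the two models read side by side
```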

2/ Compare Metric Results in the Results Tab 📊
- Head over to the Results tab.
- Here, you’ll see the performance metrics for each model, beautifully color-coded using a gradient to highlight performance differences: greener is better! 🌟
- Want to focus on a specific task? Use the Task filter to home in on comparisons for tasks like BBH or MMLU-Pro.
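
For a quick numeric view of the same comparison, a small pandas sketch (it reuses the table loaded in the step 1 sketch; the task column names here are assumptions, so check table.columns first):

```python
# Sketch: per-task score deltas between two models from the leaderboard table.
# Task column names below are assumptions; inspect table.columns before use.
import pandas as pd

def metric_deltas(table: pd.DataFrame, model_a: str, model_b: str,
                  tasks: list[str]) -> pd.DataFrame:
    a = table.loc[table["fullname"] == model_a, tasks].iloc[0]
    b = table.loc[table["fullname"] == model_b, tasks].iloc[0]
    out = pd.DataFrame({model_a: a, model_b: b})
    out["delta"] = out[model_b] - out[model_a]
    return out.sort_values("delta")  # biggest regressions first

# Hypothetical usage (column names are placeholders):
# print(metric_deltas(table, "meta-llama/Llama-3.1-8B-Instruct",
#                     "meta-llama/Llama-3.2-3B-Instruct",
#                     ["BBH", "MMLU-PRO", "MUSR", "IFEval"]))
```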

3/ Check Config Alignment in the Configs Tab ⚙️
- To ensure you’re comparing apples to apples, head to the Configs tab.
- Review both models’ evaluation configurations, such as metrics, datasets, prompts, few-shot configs...
- If something looks off, it’s good to know before drawing conclusions! ✅
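
The same apples-to-apples check is easy to script once you have the two configurations as dictionaries; a toy sketch with made-up config values:

```python
# Sketch: report every key where two evaluation configs disagree.
def config_diff(cfg_a: dict, cfg_b: dict) -> dict:
    keys = set(cfg_a) | set(cfg_b)
    return {k: (cfg_a.get(k), cfg_b.get(k)) for k in sorted(keys)
            if cfg_a.get(k) != cfg_b.get(k)}

# Hypothetical configs for illustration only.
cfg_31 = {"task": "mmlu_pro", "num_fewshot": 5, "metric": "acc"}
cfg_32 = {"task": "mmlu_pro", "num_fewshot": 0, "metric": "acc"}
print(config_diff(cfg_31, cfg_32))  # {'num_fewshot': (5, 0)}
```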

4/ Compare Predictions by Sample in the Details Tab 🔍
- Curious about how each model responds to specific inputs? The Details tab is your go-to!
- Select a Task (e.g., MuSR), then a Subtask (e.g., Murder Mystery), and press the Load Details button.
- Check out the side-by-side predictions and dive into the nuances of each model’s outputs.
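
The per-sample outputs behind the Details tab are also published as datasets, so you can dig in programmatically. A rough sketch for finding and loading them; the "-details" repo naming, the config names, and the "latest" split are assumptions, so expect to adapt it:

```python
# Sketch: find a model's per-sample "details" dataset and peek at a few rows.
# Assumptions: details are published under the open-llm-leaderboard org as
# "*-details" repos with one config per task; names and splits may differ.
from huggingface_hub import HfApi
from datasets import get_dataset_config_names, load_dataset

api = HfApi()
candidates = [d.id for d in api.list_datasets(author="open-llm-leaderboard",
                                              search="Llama-3.1-8B-Instruct")]
details_repos = [r for r in candidates if r.endswith("-details")]
print(details_repos)

repo_id = details_repos[0]
configs = get_dataset_config_names(repo_id)
musr = [c for c in configs if "musr" in c.lower()]  # pick a MuSR subtask, if present
print(musr)

rows = load_dataset(repo_id, name=musr[0], split="latest")  # split name is an assumption
print(rows[0])  # inspect the prompt/prediction fields before comparing models
```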

5/ With this tool, it’s never been easier to explore how small changes between model versions affect performance on a wide range of tasks. Whether you’re a researcher or enthusiast, you can instantly visualize improvements and dive into detailed comparisons.

🚀 Try the 🤗 Open LLM Leaderboard Comparator now and take your model evaluations to the next level!
New activity in akjindal53244/Llama-3.1-Storm-8B 4 months ago

Languages report?

#7 opened 4 months ago by nicolollo