Updated precision to bfloat16 and use_chat_template to false for pankajmathur/orca_mini_v8_0_70b and pankajmathur/orca_mini_v8_1_70b
Hi @alozowski and Team,
First of all, great work on the new UI of the Open LLM LB. It looks stunning.
I submitted two of the new Orca_Mini_v8_* series models, fine-tuned on Llama-3.3-70B-Instruct, for evaluation via the UI, but initially used the wrong precision and chat_template flag.
I have now opened two MRs for these two models to fix these mistakes. Could you please have a look and let me know if you need additional details:
- https://huggingface.co./datasets/open-llm-leaderboard/requests/discussions/74/
- https://huggingface.co./datasets/open-llm-leaderboard/requests/discussions/75/
Regards,
Pankaj
Hi @pankajmathur ,
Thanks for opening the issue! I corrected both of your requests manually, so they should be fine now.
I'm closing this discussion; feel free to open a new one if you have any questions.
Thank you for the swift turnaround, much appreciated.
Hi @alozowski ,
Happy Monday! I'm reaching out to make sense of the following eval request commits for the model "pankajmathur/orca_mini_v8_0_70b". The commit below shows a file rename and a change from the wrong "params": 35.277:
https://huggingface.co./datasets/open-llm-leaderboard/requests/commit/5660c4c4b9156fa0f15d99be7eee061d5de24764#d2h-741276
Did the model fail to evaluate, and do these changes reflect a resubmission for evaluation?
If so, can we resubmit "pankajmathur/orca_mini_v8_1_70b" as well? It also shows as failed:
https://huggingface.co./datasets/open-llm-leaderboard/requests/commit/8b40ba212c48dc470be4f661b67cc085ed456477#d2h-702908
Is there a known reason they are failing? For background, I successfully evaluated both of them on my own servers before submitting them to the HF Open LLM LB, following the reproducibility instructions:
https://huggingface.co./docs/leaderboards/open_llm_leaderboard/about#reproducibility
lm_eval --model hf --model_args pretrained=pankajmathur/orca_mini_v8_1_70b,dtype=bfloat16,parallelize=True --tasks leaderboard --output_path lm_eval_results/leaderboard --batch_size auto
and the results are posted on both model cards:
https://huggingface.co./pankajmathur/orca_mini_v8_0_70b
https://huggingface.co./pankajmathur/orca_mini_v8_1_70b
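The lm_eval command quoted earlier names only orca_mini_v8_1_70b. As a minimal sketch, the same invocation could be generated for both models as shown here. The helper name build_lm_eval_command and the MODELS list are illustrative and not part of lm_eval; every flag is copied from that command, and the script only prints the commands rather than running them.

```python
import shlex

# The two models discussed in this thread.
MODELS = [
    "pankajmathur/orca_mini_v8_0_70b",
    "pankajmathur/orca_mini_v8_1_70b",
]


def build_lm_eval_command(model_id, output_path="lm_eval_results/leaderboard"):
    """Return the lm_eval argument list for one model (hypothetical helper)."""
    # Same model_args as in the thread: bfloat16 precision, sharded across GPUs.
    model_args = f"pretrained={model_id},dtype=bfloat16,parallelize=True"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", "leaderboard",
        "--output_path", output_path,
        "--batch_size", "auto",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per model; nothing is executed here.
    for model_id in MODELS:
        print(shlex.join(build_lm_eval_command(model_id)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to subprocess.run on a machine with enough GPU memory for a 70B model.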
Thanks again for helping out on this; it is really appreciated.
Regards,
Pankaj