Failed orca_mini_v8_* Evaluation

#1051
by pankajmathur

Opening a new discussion, as suggested in a previous comment on another discussion:

Hi @alozowski ,

Happy Monday! I'm just reaching out to make sense of the following eval request commits for the model "pankajmathur/orca_mini_v8_0_70b". The commit below shows a file rename and a change from the incorrect "params": 35.277:
https://huggingface.co./datasets/open-llm-leaderboard/requests/commit/5660c4c4b9156fa0f15d99be7eee061d5de24764#d2h-741276
Did the model fail to evaluate, and do these changes reflect a resubmission for evaluation?

If so, can we also resubmit "pankajmathur/orca_mini_v8_1_70b", since it appears to have failed as well?
https://huggingface.co./datasets/open-llm-leaderboard/requests/commit/8b40ba212c48dc470be4f661b67cc085ed456477#d2h-702908
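
In case it helps with checking status, below is a minimal sketch for pulling a request file from the requests dataset and inspecting it with huggingface_hub. The exact filename and the "status"/"params" fields are my assumptions about how the dataset is laid out, so please treat this as a rough illustration rather than the leaderboard's actual tooling:

```python
# Minimal sketch: download one eval request file from the requests dataset
# and print a couple of fields. The filename below is hypothetical -- check
# the dataset viewer for the real path of your model's request file.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="open-llm-leaderboard/requests",
    repo_type="dataset",
    filename="pankajmathur/orca_mini_v8_0_70b_eval_request_False_bfloat16_Original.json",  # assumed name
)

with open(path) as f:
    request = json.load(f)

# "status" and "params" are assumed field names based on the commit diff above.
print(request.get("status"), request.get("params"))
```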

Is there any reason they are failing? For background, I successfully evaluated both of them on my own servers before submitting them to the HF Open LLM Leaderboard, following the reproducibility instructions here:

https://huggingface.co./docs/leaderboards/open_llm_leaderboard/about#reproducibility

```
lm_eval --model hf --model_args pretrained=pankajmathur/orca_mini_v8_1_70b,dtype=bfloat16,parallelize=True --tasks leaderboard --output_path lm_eval_results/leaderboard --batch_size auto
```
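
In case it's useful for cross-checking, here is a rough sketch of how the aggregated scores can be read back from that output directory. The results filename pattern is an assumption on my part and may differ between lm_eval versions:

```python
# Rough sketch: load the most recent results JSON that lm_eval wrote under
# the --output_path directory and print per-task metrics. The exact
# directory/filename layout may vary by lm_eval version.
import glob
import json

result_files = sorted(
    glob.glob("lm_eval_results/leaderboard/**/results_*.json", recursive=True)
)
with open(result_files[-1]) as f:
    results = json.load(f)

# lm_eval stores per-task metric dicts under the top-level "results" key.
for task, scores in results["results"].items():
    print(task, scores)
```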

Those results are now updated on both model cards:
https://huggingface.co./pankajmathur/orca_mini_v8_0_70b
https://huggingface.co./pankajmathur/orca_mini_v8_1_70b

Thanks again for helping out with this; it's really appreciated.

Regards,
Pankaj
