open-llm-leaderboard/open_llm_leaderboard · Updated precision to bfloat16 and use_chat_template to false for pankajmathur/orca_mini_v8_0_70b and pankajmathur/orca_mini_v8_1_70b

11 days ago

First of all, great work on the new Open LLM Leaderboard UI; it looks stunning.
I submitted two models from the new Orca_Mini_v8_* series, fine-tuned on Llama-3.3-70B-Instruct, for evaluation via the UI, but initially used the wrong precision and chat_template flag.
I have now opened two MRs, one per model, to fix these mistakes. Could you please have a look and let me know if you need any additional details?

Regards,
Pankaj

alozowski

Open LLM Leaderboard org 9 days ago

Hi @pankajmathur ,

Thanks for opening the issue! I corrected both of your requests manually, so they should be fine now.

I'm closing this discussion; feel free to open a new one if you have any questions.

alozowski changed discussion status to closed 9 days ago

pankajmathur

8 days ago

Thank you for the swift turnaround, much appreciated.

pankajmathur

3 days ago

Hi @alozowski ,

Happy Monday. I'm reaching out to make sense of the following eval request commit for the model "pankajmathur/orca_mini_v8_0_70b". The commit below shows a file rename and a change from the wrong value "params": 35.277:
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/5660c4c4b9156fa0f15d99be7eee061d5de24764#d2h-741276
Did the model fail to evaluate, and do these changes reflect a resubmission for evaluation?

If so, can we resubmit "pankajmathur/orca_mini_v8_1_70b" as well? It appears to have failed too:
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/8b40ba212c48dc470be4f661b67cc085ed456477#d2h-702908

Is there any reason they are failing? For background, I successfully evaluated both of them on my own servers before submitting them to the HF Open LLM Leaderboard, following the reproducibility instructions:
https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#reproducibility

lm_eval --model hf --model_args pretrained=pankajmathur/orca_mini_v8_1_70b,dtype=bfloat16,parallelize=True --tasks leaderboard --output_path lm_eval_results/leaderboard --batch_size auto

The results are updated on both model cards:
https://huggingface.co/pankajmathur/orca_mini_v8_0_70b
https://huggingface.co/pankajmathur/orca_mini_v8_1_70b
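
For anyone who wants to reproduce the local runs for both models, the lm_eval command above can be generated per model with a small script. This is only a sketch: the helper name build_lm_eval_command and the MODELS list are hypothetical, while the flags and values are copied from the command quoted in this post. The script only prints the commands; it does not run the evaluation.

```python
import shlex

# Hypothetical helper: assembles the lm_eval invocation quoted in this post
# as an argument list, so it can be printed or passed to subprocess.run.
def build_lm_eval_command(model_id, dtype="bfloat16",
                          output_path="lm_eval_results/leaderboard"):
    # Same model_args as in the post: pretrained model, dtype, parallelize.
    model_args = f"pretrained={model_id},dtype={dtype},parallelize=True"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", "leaderboard",
        "--output_path", output_path,
        "--batch_size", "auto",
    ]

# The two models discussed in this thread.
MODELS = [
    "pankajmathur/orca_mini_v8_0_70b",
    "pankajmathur/orca_mini_v8_1_70b",
]

if __name__ == "__main__":
    for model_id in MODELS:
        # shlex.join renders the list as a single shell-safe command line.
        print(shlex.join(build_lm_eval_command(model_id)))
```

Building the command as a list rather than a string keeps each flag and value separate, which avoids quoting problems if the list is later handed to subprocess.run.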

Thanks again for your help with this; it is really appreciated.

Regards,
Pankaj
