Model evaluation failed

#494
by adamo1139 - opened

Hello :)

Evaluation of Yi-34B-200K-DARE-merge-v5 by @brucethemoose failed.

https://huggingface.co./datasets/open-llm-leaderboard/requests/blob/main/brucethemoose/Yi-34B-200K-DARE-merge-v5_eval_request_False_bfloat16_Original.json

Brucethemoose loaded this model with transformers in 4-bit while reducing max_position_embeddings (due to VRAM limitations), and it worked fine. Based on other similar discussions opened recently, it seems like you're having issues with the cluster that runs the evaluations. If that is what caused this failure, could you please add it to the queue again once the compute cluster is operational?

Yeah, thanks for posting this. I saw Tess and one of my old merges fail this way as well.

As adamo suggested, I think the leaderboard needs a check for context size? Basically, if it's enormous, clamp it to something reasonable like 32K to avoid CUDA OOMs on the test bench.
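A minimal sketch of the kind of check being proposed, assuming the config is available as a plain dict (as in a model's `config.json`). The function name `clamp_context_size` and the 32,768-token cap are illustrative assumptions, not something the leaderboard actually implements.

```python
# Hypothetical pre-eval check: clamp an enormous context size before loading,
# so the test bench never sizes anything for a 200K-token window.
MAX_EVAL_CONTEXT = 32_768  # assumed cap ("something reasonable like 32K")


def clamp_context_size(config: dict, cap: int = MAX_EVAL_CONTEXT) -> dict:
    """Return a copy of `config` with max_position_embeddings clamped to `cap`."""
    clamped = dict(config)  # copy so the caller's config is left untouched
    current = clamped.get("max_position_embeddings")
    if current is not None and current > cap:
        clamped["max_position_embeddings"] = cap
    return clamped


if __name__ == "__main__":
    # Example values assumed for illustration: a 200K-context model and a 4K one.
    long_ctx = {"max_position_embeddings": 200_000}
    short_ctx = {"max_position_embeddings": 4096}
    print(clamp_context_size(long_ctx)["max_position_embeddings"])   # 32768
    print(clamp_context_size(short_ctx)["max_position_embeddings"])  # 4096
```

With transformers, the same idea would presumably look like loading the config with `AutoConfig.from_pretrained`, lowering `max_position_embeddings`, and passing it as `config=` to `from_pretrained`, which may be close to what brucethemoose did locally.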

I haven't seen it fail for any 200K model, but I don't follow most of them closely. My best guess is that your model failed evaluation due to a cluster-wide connectivity or processing issue.

https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard/discussions/489

https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard/discussions/492

https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard/discussions/493

https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard/discussions/485

Quoting Clémentine in #485:

Side note - our eval cluster changed and we are in full debugging mode (connectivity issues) so it might take a couple days for us to come back to you.

Open LLM Leaderboard org

Hi! The connectivity issues on the cluster have been fixed, and your model should be on the leaderboard :)
Don't hesitate to re-open the issue if your model failed.

SaylorTwift changed discussion status to closed
