No good way to identify number of activated parameters causes Mixtral evaluation failures

#680
by 0-hero - opened

Hey @clefourrier, I noticed all the 8x22B finetunes failed:

- HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 (@lewtun?)
- migtissera/Tess-2.0-Mixtral-8x22B (@migtissera)
- 0-hero/Matter-0.2-8x22B (mine)

and maybe a few more I missed.

Open LLM Leaderboard org

Hi all!
As you can see from the job ids (-1), the jobs were not launched. This is because our backend assumes these models have 140B activated parameters (which is too big for the cluster, hence they're skipped), not 140B total parameters with considerably fewer activated. I'm not sure there's an easy way for us to make that distinction automatically at the moment, but we'll gladly update our backend and resubmit your models once we can get this information.
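For reference, here's the back-of-the-envelope arithmetic involved, as a minimal sketch assuming a Mixtral-style `config.json` (field names taken from transformers' `MixtralConfig`). This only illustrates why activated ≠ total; it is not the leaderboard backend's actual code:

```python
# Minimal sketch (not the leaderboard backend): approximate the activated
# parameter count of a Mixtral-style MoE from its config.json.
# Assumes MixtralConfig field names: num_hidden_layers, num_local_experts,
# num_experts_per_tok, hidden_size, intermediate_size.
import json

def estimate_activated_params(config_path: str, total_params: int) -> int:
    with open(config_path) as f:
        cfg = json.load(f)

    n_layers = cfg["num_hidden_layers"]
    n_experts = cfg["num_local_experts"]    # 8 for an 8x22B
    n_active = cfg["num_experts_per_tok"]   # 2 for an 8x22B
    hidden = cfg["hidden_size"]
    intermediate = cfg["intermediate_size"]

    # Each Mixtral expert is a gated MLP with three projection matrices
    # (w1, w2, w3), each of size hidden_size x intermediate_size.
    params_per_expert = 3 * hidden * intermediate

    # Per token, only n_active of n_experts experts run; attention,
    # embeddings, and router weights are always active, so we just
    # subtract the parameters of the experts that are not routed to.
    inactive_experts = n_experts - n_active
    return total_params - inactive_experts * params_per_expert * n_layers
```

For Mixtral-8x22B (hidden_size 6144, intermediate_size 16384, 56 layers, 8 experts, top-2 routing) this gives roughly 141B − 101.5B ≈ 39B activated, in line with Mistral's published figure of ~39B active parameters out of 141B total.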

clefourrier changed discussion title from 8x22B's failing to No good way to identify number of activated parameters causes Mixtral evaluation failures

Hey, is this fixed now or still waiting?

Open LLM Leaderboard org

Hi everyone!

Thanks to @SaylorTwift, we can now submit MoE models bigger than 140B for evaluation, so I resubmitted this one for @MaziyarPanahi.

Please provide me with the request files for similar models to resubmit.

Fantastic! Thanks @alozowski and @SaylorTwift

Open LLM Leaderboard org

Resubmitted both migtissera/Tess-2.0-Mixtral-8x22B and 0-hero/Matter-0.2-8x22B 👍

In that case, I'll close this discussion. If there are any problems with model evaluations, please open a new discussion for each model.

alozowski changed discussion status to closed

Says FAILED

I think all 3 failed again

Yes, I created a separate discussion for my models. 2 of the failed models were 8B, so something else might have happened.

alozowski changed discussion status to open
Open LLM Leaderboard org

Hi everyone!

Hmm, I see, all these models have indeed failed. Let me investigate.

Hey, any update here on the Tess model? Do you want me to open a separate ticket to track it? This is the model: https://huggingface.co./datasets/open-llm-leaderboard/requests/blob/main/migtissera/Tess-2.0-Mixtral-8x22B_eval_request_False_float16_Original.json

There seems to be something going on with the LB eval cluster, at least for some large models. Even my Llama-3-70B submission has been running for the last 2 days. https://huggingface.co./datasets/open-llm-leaderboard/requests/blob/main/MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.1_eval_request_False_bfloat16_Original.json

Open LLM Leaderboard org

Hi!

We're still looking into ways to launch MoE models correctly on our backend, and we also had network failures on our cluster last week. We'll keep you posted as soon as we have updates.

@MaziyarPanahi, what you are reporting is normal and unrelated to the current issue :) When the research cluster is full, the evaluation jobs are cancelled and rescheduled, but we keep the status as "running" to keep it simple for end users. It's likely your model was "running, cancelled, rescheduled, running, ..."

Hi @clefourrier

Thanks for the update regarding MoE models, appreciate it.

> but we keep the status as "running" to keep it simple for end users. It's likely your model was "running, cancelled, rescheduled, running, ..."

I didn't know that, it makes sense now. Thank you :)

Open LLM Leaderboard org
edited May 4

Doing it right now, tell me if it works.

To this day, the only 8x22B models on the Leaderboard are from MistralAI; I don't believe we have ever had a successful eval of any 8x22B fine-tuned model. @clefourrier is the issue resolved, with the only remaining limitation being free resources? Or do we still not know whether MoE models of this size might get rejected?

Open LLM Leaderboard org

Those we launched manually when they came out because they were important for the community.
Good question, I think @SaylorTwift took a look at the backend side, so I'll let him answer.
(The main problem we had was, as indicated in the title, identifying the number of activated params in MoEs.)

Open LLM Leaderboard org

Hi! Your models failed during download; they have been requeued. However, the cluster is really full atm, so it might take a bit for your models to be run.

> Hi! Your models failed during download; they have been requeued. However, the cluster is really full atm, so it might take a bit for your models to be run.

I think that's good news, it at least got an instance to start the download process :)
Thanks @SaylorTwift, appreciate the help

Open LLM Leaderboard org
edited May 27

Since all three models failed, I've resubmitted all of them.

I'll keep an eye on them and check the evaluation status

Sorry @alozowski , but the model failed again

Hey there!

Any update here?

Thanks!

Open LLM Leaderboard org
edited Jun 5

Hey there!

Edit: We're having a problem with the parameter estimation for these models and can't launch them at the moment: they're estimated as 140B models and therefore request multiple nodes. It wouldn't be too big of a problem if we weren't so compute-tight atm.
I won't relaunch them for now, but once we do our big update we'll investigate this a bit more. You can also ping people from the safetensors side about the params estimation.
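For context on the safetensors angle: the safetensors headers expose per-dtype total parameter counts but nothing about expert routing, which is presumably why MoEs get counted at their full size. A small sketch, assuming huggingface_hub's `get_safetensors_metadata` helper (available in recent versions), which reads the headers without downloading the weights:

```python
# Sketch: safetensors metadata gives total parameter counts per dtype,
# with no notion of how many parameters are activated per token.
# Assumes a recent huggingface_hub with get_safetensors_metadata.
from huggingface_hub import get_safetensors_metadata

meta = get_safetensors_metadata("mistralai/Mixtral-8x22B-v0.1")
total = sum(meta.parameter_count.values())  # dtype -> param count
print(f"total params: {total / 1e9:.1f}B")  # ~141B; the ~39B activated
# count is not recoverable from this metadata alone.
```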

Yeah, they're definitely beefy! Okay, sounds good, Clementine!

clefourrier changed discussion status to closed
