Model eval request FAILED ... how do we know the root cause?

#7
by karimouda - opened

I requested an eval three times for our new model, but every request failed, even though the eval runs successfully using lighteval.

Lighteval Run Commands

!git clone https://github.com/huggingface/lighteval.git
%cd lighteval
!pip install -e . && pip install accelerate
!wget https://raw.githubusercontent.com/huggingface/lighteval/main/examples/tasks/all_arabic_tasks.txt -O examples/tasks/all_arabic_tasks.txt
%env HF_DATASETS_TRUST_REMOTE_CODE=1
!accelerate launch -m \
    lighteval accelerate \
    --model_args="pretrained=silma-ai/SILMA-9B-Instruct-v0.1.1,trust_remote_code=True" \
    --custom_tasks community_tasks/arabic_evals.py \
    --tasks examples/tasks/all_arabic_tasks.txt \
    --override_batch_size 1 --save_details --output_dir="./output_gpt2"

Model request file:
https://huggingface.co/datasets/OALL/requests/commit/c6a182a11b637ed7787bbedab46f63d5c690f1a9

My question: how can we find out what caused the failure on your side, so that we can resolve the issue?

Screenshot 2024-07-28 at 08.58.48.png

Open Arabic LLM Leaderboard org

Hey @karimouda,
Apologies for the late reply. I see that the model is based on Gemma2 9B, which also fails to run on our side (we are still investigating the issue).
The main issue is that you are launching your evals with trust_remote_code=True, which we don't support!

Thanks, Ali, for your response. Is there anything we could do on our side to make it work, or should we wait until the Gemma2 issue is resolved?

Also, as far as I understood, trust_remote_code=True is mandatory for the Arabic datasets used in Lighteval. Is there a way we could run the eval without it?
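
For anyone wanting to test this locally, here is a minimal sketch of how the model argument could be dropped from the --model_args string before relaunching the command above. The helper name drop_model_arg is hypothetical and not part of lighteval. As far as I can tell, the %env HF_DATASETS_TRUST_REMOTE_CODE=1 line is a separate setting for dataset loading scripts, while trust_remote_code=True inside --model_args is the one passed along for the model. Whether the model then loads without it is an assumption that depends on the installed transformers version supporting the architecture natively; the thread does not confirm this.

```python
def drop_model_arg(model_args: str, key: str) -> str:
    """Return a comma-separated key=value string with `key` removed.

    Hypothetical helper for illustration only; it just edits the string
    and does not call lighteval.
    """
    # Split into "key=value" items, ignoring empty fragments.
    pairs = [item for item in model_args.split(",") if item.strip()]
    # Keep every item whose key differs from the one being dropped.
    kept = [item for item in pairs if item.split("=", 1)[0].strip() != key]
    return ",".join(kept)


# The --model_args value from the command in the original post.
original = "pretrained=silma-ai/SILMA-9B-Instruct-v0.1.1,trust_remote_code=True"
stripped = drop_model_arg(original, "trust_remote_code")
print(stripped)
```

The printed string can then be passed as --model_args in place of the original value, leaving the rest of the command unchanged.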

alielfilali01 changed discussion status to closed
