Reproducing ZebraLogic results
#4, opened by js2042
I've been struggling to reproduce the results in ZeroEval/result_dir/zebra-grid.summary.md. The only difference in configuration is that I use the HuggingFace engine instead of vLLM. Since temperature is set to 0.0, I cannot see where a difference in results could have come from. If I'm making any obvious mistakes, I'd be grateful to know!
My command:
bash zero_eval_local.sh -d zebra-grid -m Qwen/Qwen2-7B-Instruct -p Qwen2-7B-Instruct -s 2 -f hf
Command from ZeroEval/scripts/_ZebraLogic.md:
bash zero_eval_local.sh -d zebra-grid -m Qwen/Qwen2-7B-Instruct -p Qwen2-7B-Instruct -s 4
Model | Mode | N_Mode | N_Size | Puzzle Acc | Small Puzzle Acc | Medium Puzzle Acc | Large Puzzle Acc | XL Puzzle Acc | Cell Acc | No Answer | Total Puzzles | Reason Lens |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Qwen2-7B-Instruct (allenai) | greedy | single | 1 | 8.4 | 26.25 | 0 | 0 | 0 | 22.06 | 24.4 | 1000 | 1473.23 |
Qwen2-7B-Instruct (myself) | greedy | single | 1 | 7.3 | 22.5 | 0.36 | 0 | 0 | 22.52 | 24.5 | 1000 | 1504.05 |
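
To make the gap concrete, here is a small sketch that diffs the two rows above column by column. The numbers are copied from the table; the helper name `diff_rows` is my own, and reading Puzzle Acc as a percentage of Total Puzzles (1000) is an assumption on my part.

```python
# Values copied from the two table rows above (allenai summary vs. my run).
columns = [
    "Puzzle Acc", "Small Puzzle Acc", "Medium Puzzle Acc", "Large Puzzle Acc",
    "XL Puzzle Acc", "Cell Acc", "No Answer", "Reason Lens",
]
allenai = [8.4, 26.25, 0, 0, 0, 22.06, 24.4, 1473.23]
myself = [7.3, 22.5, 0.36, 0, 0, 22.52, 24.5, 1504.05]


def diff_rows(names, reference, mine):
    """Return {column: mine - reference}, rounded to 2 decimals.

    Hypothetical helper for illustration; not part of ZeroEval.
    """
    return {n: round(m - r, 2) for n, r, m in zip(names, reference, mine)}


deltas = diff_rows(columns, allenai, myself)
for name, delta in deltas.items():
    print(f"{name:>18}: {delta:+.2f}")

# Assumption: Puzzle Acc is a percentage of the 1000 total puzzles,
# so the gap can also be expressed as a number of puzzles.
total_puzzles = 1000
puzzle_gap = round(deltas["Puzzle Acc"] / 100 * total_puzzles)
print(f"Puzzle Acc gap in puzzles: {puzzle_gap:+d}")
```

Under that assumption, the overall gap corresponds to roughly 11 puzzles out of 1000, and most of it sits in the Small Puzzle Acc column, while Cell Acc and No Answer are nearly identical.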