[FLAG] Garrulus and Turdus based models
Based on https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard/discussions/526:
- https://huggingface.co./udkai/Turdus: "A less contaminated version" is still a contaminated version (thanks to the author for acknowledging it), even if "all 5-non Winograde metrics [...] to be 0.2% higher than the underlying model."
At the time of writing, all 7B models with a better average score than https://huggingface.co./mlabonne/NeuralBeagle14-7B appear to be contaminated (thanks to the authors for their transparency regarding contamination):
- https://huggingface.co./eren23/slerp-test-turdus-beagle: merge using Turdus
- https://huggingface.co./leveldevai: models are (successive) merges using Turdus
- https://huggingface.co./abideen/NexoNimbus-7B: merge using Garrulus
- https://huggingface.co./alnrg2arg/test2_3: merge using the previous model
- https://huggingface.co./nfaheem/Marcoroni-7b-DPO-Merge: merge using Turdus
Also a few with a lower average score:
- https://huggingface.co./CultriX/MergeTrix-7B: "uses udkai/Turdus"
- https://huggingface.co./liminerity/Blur-7b-v1.21: merge using Turdus
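The lineage argument behind the lists above can be sketched as a small graph traversal: once a base model is flagged, every merge built on it, directly or through another merge, inherits the flag. This is only an illustration, not how the leaderboard actually flags models. The `PARENTS` table holds just the parent links stated in this post (real merges have additional parents), `Garrulus` appears by its short name because the post gives no repository path for it, and `contaminated_descendants` is a hypothetical helper.

```python
from collections import deque

# Parent links exactly as stated in this post; incomplete by design.
PARENTS = {
    "eren23/slerp-test-turdus-beagle": ["udkai/Turdus"],
    "abideen/NexoNimbus-7B": ["Garrulus"],
    "alnrg2arg/test2_3": ["abideen/NexoNimbus-7B"],
    "nfaheem/Marcoroni-7b-DPO-Merge": ["udkai/Turdus"],
    "CultriX/MergeTrix-7B": ["udkai/Turdus"],
    "liminerity/Blur-7b-v1.21": ["udkai/Turdus"],
}

# Assumption: the two models named in the thread title are the flagged roots.
FLAGGED_ROOTS = {"udkai/Turdus", "Garrulus"}


def contaminated_descendants(parents, roots):
    """Return every model that has a flagged root somewhere in its lineage."""
    # Invert the child -> parents table into parent -> children.
    children = {}
    for model, model_parents in parents.items():
        for parent in model_parents:
            children.setdefault(parent, []).append(model)

    # Breadth-first walk from the flagged roots down through the merges.
    seen = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    for name in sorted(contaminated_descendants(PARENTS, FLAGGED_ROOTS)):
        print(name)
```

Note that `alnrg2arg/test2_3` is picked up even though it never merges Garrulus directly; it inherits the flag through NexoNimbus-7B.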
As I explained here: https://www.reddit.com/r/LocalLLaMA/comments/19acvq2/huge_issue_with_truthfulqa_contamination_and/ Turdus, among other models, is contaminated not only through its finetuning but also through its lineage. There is also a license issue that I explained (UPDATE: it looks like Turdus fixed their license).
I request the assistance of volunteers, as the leaderboard maintainer is not willing to take action and asked me to flag all the models, which I can't do because of rate limits (1 post/comment per 24 hours as a new user).
Or maybe we can escalate this issue to other HF staff? This is getting out of hand.
I reported Turdus for contamination 2 days ago, and for the license issue 1 day ago, but no action has been taken.
Also, I have a question: how come HuggingFaceH4/zephyr-7b-beta is under the MIT license when its parent model (Mistral) is under Apache-2?
EDIT: It seems like the rate limit has finally been lifted/increased, so I can post more.
What's the proper way: report models on their individual pages, or make a single post to track them from one location?
Hi, thanks @MichaelKarpe for the detailed report, I flagged the models you cited.
@ifjeakeiq The best way to report models for flagging is to open a discussion on the leaderboard with the name [FLAG] model_name and add a description explaining why the model should be flagged. You can group multiple models in one discussion just like what @MichaelKarpe did. Thanks for your help! :)
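For anyone who prefers to script this, the reporting convention described above could look roughly like the sketch below. It assumes the `huggingface_hub` package and its `HfApi.create_discussion` method. The space id is taken from the URL in the first post. The helper names, the comma-joined title for grouped models, and the description layout are hypothetical choices, not an official template.

```python
# Space id taken from the leaderboard URL quoted in the first post.
LEADERBOARD_SPACE = "HuggingFaceH4/open_llm_leaderboard"


def build_flag_discussion(model_names, reason):
    """Return (title, description) following the "[FLAG] model_name" convention."""
    if not model_names:
        raise ValueError("need at least one model name")
    # Assumption: grouped models are joined with commas in the title.
    title = "[FLAG] " + ", ".join(model_names)
    lines = ["Models to flag:"]
    lines += [f"- {name}" for name in model_names]
    lines += ["", f"Reason: {reason}"]
    return title, "\n".join(lines)


def open_flag_discussion(model_names, reason, token=None):
    """Open the discussion on the leaderboard space (needs a valid HF token)."""
    # Imported lazily so the builder above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    title, description = build_flag_discussion(model_names, reason)
    return HfApi(token=token).create_discussion(
        repo_id=LEADERBOARD_SPACE,
        title=title,
        description=description,
        repo_type="space",
    )
```

Building the text separately from posting it makes it easy to review the title and description before anything is sent, which matters given the rate limits mentioned earlier in the thread.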
Since @SaylorTwift tagged all relevant models, closing this issue.
Thanks a lot @MichaelKarpe!