Space: open-llm-leaderboard/open_llm_leaderboard

[FLAG] Garrulus and Turdus based models

#548

by MichaelKarpe - opened Jan 19

Discussion

MichaelKarpe

Jan 19

•

edited Jan 19

Based on https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard/discussions/526:

https://huggingface.co./udkai/Turdus: "A less contaminated version" is still a contaminated version (thanks to the author for acknowledging it), even if "all 5-non Winograde metrics [...] to be 0.2% higher than the underlying model."

At the time of writing, all 7B models with a better average score than https://huggingface.co./mlabonne/NeuralBeagle14-7B appear to be contaminated (thanks to the authors for their transparency regarding contamination):

https://huggingface.co./eren23/slerp-test-turdus-beagle: merge using Turdus
https://huggingface.co./leveldevai: models are (successive) merges using Turdus
https://huggingface.co./abideen/NexoNimbus-7B: merge using Garrulus
https://huggingface.co./alnrg2arg/test2_3: merge using the previous model
https://huggingface.co./nfaheem/Marcoroni-7b-DPO-Merge: merge using Turdus

Also a few with a lower average score:

https://huggingface.co./CultriX/MergeTrix-7B: "uses udkai/Turdus"
https://huggingface.co./liminerity/Blur-7b-v1.21: merge using Turdus
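
The lists above amount to a lineage argument: a merge that uses a contaminated model is itself flagged, and so is any merge built on top of it. A minimal sketch of that reasoning follows, assuming a hand-written parent map; it is not the leaderboard's actual tooling. The `PARENTS` dictionary and the `contaminated_descendants` helper are hypothetical names for illustration. Only the contaminated parent named in this post is recorded for each merge, "Garrulus" is kept as the short name used here, and the leveldevai entry is left out because it names an account rather than a single model.

```python
from collections import deque

# Parent links as stated in this post (hypothetical, hand-written map).
# Each merge may have other parents; only the one named above is recorded.
PARENTS = {
    "eren23/slerp-test-turdus-beagle": ["udkai/Turdus"],
    "abideen/NexoNimbus-7B": ["Garrulus"],
    "alnrg2arg/test2_3": ["abideen/NexoNimbus-7B"],
    "nfaheem/Marcoroni-7b-DPO-Merge": ["udkai/Turdus"],
    "CultriX/MergeTrix-7B": ["udkai/Turdus"],
    "liminerity/Blur-7b-v1.21": ["udkai/Turdus"],
}


def contaminated_descendants(parents, roots):
    """Return every model reachable from the contaminated roots via merges."""
    # Invert the parent map so we can walk from a parent to its children.
    children = {}
    for model, model_parents in parents.items():
        for parent in model_parents:
            children.setdefault(parent, []).append(model)

    flagged = set()
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in flagged:
                flagged.add(child)
                queue.append(child)  # merges of merges are followed too
    return flagged


flagged = contaminated_descendants(PARENTS, ["udkai/Turdus", "Garrulus"])
for name in sorted(flagged):
    print(name)
```

With this map, alnrg2arg/test2_3 is flagged even though it never uses Garrulus directly, because it merges abideen/NexoNimbus-7B.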

MichaelKarpe changed discussion title from [FLAG] udkai/Turdus to [FLAG] Turdus-based models Jan 19

MichaelKarpe changed discussion title from [FLAG] Turdus-based models to [FLAG] Garrulus and Turdus based models Jan 19

ifjeakeiq

Jan 22

•

edited Jan 22

As I explained here: https://www.reddit.com/r/LocalLLaMA/comments/19acvq2/huge_issue_with_truthfulqa_contamination_and/ Turdus, among other models, is not only contaminated from its finetuning, but also from its lineage. There is also a license issue that I explained (UPDATE: it looks like Turdus fixed their license).
I request the assistance of volunteers, as the leaderboard maintainer is not willing to take action and asked me to flag all the models, which I can't do because of rate limits (1 post/comment per 24 hours as a new user).
Or maybe we can escalate this issue to other HF staff? This is getting out of hand.
I reported Turdus for contamination 2 days ago, and for the license issue 1 day ago, but no action has been taken.
Also, I have a question: how come HuggingFaceH4/zephyr-7b-beta is under the MIT license when its parent model (Mistral) is under Apache-2.0?

EDIT: It seems like the rate limit has finally been lifted/increased, so I can post more.
What's the proper way: report models on their individual pages, or make a single post to track them from one location?

SaylorTwift

Open LLM Leaderboard org Jan 22

Hi, thanks @MichaelKarpe for the detailed report, I flagged the models you cited.
@ifjeakeiq The best way to report models for flagging is to open a discussion on the leaderboard with the name [FLAG] model_name and add a description explaining why the model should be flagged. You can group multiple models in one discussion, just like @MichaelKarpe did. Thanks for your help! :)

clefourrier

Open LLM Leaderboard org Jan 22

Since @SaylorTwift tagged all relevant models, closing this issue.
Thanks a lot @MichaelKarpe!

clefourrier changed discussion status to closed Jan 22
