Unpopular opinion: Open Source takes courage to do!
Not everyone is brave enough to release what they have done (the way they've done it) into the wild to be judged! It really requires a high level of "knowing wth you are doing"! It's kind of a superpower!
Well, this is a bit late, but consider giving our recent blog a read if you are interested in evaluation.
You don't have to be into Arabic NLP to read it: the main contribution we are introducing is a new evaluation measure for NLG. We applied this measure to Arabic first, and we will be working with colleagues from the community to expand it to other languages.
I feel like this incredible resource hasn't gotten the attention it deserves in the community!
@clefourrier and, more generally, the HuggingFace evaluation team put together a fantastic guidebook covering a lot about evaluation, from basics to advanced tips.
Don't you think we should add an "Evaluation" tag for datasets that are meant to be benchmarks and not for training?
At the very least, anyone collecting a group of datasets from an organization, or let's say from the whole Hub, could filter on that tag and avoid contaminating their "training" data.
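To make the idea concrete, here is a minimal sketch of what filtering on such a tag could look like. Everything in it is an assumption for illustration: the "evaluation" tag does not exist as an official Hub tag today, and `DatasetRecord`, `split_by_eval_tag`, and the dataset ids are hypothetical stand-ins for whatever metadata you actually pull from the Hub.

```python
from dataclasses import dataclass, field

# Hypothetical tag name: the Hub has no official "evaluation" tag today.
EVAL_TAG = "evaluation"


@dataclass
class DatasetRecord:
    """Minimal stand-in for a Hub dataset entry: an id plus its tags."""
    id: str
    tags: list = field(default_factory=list)


def split_by_eval_tag(records, eval_tag=EVAL_TAG):
    """Split dataset ids into (training candidates, benchmarks) by tag."""
    train, bench = [], []
    for rec in records:
        # Compare case-insensitively so "Evaluation" and "evaluation" match.
        tags = {t.lower() for t in rec.tags}
        (bench if eval_tag in tags else train).append(rec.id)
    return train, bench


# Made-up records for illustration only.
records = [
    DatasetRecord("my-org/chat-corpus", ["text-generation"]),
    DatasetRecord("my-org/arabic-benchmark", ["Evaluation", "question-answering"]),
    DatasetRecord("my-org/untagged"),
]

train_ids, benchmark_ids = split_by_eval_tag(records)
print("safe to train on:", train_ids)
print("held out as benchmarks:", benchmark_ids)
```

Note the weak spot: an untagged benchmark still lands in the training pile, so a tag like this only helps if benchmark authors actually apply it.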
@mariagrandury (SomosNLP) and team released the Spanish leaderboard!!! It is impressive how they chose to design this leaderboard and how it supports 4 languages (all part of Spain ofc).
Are the servers down or what? Am I the only one experiencing this error:
HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co./api/datasets/...../)
Internal Error - We're working hard to fix this as soon as possible!
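While waiting for a 500 like this to clear, one common workaround is to retry with exponential backoff instead of hammering the endpoint. The sketch below is generic and not part of any Hugging Face library: `retry_on_server_error`, `FakeServerError`, and the `status_code` check are hypothetical names for illustration, and how you detect a 5xx on a real `HfHubHTTPError` depends on your installed version, so verify that part yourself.

```python
import time


def retry_on_server_error(call, is_server_error, max_attempts=4,
                          base_delay=1.0, sleep=time.sleep):
    """Call `call()`, retrying with exponential backoff on server errors.

    `is_server_error(exc)` decides whether an exception is worth retrying.
    Anything else, or a failure on the last attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if not is_server_error(exc) or attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next try.
            sleep(base_delay * 2 ** (attempt - 1))


class FakeServerError(Exception):
    """Hypothetical stand-in for an HTTP 500 raised by a client library."""
    status_code = 500


attempts = {"n": 0}


def flaky():
    """Fail twice with a fake 500, then succeed."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeServerError("500 Server Error")
    return "ok"


delays = []  # record the waits instead of really sleeping
result = retry_on_server_error(
    flaky,
    lambda exc: getattr(exc, "status_code", 0) >= 500,
    sleep=delays.append,
)
print(result, delays)
```

Passing `sleep` as a parameter keeps the demo instant; in real use you would leave the default `time.sleep` and keep `max_attempts` small so a genuine outage still surfaces quickly.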
Datapluck: Portability Tool for Huggingface Datasets
"I found myself recently whipping up notebooks just to pull huggingface datasets locally, annotate or operate changes and update them again. This happened often enough that I made a cli tool out of it, which I've been using successfully for the last few months.
While huggingface uses open formats, I found the official toolchain relatively low-level and not adapted to quick operations such as what I am doing." ~ @omarkamali