evaluate-metric (Evaluate Metric)

Organization Card

🤗 Evaluate provides access to a wide range of evaluation tools. It covers modalities such as text, computer vision, and audio, and includes tools for evaluating both models and datasets.

It supports three types of evaluation:

Metric: measures the performance of a model on a given dataset, usually by comparing the model's predictions to some ground truth labels -- these are covered in this space.
Comparison: compares the performance of two or more models on a single test dataset, e.g. by comparing their predictions to ground truth labels and computing their agreement -- covered in the Evaluate Comparison Spaces.
Measurement: for gaining more insights on datasets and model predictions based on their properties and characteristics -- covered in the Evaluate Measurement Spaces.
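
To make the distinction between the three types concrete, here is a minimal pure-Python sketch. It does not use the 🤗 Evaluate library and is not how the library implements anything: the function names `accuracy`, `agreement`, and `label_distribution` and the toy data are hypothetical, chosen only for illustration. The comparison is also simplified to direct agreement between two models' predictions.

```python
from collections import Counter


def accuracy(predictions, references):
    """Metric: fraction of predictions that match the ground truth labels."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equal in length")
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


def agreement(predictions_a, predictions_b):
    """Comparison (simplified): fraction of examples on which two models agree."""
    if len(predictions_a) != len(predictions_b) or not predictions_a:
        raise ValueError("both prediction lists must be non-empty and equal in length")
    return sum(a == b for a, b in zip(predictions_a, predictions_b)) / len(predictions_a)


def label_distribution(labels):
    """Measurement: relative frequency of each label in a dataset."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


# Toy data (hypothetical): ground truth labels and two models' predictions.
references = [0, 1, 1, 0]
model_a = [0, 1, 0, 0]
model_b = [0, 1, 1, 1]

metric_result = accuracy(model_a, references)        # model A vs. ground truth
comparison_result = agreement(model_a, model_b)      # model A vs. model B
measurement_result = label_distribution(references)  # property of the dataset itself

print(metric_result, comparison_result, measurement_result)
```

The metric needs both predictions and labels, the comparison needs the outputs of at least two models, and the measurement needs only the data itself.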

All three types of evaluation supported by the 🤗 Evaluate library are meant to complement one another and to help our community carry out more mindful and responsible evaluation.

models

None public yet

datasets

None public yet

spaces 54

BLEU

SQuAD v2

ROUGE

Frugalscore

sMAPE

SacreBLEU

Team members 5
