Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Paper • 2412.03304
reflection.py
from datasets import load_dataset

# Load the training split of the preference dataset
ds = load_dataset("HuggingFaceH4/OpenHermesPreferences", split="train")
# Get the categories of the source dataset
# ['airoboros2.2', 'CamelAI', 'caseus_custom', ...]
sources = ds.unique("source")
# Filter for a subset, using 6 worker processes
ds_filtered = ds.filter(lambda x: x["source"] in ["metamath", "EvolInstruct_70k"], num_proc=6)