Great reward model, what dataset did you use to train?

#1
by zolicsaki - opened

Specifically, I was wondering whether you trained it on the LMSYS Chatbot Arena conversations, since your model performs so well when evaluated on those preferences. Thanks for the help!

https://huggingface.co/datasets/lmsys/chatbot_arena_conversations

InternLM org

Sorry for the late reply. We did use a portion of this dataset. We performed data cleaning and filtering, including removing toxic and unsafe data, to ensure quality and safety.
