Model Card for Llama-3-OffsetBias-RM-8B

Llama-3-OffsetBias-RM-8B is a reward model trained on the OffsetBias dataset. It is trained to be more robust against various evaluation biases commonly found in evaluation models. The model is introduced in the paper OffsetBias: Leveraging Debiased Data for Tuning Evaluators.

Model Details

Model Description

Llama-3-OffsetBias-RM-8B uses sfairXC/FsfairX-LLaMA3-RM-v0.1 as its base model, which is built with Meta Llama 3. An intermediate reward model is trained from Llama-3-8B-Instruct using a subset of the dataset used to train the FsfairX-LLaMA3-RM model, combined with the NCSOFT/offsetbias dataset. The intermediate model is then merged with the FsfairX-LLaMA3-RM model to create Llama-3-OffsetBias-RM-8B.
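The card does not say how the intermediate model and FsfairX-LLaMA3-RM were merged. Purely as an illustration, the sketch below assumes the simplest option, a linear interpolation of matching parameters. The function name `linear_merge` and the weight `alpha` are hypothetical and are not taken from the paper.

```python
from typing import Any, Dict, Mapping


def linear_merge(
    state_a: Mapping[str, Any],
    state_b: Mapping[str, Any],
    alpha: float = 0.5,
) -> Dict[str, Any]:
    """Hypothetical merge: alpha * A + (1 - alpha) * B for every parameter.

    Values may be floats or tensors (anything supporting * and +). This is an
    illustrative linear interpolation, not the confirmed OffsetBias merge recipe.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if set(state_a) != set(state_b):
        # Interpolation only makes sense when both models share one architecture.
        raise ValueError("both models must have identical parameter names")
    return {
        name: alpha * state_a[name] + (1.0 - alpha) * state_b[name]
        for name in state_a
    }


# Toy example with scalar "parameters".
print(linear_merge({"w": 1.0, "b": 4.0}, {"w": 3.0, "b": 0.0}, alpha=0.25))
# {'w': 2.5, 'b': 1.0}
```

With PyTorch models, you could pass two `state_dict()` objects with identical keys and shapes, then load the result with `load_state_dict`. The merge method and weighting actually used for this release are not stated here.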

Developed by: NC Research
Language(s) (NLP): English
License: META LLAMA 3 COMMUNITY LICENSE AGREEMENT
Finetuned from model: sfairXC/FsfairX-LLaMA3-RM-v0.1

Model Sources

💻 Repository: https://github.com/ncsoft/offsetbias
📜 Paper: OffsetBias: Leveraging Debiased Data for Tuning Evaluators (arXiv:2407.06551)
🤗 Dataset: https://huggingface.co/datasets/NCSOFT/offsetbias

Uses

Direct Use

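from transformers import AutoTokenizer, pipeline
import torch

model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)
rm_pipe = pipeline(
    "sentiment-analysis",  # alias for "text-classification"; returns the reward-head score
    model=model_name,
    device_map="auto",  # automatic weight placement (needs `accelerate`); `device` does not accept "auto"
    tokenizer=rm_tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

pipe_kwargs = {
    "top_k": None,  # return every label's score (replaces the deprecated return_all_scores=True)
    "function_to_apply": "none",  # keep the raw reward logit; apply no sigmoid/softmax
    "batch_size": 1,
}

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# The chat template already inserts the BOS token, and the tokenizer adds it again
# when the pipeline tokenizes the text, so strip it here to avoid a duplicate BOS.
test_texts = [
    rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
]
pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)
# One list of label scores per input text; the reward is the first (single) score.
rewards = [output[0]["score"] for output in pipe_outputs]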
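A reward is most useful when comparing candidate replies to the same conversation. The sketch below assumes the scores behave like Bradley-Terry logits (the card does not state the training objective), so the difference between two rewards maps to a preference probability through the logistic function. `preference_probability`, `best_of_n`, and the `score_fn` argument are hypothetical helpers, not part of the released code, and the rewards in the toy example are made-up placeholders.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Chat = List[Dict[str, str]]


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that reply A is preferred over reply B: sigmoid(r_a - r_b)."""
    diff = reward_a - reward_b
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    # Equivalent form for negative differences, so math.exp never overflows.
    z = math.exp(diff)
    return z / (1.0 + z)


def best_of_n(
    chat: Chat,
    candidates: Sequence[str],
    score_fn: Callable[[Chat], float],
) -> Tuple[int, List[float]]:
    """Append each candidate as the assistant turn, score it, and return (best index, rewards)."""
    rewards = [score_fn(chat + [{"role": "assistant", "content": c}]) for c in candidates]
    best = max(range(len(rewards)), key=rewards.__getitem__)
    return best, rewards


# Placeholder rewards standing in for real model outputs (made-up numbers).
fake_rewards = {"Paris.": 2.1, "Lyon.": -1.3, "I am not sure.": 0.4}
question = [{"role": "user", "content": "What is the capital of France?"}]
best, rewards = best_of_n(question, list(fake_rewards), lambda c: fake_rewards[c[-1]["content"]])
print(best, rewards)  # 0 [2.1, -1.3, 0.4]
```

To score with the actual model, `score_fn` could wrap the steps shown above: format the chat with `rm_tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=False)`, strip `rm_tokenizer.bos_token`, call `rm_pipe([text], **pipe_kwargs)`, and return `[0][0]["score"]`.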

Evaluation

RewardBench Result

Metric	Score
Chat	97.21
Chat Hard	80.70
Safety	89.01
Reasoning	90.60

EvalBiasBench Result

Metric	Score
Length	82.4
Concreteness	92.9
Empty Reference	46.2
Content Continuation	100.0
Nested Instruction	83.3
Familiar Knowledge	58.3

Citation

@misc{park2024offsetbias,
      title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
      author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
      year={2024},
      eprint={2407.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
