How to use:
# install open assistant model_training module (e.g. run `pip install -e .` in `model/` directory of open-assistant repository)
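from transformers import AutoModelForSequenceClassification, AutoTokenizer

import model_training.models.reward_model  # noqa: F401 (registers reward model for AutoModel loading)

model_name = "<model-id>"  # placeholder: set to this checkpoint's Hugging Face Hub id or a local path
tokenizer = AutoTokenizer.from_pretrained(model_name)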
model = AutoModelForSequenceClassification.from_pretrained(model_name)
input_text = "<|prompter|>Hi how are you?<|endoftext|><|assistant|>Hi, I am Open-Assistant a large open-source language model trained by LAION AI. How can I help you today?<|endoftext|>"
inputs = tokenizer(input_text, return_tensors="pt")
score = model(**inputs).logits[0].cpu().detach()  # reward score for the final assistant reply
print(score)
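The input string in the example above wraps each message in role tokens and terminates it with `<|endoftext|>`. A minimal sketch of a helper that builds such strings is shown below. The name `format_dialogue` is hypothetical and not part of `model_training`, and extending the pattern to more than one prompter/assistant pair is an assumption, since the example only shows a single exchange.

```python
# Special tokens copied from the usage example above.
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
EOS = "<|endoftext|>"


def format_dialogue(turns):
    """Join alternating prompter/assistant messages into one input string.

    `turns` is a list of message strings that starts with a prompter message.
    Hypothetical helper for illustration; not part of the model_training module.
    """
    parts = []
    for i, text in enumerate(turns):
        # Even positions are prompter messages, odd positions are assistant replies.
        role = PROMPTER if i % 2 == 0 else ASSISTANT
        parts.append(f"{role}{text}{EOS}")
    return "".join(parts)


input_text = format_dialogue(["Hi how are you?", "I am fine, thanks. How can I help?"])
print(input_text)
```

The resulting string can be passed to the tokenizer in the same way as `input_text` in the usage example.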
wandb: https://wandb.ai/open-assistant/reward-model/runs/hdp2gnko checkpoint-10000
configuration:
oasst-rm-1-pythia-1.4b:
  is_reward_model: true
  pooling: last
  datasets:
    - oasst_export:
        lang: "en,es,de,fr"
        input_file_path: 2023-03-27_oasst_research_ready_synth.jsonl.gz
        val_split: 0.1
    - augment_oasst:
        input_file_path: augmented_latin_cyrillic_oasst_2023-03-27.jsonl
    - anthropic_rlhf:
        fraction: 0.1
        max_val_set: 1000
    - shp:
        max_val_set: 1000
    - hellaswag:
        fraction: 0.5
        max_val_set: 1000
    - webgpt:
        val_split: 0.05
        max_val_set: 1000
    - hf_summary:
        fraction: 0.1
        max_val_set: 250
  use_custom_sampler: true
  sort_by_length: false
  model_name: andreaskoepf/pythia-1.4b-gpt4all-pretrain
  learning_rate: 8e-6
  residual_dropout: 0.01
  weight_decay: 0.0
  dtype: float32
  max_length: 2048
  use_flash_attention: true
  warmup_steps: 50
  gradient_accumulation_steps: 4
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 5
  num_train_epochs: 2
  eval_steps: 500
  save_steps: 1000
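To relate the batch settings above to the checkpoint number, the following rough sketch computes the effective batch size and the number of training samples seen by checkpoint-10000. It assumes that the step counter counts optimizer updates (as in the Hugging Face `Trainer`) and that training ran on a single device. The device count is not stated in the configuration, so `num_devices` is a placeholder to adjust.

```python
# Values copied from the configuration above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
save_steps = 1000

# Assumption: the number of devices is not given in the configuration.
num_devices = 1

checkpoint_step = 10000  # from "checkpoint-10000"


def effective_batch_size(per_device, accumulation, devices):
    """Samples contributing to one optimizer update."""
    return per_device * accumulation * devices


batch = effective_batch_size(
    per_device_train_batch_size, gradient_accumulation_steps, num_devices
)
samples_seen = checkpoint_step * batch
checkpoint_index = checkpoint_step // save_steps  # ordinal of this saved checkpoint

print(f"effective batch size: {batch}")
print(f"samples seen at step {checkpoint_step}: {samples_seen}")
print(f"saved checkpoint number: {checkpoint_index}")
```

With more devices, the effective batch size and the sample count scale linearly with `num_devices`.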