---
library_name: transformers
license: other
base_model: meta-llama/Llama-3.2-1B
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: GuardReasoner 1B
    results: []
pipeline_tag: text-classification
language:
  - en
metrics:
  - f1
---

# GuardReasoner 1B

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained via R-SFT (reasoning supervised fine-tuning) followed by HS-DPO (hard-sample direct preference optimization). It accompanies the paper *GuardReasoner: Towards Reasoning-based LLM Safeguards*.

The training data used for R-SFT is available in the GuardReasonerTrain dataset.
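As a rough sketch of how the model might be used for safety classification with `transformers`: the prompt template below is an assumption for illustration (the official GuardReasoner template may differ), and the Hub repository id is likewise assumed.

```python
# Hedged sketch: prompting GuardReasoner 1B as a reasoning-based safeguard.
# The prompt wording here is a hypothetical template, not the official one.
from transformers import AutoModelForCausalLM, AutoTokenizer


def build_guard_prompt(user_request: str) -> str:
    # Hypothetical classification prompt; adjust to the official template.
    return (
        "You are a safety classifier. Reason step by step, then label the "
        "request as 'harmful' or 'unharmful'.\n\nRequest: " + user_request
    )


def classify(user_request: str,
             model_id: str = "yueliu1999/GuardReasoner-1B") -> str:
    # model_id is an assumed Hub repo id; replace with the actual one.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(build_guard_prompt(user_request),
                       return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

The generated text would contain the model's reasoning trace followed by its label, so downstream code typically parses the final label out of the decoded string.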

If you use this model, please cite:

```bibtex
@article{GuardReasoner,
  title={GuardReasoner: Towards Reasoning-based LLM Safeguards},
  author={Liu, Yue and Gao, Hongcheng and Zhai, Shengfang and Xia, Jun and Wu, Tianyi and Xue, Zhiwei and Chen, Yulin and Kawaguchi, Kenji and Zhang, Jiaheng and Hooi, Bryan},
  journal={arXiv preprint arXiv:2501.18492},
  year={2025}
}
```