license: other
language:
  - en
pipeline_tag: text-generation
inference: false
tags:
  - transformers
  - gguf
  - imatrix
  - gemma-2-9b-it-WPO-HB

Quantizations of https://huggingface.co/wzhouad/gemma-2-9b-it-WPO-HB

Inference Clients/UIs


From original readme

gemma-2-9b-it finetuned by hybrid WPO, utilizing two types of data:

  1. On-policy sampled gemma outputs based on Ultrafeedback prompts.
  2. GPT-4-turbo outputs based on Ultrafeedback prompts.

Unlike the preference data construction method in our paper, we score the outputs with RLHFlow/ArmoRM-Llama3-8B-v0.1 and form each preference pair from the outputs with the maximum and minimum scores.
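
The max/min pairing described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' pipeline: `build_preference_pair`, `toy_score`, and the `chosen`/`rejected` field names are hypothetical, and the toy scorer only stands in for the reward model (RLHFlow/ArmoRM-Llama3-8B-v0.1 is not called here).

```python
from typing import Callable, Dict, List


def build_preference_pair(
    prompt: str,
    outputs: List[str],
    score_fn: Callable[[str, str], float],
) -> Dict[str, str]:
    """Score every candidate output and pair the highest with the lowest.

    `score_fn` is a placeholder for a reward model; its signature
    (prompt, output) -> float is an assumption made for this sketch.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two candidate outputs")

    # Attach a score to each candidate output.
    scored = [(score_fn(prompt, out), out) for out in outputs]

    # Maximum score becomes the preferred side, minimum the dispreferred side.
    best = max(scored, key=lambda item: item[0])
    worst = min(scored, key=lambda item: item[0])
    return {"prompt": prompt, "chosen": best[1], "rejected": worst[1]}


def toy_score(prompt: str, output: str) -> float:
    # Hypothetical stand-in scorer: longer outputs score higher.
    return float(len(output))


pair = build_preference_pair(
    "Explain GGUF.",
    ["short", "a longer answer", "mid one"],
    toy_score,
)
print(pair["chosen"], "|", pair["rejected"])
```

In a real setup the candidate list would hold the sampled gemma outputs and the GPT-4-turbo output for the same prompt, and `score_fn` would wrap the reward model.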

We provide our training data at wzhouad/gemma-2-ultrafeedback-hybrid.