---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- gemma-2-9b-it-WPO-HB
---
Quantizations of https://huggingface.co/wzhouad/gemma-2-9b-it-WPO-HB
### Inference Clients/UIs
* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [ollama](https://github.com/ollama/ollama)
* [jan](https://github.com/janhq/jan)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [GPT4All](https://github.com/nomic-ai/gpt4all)
---
# From original readme
gemma-2-9b-it fine-tuned with hybrid WPO, using two types of data:
1. On-policy sampled gemma outputs based on Ultrafeedback prompts.
2. GPT-4-turbo outputs based on Ultrafeedback prompts.
Unlike the preference data construction method in our paper, we use RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, then take the highest- and lowest-scoring outputs to form each preference pair.
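
The max/min selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_preference_pair`, the `prompt`/`chosen`/`rejected` field names, and the scores are all hypothetical, and in practice the scores would come from the reward model rather than being hard-coded.

```python
from typing import Dict, List, Tuple


def build_preference_pair(
    prompt: str, scored_outputs: List[Tuple[str, float]]
) -> Dict[str, str]:
    """Form one preference pair from (output_text, reward_score) tuples.

    The highest-scoring output becomes "chosen" and the lowest-scoring
    output becomes "rejected". Field names are illustrative assumptions.
    """
    if len(scored_outputs) < 2:
        raise ValueError("need at least two scored outputs to form a pair")
    chosen = max(scored_outputs, key=lambda item: item[1])
    rejected = min(scored_outputs, key=lambda item: item[1])
    return {"prompt": prompt, "chosen": chosen[0], "rejected": rejected[0]}


# Hypothetical candidates: two on-policy samples plus one GPT-4-turbo output,
# with made-up scores standing in for reward-model outputs.
example = build_preference_pair(
    "Explain GGUF in one sentence.",
    [("on-policy A", 0.12), ("on-policy B", 0.31), ("gpt-4-turbo", 0.27)],
)
```

Pooling both data sources into one candidate list before selecting is itself an assumption here; the original text only states that the maximum- and minimum-scored outputs form the pair.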
We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid).