|
--- |
|
license: other |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
inference: false |
|
tags: |
|
- transformers |
|
- gguf |
|
- imatrix |
|
- gemma-2-9b-it-WPO-HB |
|
--- |
|
Quantizations of https://huggingface.co./wzhouad/gemma-2-9b-it-WPO-HB |
|
|
|
### Inference Clients/UIs |
|
* [llama.cpp](https://github.com/ggerganov/llama.cpp) |
|
* [KoboldCPP](https://github.com/LostRuins/koboldcpp) |
|
* [ollama](https://github.com/ollama/ollama) |
|
* [jan](https://github.com/janhq/jan) |
|
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui) |
|
* [GPT4All](https://github.com/nomic-ai/gpt4all) |
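Any of the clients above can load these GGUF files. As a minimal sketch, here is how a quant might be run from Python via the `llama-cpp-python` bindings; the local filename `gemma-2-9b-it-WPO-HB.Q4_K_M.gguf` is hypothetical (use whichever quant you downloaded), and the prompt helper simply applies the Gemma 2 chat template:

```python
import os

def gemma_chat_prompt(user_message: str) -> str:
    # Gemma 2 instruction-tuned chat template: user turn, then open a model turn
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

MODEL_PATH = "gemma-2-9b-it-WPO-HB.Q4_K_M.gguf"  # hypothetical local file

if os.path.exists(MODEL_PATH):
    from llama_cpp import Llama  # pip install llama-cpp-python
    llm = Llama(model_path=MODEL_PATH, n_ctx=4096)
    out = llm(gemma_chat_prompt("Explain GGUF in one sentence."), max_tokens=128)
    print(out["choices"][0]["text"])
```

Lower-bit quants (e.g. Q4) trade some quality for memory; pick the largest quant that fits your RAM/VRAM.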
|
--- |
|
|
|
# From original readme |
|
|
|
gemma-2-9b-it fine-tuned with hybrid WPO, using two types of data:
|
1. On-policy sampled gemma outputs based on Ultrafeedback prompts. |
|
2. GPT-4-turbo outputs based on Ultrafeedback prompts. |
|
|
|
Compared to the preference data construction method described in our paper, we switched to RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, choosing the outputs with the maximum and minimum scores to form each preference pair.
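The max/min selection step can be sketched as follows. The scores and completions are hypothetical stand-ins; in the actual pipeline the reward model RLHFlow/ArmoRM-Llama3-8B-v0.1 scores the sampled gemma and GPT-4-turbo outputs:

```python
def build_preference_pair(completions, scores):
    """Pick the highest-scored completion as 'chosen' and the
    lowest-scored as 'rejected', forming one preference pair."""
    ranked = sorted(zip(scores, completions))  # ascending by score
    rejected, chosen = ranked[0][1], ranked[-1][1]
    return {"chosen": chosen, "rejected": rejected}

# Toy example with made-up reward-model scores for one prompt
pair = build_preference_pair(
    ["answer A", "answer B", "answer C"],
    [0.31, 0.87, 0.55],
)
# pair == {"chosen": "answer B", "rejected": "answer A"}
```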
|
|
|
We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co./datasets/wzhouad/gemma-2-ultrafeedback-hybrid). |