---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- gemma-2-9b-it-WPO-HB
---
Quantizations of https://huggingface.co/wzhouad/gemma-2-9b-it-WPO-HB

### Inference Clients/UIs
* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [ollama](https://github.com/ollama/ollama)
* [jan](https://github.com/janhq/jan)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [GPT4All](https://github.com/nomic-ai/gpt4all)

---

# From original readme

gemma-2-9b-it finetuned by hybrid WPO, utilizing two types of data:

1. On-policy sampled gemma outputs based on Ultrafeedback prompts.
2. GPT-4-turbo outputs based on Ultrafeedback prompts.

In comparison to the preference data construction method in our paper, we switch to RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, and choose the outputs with maximum/minimum scores to form a preference pair.

We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid).
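
The max/min selection step described above can be illustrated with a minimal sketch. This is not the authors' code: `build_preference_pair`, `score_fn`, and `toy_score` are hypothetical names, and `toy_score` is only a placeholder standing in for a reward model such as RLHFlow/ArmoRM-Llama3-8B-v0.1. How the candidate pool is assembled for each prompt is also an assumption here.

```python
from typing import Callable, Dict, List


def build_preference_pair(
    prompt: str,
    responses: List[str],
    score_fn: Callable[[str, str], float],
) -> Dict[str, str]:
    """Return the highest-scored response as 'chosen' and the lowest as 'rejected'."""
    if len(responses) < 2:
        raise ValueError("need at least two candidate responses")
    # Score every candidate once, then pick the indices of the extremes.
    scores = [score_fn(prompt, r) for r in responses]
    best = max(range(len(responses)), key=scores.__getitem__)
    worst = min(range(len(responses)), key=scores.__getitem__)
    return {
        "prompt": prompt,
        "chosen": responses[best],
        "rejected": responses[worst],
    }


def toy_score(prompt: str, response: str) -> float:
    # Placeholder scorer (assumption): longer responses score higher.
    # In practice this would wrap a call to a reward model.
    return float(len(response))


if __name__ == "__main__":
    candidates = ["a", "abc", "ab"]
    print(build_preference_pair("example prompt", candidates, toy_score))
```

Passing the scorer in as a function keeps the selection logic independent of whichever reward model is used to produce the scores.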