duyntnet/gemma-2-9b-it-WPO-HB-imatrix-GGUF


duyntnet committed on Jan 4

Commit dada1a9 · verified · 1 Parent(s): 818ab23

Upload README.md

Files changed (1)

README.md ADDED +32 -0

	@@ -0,0 +1,32 @@

+---
+license: other
+language:
+- en
+pipeline_tag: text-generation
+inference: false
+tags:
+- transformers
+- gguf
+- imatrix
+- gemma-2-9b-it-WPO-HB
+---
+Quantizations of https://huggingface.co/wzhouad/gemma-2-9b-it-WPO-HB
+### Inference Clients/UIs
+* [llama.cpp](https://github.com/ggerganov/llama.cpp)
+* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
+* [ollama](https://github.com/ollama/ollama)
+* [jan](https://github.com/janhq/jan)
+* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+* [GPT4All](https://github.com/nomic-ai/gpt4all)
+---
+# From original readme
+gemma-2-9b-it fine-tuned with hybrid WPO, using two types of data:
+1. On-policy sampled gemma outputs based on Ultrafeedback prompts.
+2. GPT-4-turbo outputs based on Ultrafeedback prompts.
+Unlike the preference data construction method in our paper, we score the outputs with RLHFlow/ArmoRM-Llama3-8B-v0.1 and pair the highest-scoring output with the lowest-scoring one to form each preference pair.
+We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid).
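
The pair construction described in the original readme (score every candidate output for a prompt, then keep the highest- and lowest-scoring ones) can be sketched roughly as follows. This is an illustration only, not the authors' code: `build_preference_pair` and `score_fn` are hypothetical names, `score_fn` stands in for a reward model such as RLHFlow/ArmoRM-Llama3-8B-v0.1, and the assumption that on-policy and GPT-4-turbo outputs share one candidate pool is ours.

```python
from typing import Callable, Optional, Sequence


def build_preference_pair(
    prompt: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> Optional[dict]:
    """Pick the best- and worst-scored candidates as (chosen, rejected).

    `score_fn(prompt, response)` is a hypothetical stand-in for a reward
    model; it must return a number where higher means better.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidate outputs")

    # Score every candidate once.
    scores = [score_fn(prompt, c) for c in candidates]

    # Indices of the maximum and minimum scores.
    best = max(range(len(candidates)), key=scores.__getitem__)
    worst = min(range(len(candidates)), key=scores.__getitem__)

    # If every candidate got the same score there is no preference signal.
    if scores[best] == scores[worst]:
        return None

    return {
        "prompt": prompt,
        "chosen": candidates[best],
        "rejected": candidates[worst],
    }


if __name__ == "__main__":
    # Toy scorer for demonstration only: longer answers score higher.
    toy_score = lambda prompt, response: float(len(response))
    print(build_preference_pair("Say hi.", ["hi", "hello there", "hey"], toy_score))
```

Returning `None` for tied scores is a design choice of this sketch; the source does not say how ties are handled.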