Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ datasets:
 
 This is the [llamafile](https://github.com/Mozilla-Ocho/llamafile) for [Dolphin 2.9 Llama 3 8b](https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b).
 
-Quick tests show it's good but not as sharp as the base model, using just some few shot prompts looking for precision when asking about
+Quick tests show it's good but not as sharp as the base model, using just some few-shot prompts looking for precision when asking specifics about methods in a process. More tests will have to be done to compare this and WizardLM-7B to see how much the finetuning/new EOS did to Llama-3-8B.
 
 Notably, [cognitivecomputations](https://huggingface.co/cognitivecomputations) uses a single EOS token. This fixes the garbled output bug. Hooray! It may, however, prevent some intended behavior of Llama 3's internal monologue/thoughts that adds to the model's apparent sharpness. Download Meta's original weights and load them manually in Python to see what it's capable of as a comparison. We're all awaiting any fixes to llama.cpp and/or the base gguf structure. In the meantime this dolphin is a good fix and excellent work.
 
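Not part of the commit, but a minimal sketch of the side-by-side check the notes describe: one few-shot precision probe sent both to the running llamafile (whose embedded llama.cpp server exposes an OpenAI-compatible route, by default on port 8080) and to Meta's original weights loaded manually through transformers. The model IDs, the prompt, the port, and the launch flags are all assumptions, and Meta's repo is gated, so you need approved access on Hugging Face.

```python
# Hypothetical comparison harness -- not from this commit. Assumes the
# llamafile is already serving its OpenAI-compatible API on the default
# port, e.g. `./dolphin-2.9-llama3-8b.llamafile --server --nobrowser`
# (flags vary by llamafile version).
import requests
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# One few-shot turn probing for precision about the steps of a process.
MESSAGES = [{
    "role": "user",
    "content": (
        "Q: List the exact steps to hard-boil an egg.\n"
        "A: 1) Cover eggs with cold water. 2) Bring to a boil. "
        "3) Cover, off heat, 10 minutes. 4) Move to an ice bath.\n"
        "Q: List the exact steps to patch a bicycle inner tube."
    ),
}]

# 1) Ask the llamafile through its llama.cpp server (the model name in
# the request body is ignored by the local server).
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"model": "dolphin", "messages": MESSAGES, "temperature": 0},
    timeout=600,
)
print("llamafile:", resp.json()["choices"][0]["message"]["content"])

# 2) Ask Meta's original weights, loaded manually, as the baseline.
repo = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed model ID (gated)
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)
inputs = tok.apply_chat_template(
    MESSAGES, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# Llama 3 stops on either <|end_of_text|> or <|eot_id|>; honoring both
# is what early gguf conversions missed, hence the single-EOS fix.
stops = [tok.eos_token_id, tok.convert_tokens_to_ids("<|eot_id|>")]
out = model.generate(inputs, max_new_tokens=256, eos_token_id=stops)
print("meta:", tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```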
@@ -29,7 +29,7 @@ conversion notes:
 I converted the original safetensors to f32 to preserve the fidelity from bf16, then quantized ggufs from there. Not sure what most ggufs on hf are doing if they don't say.
 
 size notes:
-Windows users, go for q3-k-
+Windows users, go for q3-k-s. FreeBSD users, you're the real heroes. Others, use the biggest one that works on your machine.
 
 
 I just copied the original model card this time.
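For concreteness, the f32-first route in the conversion notes maps onto llama.cpp's stock tooling. A rough sketch, assuming the spring-2024 names (`convert-hf-to-gguf.py` and the `quantize` binary, both renamed in later llama.cpp releases) and placeholder paths. Widening bf16 to f32 is exact, since f32 keeps bf16's 8-bit exponent and only adds mantissa bits, which is why this route preserves the full fidelity of the original weights.

```python
# Sketch of the f32-first pipeline described above. Script and binary
# names match llama.cpp circa spring 2024 (since renamed); every path
# here is a placeholder, not something taken from this repo.
import subprocess

SRC = "dolphin-2.9-llama3-8b"           # original bf16 safetensors dir
F32 = "dolphin-2.9-llama3-8b-f32.gguf"  # lossless f32 master

# bf16 -> f32: exact widening, so no fidelity is lost before quantizing.
subprocess.run(
    ["python", "llama.cpp/convert-hf-to-gguf.py", SRC,
     "--outtype", "f32", "--outfile", F32],
    check=True,
)

# f32 -> each quant: always quantize from the f32 master, never from a
# smaller quant, so every size is one step removed from the weights.
for q in ["Q3_K_S", "Q4_K_M", "Q5_K_M", "Q8_0"]:
    subprocess.run(
        ["llama.cpp/quantize", F32, f"dolphin-2.9-llama3-8b-{q}.gguf", q],
        check=True,
    )
```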