Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ datasets:
 
 This is the [llamafile](https://github.com/Mozilla-Ocho/llamafile) for [Dolphin 2.9 Llama 3 8b](https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b).
 
-Quick tests show it's good but not as sharp as the base model, using just some few shot prompts looking for precision when asking about
+Quick tests show it's good but not as sharp as the base model, using just some few-shot prompts looking for precision when asking specifics about methods in a process. More tests will have to be done to compare this and WizardLM-7B to see how much the finetuning/new EOS did to Llama-3-8B.
 
 Notably, [cognitivecomputations](https://huggingface.co/cognitivecomputations) uses a single EOS token. This fixes the garbled output bug. Hooray! It may, however, prevent some intended behavior of Llama 3's internal monologue/thoughts that adds to the model's apparent sharpness. Download Meta's original weights and load them manually in Python to see what it's capable of as a comparison. We're all awaiting any fixes to llama.cpp and/or the base gguf structure. In the meantime this dolphin is a good fix and excellent work.
 
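Not part of the commit, but a minimal sketch of the side-by-side check the notes describe: one few-shot precision probe sent both to the running llamafile (whose embedded llama.cpp server exposes an OpenAI-compatible route, by default on port 8080) and to Meta's original weights loaded manually through transformers. The model IDs, the prompt, the port, and the launch flags are all assumptions, and Meta's repo is gated, so you need approved access on Hugging Face.

```python
# Hypothetical comparison harness -- not from this commit. Assumes the
# llamafile is already serving its OpenAI-compatible API on the default
# port, e.g. `./dolphin-2.9-llama3-8b.llamafile --server --nobrowser`
# (flags vary by llamafile version).
import requests
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# One few-shot turn probing for precision about the steps of a process.
MESSAGES = [{
    "role": "user",
    "content": (
        "Q: List the exact steps to hard-boil an egg.\n"
        "A: 1) Cover eggs with cold water. 2) Bring to a boil. "
        "3) Cover, off heat, 10 minutes. 4) Move to an ice bath.\n"
        "Q: List the exact steps to patch a bicycle inner tube."
    ),
}]

# 1) Ask the llamafile through its llama.cpp server (the model name in
# the request body is ignored by the local server).
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"model": "dolphin", "messages": MESSAGES, "temperature": 0},
    timeout=600,
)
print("llamafile:", resp.json()["choices"][0]["message"]["content"])

# 2) Ask Meta's original weights, loaded manually, as the baseline.
repo = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed model ID (gated)
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)
inputs = tok.apply_chat_template(
    MESSAGES, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# Llama 3 stops on either <|end_of_text|> or <|eot_id|>; honoring both
# is what early gguf conversions missed, hence the single-EOS fix.
stops = [tok.eos_token_id, tok.convert_tokens_to_ids("<|eot_id|>")]
out = model.generate(inputs, max_new_tokens=256, eos_token_id=stops)
print("meta:", tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```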
@@ -29,7 +29,7 @@ conversion notes:
 I converted the original safetensors to f32 to preserve the fidelity from bf16, then quantized ggufs from there. Not sure what most ggufs on hf are doing if they don't say.
 
 size notes:
-Windows users, go for q3-k-
+Windows users, go for q3-k-s. FreeBSD users, you're the real heroes. Others, use the biggest one that works on your machine.
 
 
 I just copied the original model card this time.
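For concreteness, the f32-first route in the conversion notes maps onto llama.cpp's stock tooling. A rough sketch, assuming the spring-2024 names (`convert-hf-to-gguf.py` and the `quantize` binary, both renamed in later llama.cpp releases) and placeholder paths. Widening bf16 to f32 is exact, since f32 keeps bf16's 8-bit exponent and only adds mantissa bits, which is why this route preserves the full fidelity of the original weights.

```python
# Sketch of the f32-first pipeline described above. Script and binary
# names match llama.cpp circa spring 2024 (since renamed); every path
# here is a placeholder, not something taken from this repo.
import subprocess

SRC = "dolphin-2.9-llama3-8b"           # original bf16 safetensors dir
F32 = "dolphin-2.9-llama3-8b-f32.gguf"  # lossless f32 master

# bf16 -> f32: exact widening, so no fidelity is lost before quantizing.
subprocess.run(
    ["python", "llama.cpp/convert-hf-to-gguf.py", SRC,
     "--outtype", "f32", "--outfile", F32],
    check=True,
)

# f32 -> each quant: always quantize from the f32 master, never from a
# smaller quant, so every size is one step removed from the weights.
for q in ["Q3_K_S", "Q4_K_M", "Q5_K_M", "Q8_0"]:
    subprocess.run(
        ["llama.cpp/quantize", F32, f"dolphin-2.9-llama3-8b-{q}.gguf", q],
        check=True,
    )
```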