Where did the BF16 come from?
As far as I'm aware, the original model was trained in FP8, which uses half as many bits per weight as BF16. You have a BF16 version here - where did the extra half of the model come from?
Maybe the additional bits are zero in the BF16 representation?
Maybe! Wouldn't that be crazy, though?
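It wouldn't actually be crazy: widening a float to a format with more mantissa bits just zero-pads the significand, so no new information appears. A quick sketch of this, using FP16 -> FP32 as an analogy since numpy has no FP8 dtype:

```python
import numpy as np

# Widen a value from FP16 (1 sign, 5 exponent, 10 mantissa bits)
# to FP32 (1 sign, 8 exponent, 23 mantissa bits).
x16 = np.float16(1.2345)
x32 = np.float32(x16)

bits16 = format(x16.view(np.uint16), "016b")
bits32 = format(x32.view(np.uint32), "032b")

m16 = bits16[6:]   # the 10 FP16 mantissa bits
m32 = bits32[9:]   # the 23 FP32 mantissa bits

# For a normal number, the wider mantissa is the narrow one
# followed by nothing but zero padding.
assert m32 == m16 + "0" * 13
print(m16)
print(m32)
```

The same reasoning applies to an FP8 -> BF16 upcast: the representation doubles in size, but the extra bits carry no information (block scale factors aside).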
We converted it to BF16 using DeepSeek's instructions. You can read more about our process in our blog post: https://unsloth.ai/blog/deepseekr1-dynamic
We also uploaded the BF16 version here: https://huggingface.co./unsloth/DeepSeek-R1-BF16
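For context: DeepSeek's released FP8 checkpoints store a per-block scale factor alongside each quantized weight tensor, and the conversion to BF16 multiplies each block by its scale. Here is a rough sketch of that block-wise dequantization - the block size and names are assumptions, and float32 stands in for FP8 since numpy has no FP8 dtype:

```python
import numpy as np

BLOCK = 128  # assumed block size for the per-block scales

def dequant(weight: np.ndarray, scale_inv: np.ndarray, block: int = BLOCK) -> np.ndarray:
    """Multiply each (block x block) tile of `weight` by its scale factor."""
    out = weight.astype(np.float32).copy()
    rows, cols = weight.shape
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            out[i:i + block, j:j + block] *= scale_inv[i // block, j // block]
    return out

# Toy example: a 256x256 weight split into a 2x2 grid of blocks,
# each with its own scale.
w = np.ones((256, 256), dtype=np.float32)
s = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
d = dequant(w, s)
print(d[0, 0], d[0, 255], d[255, 0], d[255, 255])
```

The point is that the BF16 tensor is the FP8 values with scales folded in, not new trained precision.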
I couldn't find any mention of BF16 on that page. Which instructions are you referring to? And can you clarify - is the second 671GB just empty space? What's the rationale there?