Where did the BF16 come from?

#10
by gshpychka - opened

As far as I'm aware, the original model was trained in FP8. You have a BF16 version here - where did the extra half of the model come from?

Maybe the additional bits are zero in the BF16 representation?
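Here is a quick check of that hypothesis for a plain element-wise upcast. It assumes the FP8 weights use the E4M3 "fn" encoding (1 sign, 4 exponent and 3 mantissa bits, no infinities). Every finite E4M3 value is exactly representable in BF16. Because E4M3 carries only 3 mantissa bits, the lowest 4 of BF16's 7 mantissa bits always come out zero. The exponent is re-biased, from 4 bits with bias 7 to 8 bits with bias 127, so the BF16 bytes are not literally the FP8 byte followed by zero padding. The function names below are hypothetical, for illustration only.

```python
import math
import struct


def e4m3_to_float(byte: int) -> float:
    """Decode one FP8 E4M3 ("fn" variant) byte: 1 sign, 4 exponent (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan  # the fn variant has no infinities; S.1111.111 encodes NaN
    if exponent == 0:
        return sign * (mantissa / 8) * 2.0 ** -6  # subnormal range
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 7)


def to_bf16_bits(value: float) -> int:
    """Return the BF16 bit pattern of value, asserting the cast is exact."""
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    # BF16 is the top 16 bits of float32; an exact cast means the bottom 16 bits are zero.
    assert bits32 & 0xFFFF == 0, "not exactly representable in BF16"
    return bits32 >> 16


def check_all_e4m3() -> int:
    """Upcast every non-NaN E4M3 value and confirm BF16's 4 extra mantissa bits stay zero."""
    checked = 0
    for byte in range(256):
        value = e4m3_to_float(byte)
        if math.isnan(value):
            continue
        bf16 = to_bf16_bits(value)
        assert bf16 & 0xF == 0  # lowest 4 of BF16's 7 mantissa bits
        checked += 1
    return checked


print(check_all_e4m3())  # 254
```

This covers only a bare cast. If per-block scale factors are multiplied in during conversion, the rounded products generally do have non-zero low bits, but they still carry no information beyond the FP8 values and their scales.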

Maybe! Wouldn't that be crazy, though?

We converted it to BF16 using DeepSeek's instructions. You can read more about our process in our blog post: https://unsloth.ai/blog/deepseekr1-dynamic
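Non-zero low bits can come from scale factors. As I understand it, DeepSeek's FP8 checkpoints store one inverse scale per 128×128 weight tile (`weight_scale_inv`), and the FP8-to-BF16 conversion script in the DeepSeek-V3 repository multiplies each weight by its tile's scale before casting. Whether that is exactly what was used here is an assumption. The sketch below models that scheme. The tile size, the list-of-lists layout and the function names are illustrative and not the actual script.

```python
import struct

BLOCK = 128  # assumed tile size of the per-block scale factors


def bf16_bits(value: float) -> int:
    """Round a finite value to BF16 (round-to-nearest-even) and return its 16-bit pattern."""
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bf16_value(bits16: int) -> float:
    """Expand a BF16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]


def dequantize_blockwise(weight, scale_inv, block=BLOCK):
    """Multiply each decoded FP8 weight by the scale of its block x block tile, then round to BF16.

    weight: 2D list of already-decoded FP8 values.
    scale_inv: 2D list with one float per tile (hypothetical layout for illustration).
    """
    return [
        [bf16_value(bf16_bits(w * scale_inv[i // block][j // block])) for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]


print(dequantize_blockwise([[1.5]], [[0.3]]))  # [[0.44921875]]
print(hex(bf16_bits(1.5 * 0.3)))  # 0x3ee6
```

In this example the FP8 value 1.5 has zero low mantissa bits. After it is scaled by 0.3, the BF16 result `0x3ee6` has low bits `0x6`, so the extra bytes are not empty. They are still fully determined by the FP8 weights plus one scale per tile.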

We also uploaded the BF16 version here: https://huggingface.co./unsloth/DeepSeek-R1-BF16

I couldn't find any mention of BF16 in that blog post. Which instructions are you referring to? And could you clarify: is the second 671 GB just empty space? What's the rationale there?
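The "second 671 GB" follows from byte widths alone. FP8 stores 1 byte per weight and BF16 stores 2, so upcasting doubles the checkpoint size. The sketch below assumes all 671B parameters are stored at the same width and ignores scale tensors and metadata. Real file sizes will differ somewhat because some tensors are kept at other precisions. `checkpoint_gb` is a hypothetical helper.

```python
PARAMS = 671e9  # total parameter count of DeepSeek-R1


def checkpoint_gb(params: float, bytes_per_param: int) -> float:
    """Approximate checkpoint size in decimal gigabytes, ignoring scales and metadata."""
    return params * bytes_per_param / 1e9


fp8_gb = checkpoint_gb(PARAMS, 1)   # FP8: 1 byte per weight
bf16_gb = checkpoint_gb(PARAMS, 2)  # BF16: 2 bytes per weight
print(fp8_gb, bf16_gb, bf16_gb - fp8_gb)  # 671.0 1342.0 671.0
```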