---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
library_name: transformers
---
# DeepSeek V3 FP16 Atten NaN
This is a minimal reproducible example in which the attention block of DeepSeek V3's final layer outputs NaNs when run in float16.
Run `run.py` to see the NaNs.
The weights come from the original float8 e4m3fn checkpoint: they were converted to bfloat16, then to float16, and the final layer's attention weights were then extracted.
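
For background, here is a small standard-library sketch of one general way NaNs can appear after a bfloat16 to float16 cast. It is not taken from `run.py`, and it is not a confirmed diagnosis of this repository. bfloat16 covers roughly the same range as float32, while float16 tops out at 65504, so a value that is finite in bfloat16 can become `inf` in float16, and `inf - inf` is NaN. The helper `to_fp16` is hypothetical and only emulates the cast for Python floats.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite float16 value


def to_fp16(x: float) -> float:
    """Round a Python float to float16, overflowing to +/-inf.

    Hypothetical helper for illustration only; it is not part of run.py.
    struct's 'e' format raises on overflow, so the overflow case is mapped
    to infinity here to mimic an IEEE-style cast.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


# 2**17 is exactly representable in bfloat16 but exceeds FP16_MAX.
big = to_fp16(131072.0)
print(big)        # inf
print(big - big)  # nan, e.g. when subtracting a row maximum before softmax

# Values inside the float16 range survive the cast.
print(to_fp16(1.0))       # 1.0
print(to_fp16(FP16_MAX))  # 65504.0
```

Whether overflow of this kind is what happens in the final attention layer is an assumption to verify against the actual tensors, for example by checking intermediate activations for `inf` before they turn into NaN.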