---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
library_name: transformers
---
# DeepSeek V3 FP16 Attention NaN

This is a minimal reproducible example showing that the final layer of DeepSeek V3's attention outputs NaNs when using the float16 data type.

Run `run.py` to see the NaNs.
The weights were converted from the original float8 e4m3fn checkpoint to bfloat16, then cast to float16, and finally the attention weights of the last layer were extracted.
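A likely mechanism for this failure mode (an assumption, not a claim about these specific weights) is that bfloat16 shares float32's exponent range, so values that fit comfortably in bfloat16 can exceed float16's maximum finite value (65504). A minimal sketch of how such an overflow turns into NaN:

```python
import torch

# bfloat16 shares float32's exponent range, so it can represent magnitudes
# far beyond float16's maximum finite value (65504).
x = torch.tensor([70000.0, 1.0], dtype=torch.bfloat16)

# Casting down to float16 overflows the large entry to inf.
y = x.to(torch.float16)
print(y)  # tensor([inf, 1.], dtype=torch.float16)

# inf then turns into NaN in common attention operations, e.g. the
# max-subtraction step inside softmax computes inf - inf = nan.
print(torch.softmax(y.float(), dim=0))  # tensor([nan, nan])
```

Once an inf appears in the attention logits, the softmax normalization produces NaN, which then propagates through the rest of the layer.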