---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
library_name: transformers
---

# DeepSeek V3 FP16 Attention NaN

This is a minimal reproducible sample showing that the final layer of DeepSeek V3's attention outputs NaNs when run in float16. Run `run.py` to see the NaNs.

The weights were obtained by converting the original float8 (e4m3fn) checkpoint to bfloat16, then casting to float16, and extracting the final layer's attention weights.
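A plausible mechanism (an assumption; the repro itself does not pin down the cause) is dynamic-range overflow: bfloat16 shares float32's exponent range, while float16 saturates above 65504, so activations that are fine in bfloat16 become `inf` after the cast, and `inf` arithmetic inside attention then yields NaN. A minimal numpy sketch of that failure mode:

```python
import numpy as np

# bfloat16 has roughly float32's exponent range (max ~3.4e38),
# but float16 overflows above 65504. A large activation that is
# representable in bfloat16 becomes inf when cast to float16.
x = np.float32(1e5)          # fine in bfloat16/float32
y = x.astype(np.float16)     # overflows float16 -> inf

# inf arithmetic (e.g. subtracting the row max in a softmax,
# or summing attention scores) then produces NaN.
print(y)                     # inf
print(y - y)                 # nan
```

This is only an illustration of how a bf16-to-fp16 cast can inject `inf`/NaN; the actual NaNs in this repro surface inside the model's attention computation.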