---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
library_name: transformers
---
# DeepSeek V3 FP16 Atten NaN
This is a minimal reproducible sample that causes the final layer of DeepSeek V3's attention to output NaNs when run in float16.

Run `run.py` to see the NaNs.
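
A general way to locate where NaNs first appear is to attach forward hooks before running the float16 forward pass. The sketch below is illustrative only and is not the contents of `run.py`:

```python
import torch
import torch.nn as nn

# A generic NaN monitor (illustrative only, not run.py): register forward hooks
# so the first module whose output contains NaN is reported by name.
def add_nan_hooks(model: nn.Module) -> None:
    def make_hook(name: str):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                print(f"NaN detected in module: {name}")
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

# Usage: call add_nan_hooks(model) once, then run the float16 forward pass.
```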

The weights were converted from the original float8 (e4m3fn) to bfloat16, then cast to float16, and the final layer's attention weights were extracted from the result.
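
For intuition about why the float16 cast matters (an illustrative sketch, not the repository's conversion script): bfloat16 keeps float32's exponent range, while float16 overflows above roughly 65504, so an intermediate value that stays finite in bfloat16 can become inf in float16, and a later inf - inf then yields NaN.

```python
import torch

x = torch.tensor([70000.0])       # a representative intermediate value
print(x.to(torch.bfloat16))       # finite: bfloat16 keeps float32's exponent range
print(x.to(torch.float16))        # inf: float16 cannot represent values above ~65504

inf16 = x.to(torch.float16)
print(inf16 - inf16)              # nan: inf - inf, the overflow then propagates as NaN
```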