v2ray committed on
Commit fbc71cb · verified · 1 Parent(s): 5c89bcb

Create README.md

Files changed (1)
  1. README.md +13 -0
README.md ADDED
@@ -0,0 +1,13 @@
+ ---
+ license: mit
+ base_model:
+ - deepseek-ai/DeepSeek-V3
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+ # DeepSeek V3 FP16 Atten NaN
+ This is a minimal reproducible sample that makes the final layer of DeepSeek V3's attention output NaNs when using the float16 data type.
+
+ Run `run.py` to see the NaNs.
+
+ The weights were first converted from the original float8 e4m3fn format to bfloat16, then converted to float16, and finally extracted from the final layer's attention.