v2ray
/

DeepSeek-V3-FP16-Atten-NaN


v2ray committed 21 days ago

Commit fbc71cb · verified · 1 Parent(s): 5c89bcb

Create README.md

Files changed (1)

README.md +13 -0

README.md ADDED

	@@ -0,0 +1,13 @@

+---
+license: mit
+base_model:
+- deepseek-ai/DeepSeek-V3
+pipeline_tag: text-generation
+library_name: transformers
+---
+# DeepSeek V3 FP16 Atten NaN
+This is a minimal reproducible example in which the attention of DeepSeek V3's final layer outputs NaNs when run with the float16 data type.
+Run `run.py` to see the NaNs.
+The weights were converted from the original float8 e4m3fn to bfloat16, then from bfloat16 to float16, and then extracted from the final layer's attention.
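
The README does not say why float16 produces NaNs here, so the following is only a sketch of one common mechanism, not a diagnosis of this model: float16 tops out at 65504, so an attention score that is finite in bfloat16 can overflow to `inf`, and a softmax that subtracts the row maximum then computes `inf - inf = NaN`. The helpers `to_fp16`, `fp16_dot`, and `softmax`, and the toy query and key vectors, are hypothetical and unrelated to the repository's `run.py`. The sketch uses only the standard library and emulates float16 rounding with `struct`.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest float16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack values beyond the float16 range;
        # hardware would saturate to infinity instead.
        return math.copysign(math.inf, x)


def fp16_dot(a, b):
    """Dot product where every product and partial sum is rounded to float16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(to_fp16(x) * to_fp16(y)))
    return acc


def wide_dot(a, b):
    """Reference dot product in Python floats (a stand-in for a wider-range dtype)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Softmax with the usual max subtraction."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data (assumed for illustration): one query, two keys.
query = [300.0, 300.0]
keys = [[300.0, 300.0], [1.0, 1.0]]

# 300 * 300 = 90000 exceeds the float16 maximum of 65504, so the first score is inf.
scores_fp16 = [fp16_dot(query, k) for k in keys]
scores_wide = [wide_dot(query, k) for k in keys]

probs_fp16 = softmax(scores_fp16)  # inf - inf inside softmax yields NaN
probs_wide = softmax(scores_wide)  # stays finite

print("float16 scores:", scores_fp16, "->", probs_fp16)
print("wide scores:   ", scores_wide, "->", probs_wide)
```

If the real failure follows this pattern, the first overflow will show up as `inf` in an intermediate tensor before any NaN appears, so checking intermediates for `inf` is a reasonable first step when inspecting the actual reproduction.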