---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
library_name: transformers
---
# DeepSeek V3 FP16 Attention NaN

This is a minimal reproducible example showing that the final layer of DeepSeek V3's attention outputs NaNs when using the float16 data type.

Run `run.py` to see the NaNs.
The weights were converted from the original float8 e4m3fn checkpoint to bfloat16, then cast to float16, and finally the attention weights of the last layer were extracted.
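A likely mechanism for this failure mode (an assumption, not a claim about these specific weights) is that bfloat16 shares float32's exponent range, so values that fit comfortably in bfloat16 can exceed float16's maximum finite value (65504). A minimal sketch of how such an overflow turns into NaN:

```python
import torch

# bfloat16 shares float32's exponent range, so it can represent magnitudes
# far beyond float16's maximum finite value (65504).
x = torch.tensor([70000.0, 1.0], dtype=torch.bfloat16)

# Casting down to float16 overflows the large entry to inf.
y = x.to(torch.float16)
print(y)  # tensor([inf, 1.], dtype=torch.float16)

# inf then turns into NaN in common attention operations, e.g. the
# max-subtraction step inside softmax computes inf - inf = nan.
print(torch.softmax(y.float(), dim=0))  # tensor([nan, nan])
```

Once an inf appears in the attention logits, the softmax normalization produces NaN, which then propagates through the rest of the layer.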