I use AutoModelForCausalLM.from_pretrained to load DeepSeek-V3, and it raises the warning below:

Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM

When I then print the model's state dict keys, only 60 layers are present, although the DeepSeek-V3 weights actually have 61 layers; the last layer is missing:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
)
print(model.state_dict().keys())
How can I fix this? Thank you~
Layer 61 is the MTP layer; it is not actually part of the model.
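You can confirm this yourself: from_pretrained accepts output_loading_info=True, which additionally returns a report of missing and unexpected checkpoint keys. A minimal sketch, reusing the placeholder path from the question (the model.layers.61 prefix is based on the clarification below):

from transformers import AutoModelForCausalLM

model, loading_info = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
    output_loading_info=True,  # also return missing/unexpected key lists
)

# The dropped MTP tensors should appear as unexpected keys under model.layers.61.*
print([k for k in loading_info["unexpected_keys"] if k.startswith("model.layers.61.")])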
Clarification: the model's state dict defines 61 layers (0–60) according to config.json, but the released safetensors contain tensors for 62 layers (0–61). The extra layer appears to be a Multi-Token Prediction (MTP) layer.
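To see this in the checkpoint itself, you can count the layer indices recorded in the shard index. A minimal sketch, assuming the standard Hugging Face sharded layout with a model.safetensors.index.json next to the shards (the directory path is the placeholder from the question):

import json
import re

ckpt_dir = "path/to/deepseek_v3_bf16"

# The index file maps every tensor name to the shard file that stores it.
with open(f"{ckpt_dir}/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Collect the distinct layer indices that appear in the tensor names.
layer_ids = sorted(
    {int(m.group(1)) for name in weight_map if (m := re.match(r"model\.layers\.(\d+)\.", name))}
)
print(layer_ids[0], layer_ids[-1], len(layer_ids))  # expect 0, 61, 62: index 61 is the MTP layer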