I use AutoModelForCausalLM.from_pretrained to load DeepSeek-V3, and it raises the warning below:

Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM

When I then print the model's state dict keys, only 60 layers are present, although the DeepSeek-V3 weights actually have 61 layers; the last layer is missing:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
)
print(model.state_dict().keys())
How can I fix this? Thank you~
Layer 61 is the MTP layer; it is not actually part of the model.
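You can confirm this yourself: from_pretrained accepts output_loading_info=True, which additionally returns a report of missing and unexpected checkpoint keys. A minimal sketch, reusing the placeholder path from the question (the model.layers.61 prefix is based on the clarification below):

from transformers import AutoModelForCausalLM

model, loading_info = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
    output_loading_info=True,  # also return missing/unexpected key lists
)

# The dropped MTP tensors should appear as unexpected keys under model.layers.61.*
print([k for k in loading_info["unexpected_keys"] if k.startswith("model.layers.61.")])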
Clarification: the model's state dict defines 61 layers (0–60) according to config.json, but the released safetensors contain tensors for 62 layers (0–61). The extra layer appears to be a Multi-Token Prediction (MTP) layer.
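To see this in the checkpoint itself, you can count the layer indices recorded in the shard index. A minimal sketch, assuming the standard Hugging Face sharded layout with a model.safetensors.index.json next to the shards (the directory path is the placeholder from the question):

import json
import re

ckpt_dir = "path/to/deepseek_v3_bf16"

# The index file maps every tensor name to the shard file that stores it.
with open(f"{ckpt_dir}/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Collect the distinct layer indices that appear in the tensor names.
layer_ids = sorted(
    {int(m.group(1)) for name in weight_map if (m := re.match(r"model\.layers\.(\d+)\.", name))}
)
print(layer_ids[0], layer_ids[-1], len(layer_ids))  # expect 0, 61, 62: index 61 is the MTP layer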