【Q】shared_head weights of MTP

#68
by huang11 - opened

Dear Official Team,

I noticed the following description in the README: "shared_head: Shares parameters with the output Head of the Main Model weights."

However, when I printed the parameters, I found that the values in model.layers.61.shared_head.norm.weight are not equal to those in model.norm.weight. Based on the description above, I expected the two to be identical, so this discrepancy has left me puzzled.
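For anyone wanting to reproduce the comparison, here is a minimal sketch. It assumes the two tensors have already been loaded and converted to NumPy arrays. The helper name compare_weights is hypothetical and not part of any official tooling.

```python
import numpy as np

def compare_weights(a: np.ndarray, b: np.ndarray, atol: float = 0.0) -> dict:
    """Summarize how two parameter tensors differ (hypothetical helper)."""
    if a.shape != b.shape:
        # Different shapes can never be the same shared parameter.
        return {"same_shape": False, "equal": False, "max_abs_diff": None}
    # Upcast to float64 so low-precision storage does not hide small differences.
    diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
    return {
        "same_shape": True,
        "equal": bool(np.all(diff <= atol)),
        "max_abs_diff": float(diff.max()) if diff.size else 0.0,
    }
```

One way to obtain the arrays is through the torch-backed safetensors reader: open the shard that holds each tensor with safe_open(shard_path, framework="pt"), call get_tensor("model.norm.weight"), and convert with .float().numpy(). Here shard_path is whichever file the checkpoint's index maps the tensor to. Converting to float32 first matters because NumPy cannot represent bfloat16.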

Additionally, since the documentation gives few specifics about MTP (Multi-Token Prediction), I would like to ask whether the hidden states passed from the main model to the MTP module, and from one MTP layer to the next, are taken before or after normalization. This point is crucial for understanding how the model works.
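To make the second question concrete, the two candidate data flows can be sketched as follows. This assumes the norm layers are RMSNorm. The function names and the post_norm switch are hypothetical and serve only to frame the question; they do not describe the official implementation.

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last axis, followed by an elementwise scale."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def hidden_for_mtp(h_last: np.ndarray, final_norm_weight: np.ndarray,
                   post_norm: bool) -> np.ndarray:
    """Hypothetical: the hidden state handed from the main model to the first MTP module.

    post_norm=False -> raw output of the last transformer block (before model.norm)
    post_norm=True  -> the same output after applying model.norm
    """
    return rms_norm(h_last, final_norm_weight) if post_norm else h_last
```

The two options feed the MTP module inputs with different scales whenever model.norm.weight is not all ones. That is why it matters which one is used, and also why it matters whether shared_head.norm.weight is supposed to match model.norm.weight.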

Thank you for your response.

Best regards
